Automatic free text tagging

Free text tagging is the task of assigning to a document a few natural language phrases that summarize it and semantically represent its content. The Group has developed a solution for automatically tagging the [origo] archive.

With an average of 1,330,000 unique visitors daily and a reach of about 45 percent of all internet users in Hungary, [origo] is the most visited news site in the country. [origo] was launched in December 1998, and more than 370,000 articles have been published on it since then, making [origo] the acknowledged owner of the largest and most diverse Hungarian digital news archive.

From an application standpoint, tagging news has several benefits, such as contextual advertising, organization of the news collection, behavioral targeting and increased connectivity. From a research point of view, the task differs from tag recommendation and keyphrase extraction and has several special characteristics, among them the importance of named entities (NEs) and the structured nature of the documents.
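The two characteristics named above, named entities and document structure, can be illustrated with a minimal sketch of a generic tagging baseline. This is not the system of Farkas et al., whose method is not described here; it is a hypothetical TF-IDF ranker, assuming an English stopword list, a title boost (`TITLE_BOOST`) standing in for document structure, and a crude capitalization rule (`NE_BOOST`) standing in for real named-entity recognition. All names and weights are illustrative assumptions. The IDF term uses the smoothed formula that scikit-learn's `TfidfVectorizer` applies by default.

```python
import math
import re
from collections import Counter

# Hypothetical toy stopword list; Hungarian news would need a proper list
# plus lemmatization.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "for",
             "is", "was", "by", "with", "at", "from", "as", "its", "has"}

TITLE_BOOST = 2.0  # assumed weight: phrase also occurs in the title
NE_BOOST = 1.5     # assumed weight: capitalized phrase (crude NE proxy)


def segments(text):
    """Split text into sentence-like segments so phrases never span them."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def candidates(text):
    """Yield (phrase, capitalized) for every unigram and bigram that neither
    starts nor ends with a stopword. `capitalized` is True when every token
    starts upper-case and the phrase is not sentence-initial."""
    for seg in segments(text):
        tokens = re.findall(r"\w+", seg)
        for n in (1, 2):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0].lower() in STOPWORDS or gram[-1].lower() in STOPWORDS:
                    continue
                cap = i > 0 and all(t[0].isupper() for t in gram)
                yield " ".join(t.lower() for t in gram), cap


def suggest_tags(title, body, corpus, k=3):
    """Rank one article's candidate phrases by TF-IDF against `corpus`,
    boosted by title occurrence and capitalization in the body."""
    # Document frequency: in how many corpus documents each phrase occurs.
    df = Counter()
    for doc in corpus:
        df.update({p for p, _ in candidates(doc)})

    tf, named = Counter(), set()
    for phrase, cap in candidates(body):
        tf[phrase] += 1
        if cap:
            named.add(phrase)
    title_phrases = set()
    for phrase, _ in candidates(title):
        tf[phrase] += 1
        title_phrases.add(phrase)

    scores = {}
    for phrase, freq in tf.items():
        idf = math.log((1 + len(corpus)) / (1 + df[phrase])) + 1  # smoothed IDF
        score = freq * idf
        if phrase in title_phrases:
            score *= TITLE_BOOST
        if phrase in named:
            score *= NE_BOOST
        scores[phrase] = score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]
```

For example, `suggest_tags("Budapest Marathon", "The marathon started in Budapest. Runners enjoyed the weather.", ["The weather was cold.", "Runners train daily.", "Budapest traffic news."])` returns `["budapest", "marathon", "budapest marathon"]`: "budapest" wins because it appears in the title and mid-sentence with a capital letter, even though it also occurs in the background corpus.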

The more than 370,000 articles in the news archive are far too many to be tagged manually by either the community of readers or the team of journalists. We showed that free text tagging can be carried out by an automatic system with a satisfactory accuracy of 77.5 percent.

Reference

  • Richárd Farkas, Gábor Berend, István Hegedűs, András Kárpáti, Balázs Krich: Automatic free-text-tagging of online news archives. Proceedings of the 19th European Conference on Artificial Intelligence (ECAI 2010). Nominated for the best paper award of the conference.