Keyphrase extraction


We created a supervised system for extracting keyphrases that express the reasons behind authors' opinions in product reviews collected from the site. The datasets for two fairly different product review domains, movies and mobile phones, were constructed semi-automatically from the pros and cons entered by the authors. These pros and cons are ill-structured free-text annotations whose length, depth and style are extremely heterogeneous. To obtain clean gold-standard corpora, we manually revised the segmentation and the content of the pros and cons, yielding sets of tag-like keyphrases.
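As a rough illustration of what an automatic first pass over the pros and cons might look like before manual revision, the sketch below splits a free-text field into candidate tag-like keyphrases. It assumes that commas, semicolons, periods, slashes, newlines and the word "and" act as separators; the function name `segment_pros_cons` is hypothetical, and this is not the procedure actually used to build the datasets.

```python
import re


def segment_pros_cons(text: str) -> list[str]:
    """Naively split a free-text pros/cons field into tag-like keyphrases.

    Hypothetical helper: assumes common list separators and the word
    "and" delimit individual keyphrases.
    """
    # Split on punctuation separators, newlines, or a standalone "and".
    parts = re.split(r"[,;.\n/]|\band\b", text, flags=re.IGNORECASE)
    phrases: list[str] = []
    for part in parts:
        # Normalise case and collapse internal whitespace.
        phrase = " ".join(part.lower().split())
        # Keep non-empty segments, dropping duplicates but preserving order.
        if phrase and phrase not in phrases:
            phrases.append(phrase)
    return phrases
```

For example, `segment_pros_cons("Long battery life and loud speaker.")` yields `["long battery life", "loud speaker"]`. A heuristic like this will over-split phrases that legitimately contain "and" and cannot fix uneven depth or style, which is why manual revision remains necessary.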

The system demonstrates that the classic supervised keyphrase extraction approach, previously used mostly for scientific texts, can be adapted to opinion-related keyphrases.
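A minimal sketch of the classic supervised pipeline, assuming KEA-style features: candidate n-grams are extracted from each review, described by TF-IDF and the relative position of their first occurrence, and labelled positive when they exactly match a gold keyphrase. The function names, the smoothed IDF formula and the two-feature set are illustrative assumptions, not the features of the system described here. A classifier (for example Naive Bayes, as in KEA) would then be trained on the labelled feature vectors.

```python
import math
from collections import Counter


def candidate_ngrams(tokens, max_n=3):
    """Yield (ngram, start_index) for every n-gram of up to max_n tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n]), i


def kea_features(doc_tokens, doc_freq, num_docs, max_n=3):
    """Compute TF-IDF and relative first-occurrence features per candidate.

    doc_freq maps a phrase to the number of training documents containing it
    (an assumed input); IDF is add-one smoothed.
    """
    counts = Counter()
    first_pos = {}
    for phrase, pos in candidate_ngrams(doc_tokens, max_n):
        counts[phrase] += 1
        first_pos.setdefault(phrase, pos)  # earliest start index is seen first
    features = {}
    for phrase, tf in counts.items():
        idf = math.log((num_docs + 1) / (doc_freq.get(phrase, 0) + 1))
        features[phrase] = {
            "tfidf": tf / len(doc_tokens) * idf,
            "first_occurrence": first_pos[phrase] / len(doc_tokens),
        }
    return features


def label_candidates(features, gold_keyphrases):
    """Mark candidates that exactly match a gold keyphrase as positive (1)."""
    gold = {g.lower() for g in gold_keyphrases}
    return {phrase: int(phrase in gold) for phrase in features}
```

For instance, on the tokens of "the battery life is great and the battery lasts" with gold keyphrase "battery life", the candidate "battery life" is labelled 1 and "battery" is labelled 0. Exact string matching is the simplest labelling choice; matching after stemming or lemmatisation would be more forgiving of surface variation in the gold keyphrases.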


  • The dataset used in the experiments.
  • Statistics on the number of annotated keyphrases per document (PDF).
  • Annotation guidelines for the refinement and formulation of keyphrases.


For further information please contact Gábor Berend (berendg AT