Web Page of HomePage Corpus
Introduction
This site supports the HomePage Corpus and its Annotation Tool. The corpus is manually and extensively annotated for Web Content Mining, and it is freely available for research purposes. The Annotation Tool is a Firefox extension that lets annotators work with pages in their original appearance. The tool handles the annotation hierarchy independently of the DOM tree of the web pages, and it allows annotations that overlap across HTML tags. For more details, please read our article.
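To illustrate what an annotation hierarchy kept separate from the DOM tree can look like, here is a minimal Python sketch of offset-based (standoff) annotation. It is not the Annotation Tool's actual data model: the `Annotation` class, the helper functions, the labels, and the sample text are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical standoff annotation: a labeled span of the page text."""
    label: str
    start: int  # inclusive character offset into the page text
    end: int    # exclusive character offset
    children: list[Annotation] = field(default_factory=list)


def covered_text(text: str, ann: Annotation) -> str:
    """Return the text covered by an annotation."""
    return text[ann.start:ann.end]


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


# Assumed text of a page whose markup is: <b>Dr. Jane</b> <i>Doe, MIT</i>
page_text = "Dr. Jane Doe, MIT"

# "Jane Doe" crosses the </b> ... <i> boundary, so no single DOM node holds it.
name = Annotation("name", 4, 12)
affiliation = Annotation("affiliation", 14, 17)

# The hierarchy lives in the annotations themselves, not in the DOM tree.
person = Annotation("person", 0, 17, children=[name, affiliation])

# A span matching the <b> element partially overlaps the "name" annotation.
bold_span = Annotation("bold", 0, 8)
```

Because the spans refer to character offsets rather than DOM nodes, `covered_text(page_text, name)` yields "Jane Doe" even though the markup splits it across two elements, and `overlaps(name, bold_span)` is true without either span containing the other.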
Downloads
- Annotator
- Annotation Tool as a Firefox extension
- Firefox Portable with Annotation Tool
- A manually annotated HTML corpus for a novel scientific trend analysis
- Statistics