Web Page of HomePage Corpus

Introduction

This site supports the HomePage Corpus and its Annotation Tool. The corpus is manually and extensively annotated for Web Content Mining, and it is freely available for research purposes. The Annotation Tool is a Firefox extension that lets annotators work with pages in their original appearance. The tool handles the annotation hierarchy independently of the DOM tree of the web pages, and it allows annotations to overlap across HTML tags. For more details, please read our article.

Downloads