The BioScope corpus

Introduction

The BioScope corpus consists of medical and biological texts annotated for negation, speculation and their linguistic scope. The annotation was carried out to support the development and comparison of systems for negation/hedge detection and scope resolution. The corpus is publicly available for research purposes.

BioNLP-2008 paper on BioScope (please cite if you make use of the corpus):

Corpus download

  • The corpus consists of texts taken from 3 different sources, chosen so that it captures the heterogeneity of language use in the biomedical domain. The DTD for the XML files containing the annotations: DTD
  • Abstracts of the Genia corpus: xml v1.1 (in version 1.1, the Genia UIDs were replaced by PMIDs)
  • Full scientific articles: 5 articles from FlyBase (the same articles used by Medlock and Briscoe (2007)) and 4 articles from the open-access BMC Bioinformatics repository: xml
  • Clinical free texts: the radiology report corpus used for the CMC clinical coding challenge. Due to licensing issues, the negation/hedge-annotated version of this corpus can only be obtained by downloading the original 'ICD-9-CM coding' corpus from the Cincinnati Children's Hospital site and merging it with our annotation: readme, merger software.
  • The full corpus and the evaluation code in one file: zip
  • For the evaluation of the CoNLL-2010 Shared Task, 15 additional biomedical articles were annotated for hedges and their scope; these can be accessed on the shared task website after registration.
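The annotated XML files can be read with Python's standard xml.etree.ElementTree module. The sketch below is a minimal illustration, not the official tooling: it assumes that each scope is an xcope element carrying an id, and that each keyword is a cue element whose type attribute is "negation" or "speculation". These element and attribute names, the helper functions and the miniature SAMPLE document are assumptions to be checked against the DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature document. The element/attribute names (sentence, xcope,
# cue, type, ref) are ASSUMED from the BioScope layout; verify them against the DTD.
SAMPLE = """<Annotation>
  <DocumentSet>
    <Document>
      <sentence id="S1.1">These results <xcope id="X1.1.1"><cue type="speculation" ref="X1.1.1">suggest</cue> that the protein is <xcope id="X1.1.2"><cue type="negation" ref="X1.1.2">not</cue> required</xcope></xcope>.</sentence>
    </Document>
  </DocumentSet>
</Annotation>"""


def count_cues(root: ET.Element) -> Counter:
    """Count cue (keyword) elements per annotation type, e.g. negation/speculation."""
    return Counter(cue.get("type") for cue in root.iter("cue"))


def scope_texts(root: ET.Element) -> dict:
    """Map each xcope id to the text it covers.

    Scopes can nest, so itertext() is used: it collects the text of the scope
    and all its descendants (embedded cues and inner scopes), but not the text
    that follows the scope's closing tag.
    """
    return {x.get("id"): "".join(x.itertext()) for x in root.iter("xcope")}


root = ET.fromstring(SAMPLE)
print(count_cues(root))
print(scope_texts(root))
```

For a downloaded file, ET.parse(path).getroot() would take the place of ET.fromstring(SAMPLE).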

Inter-annotator agreement analysis

The BioScope corpus was annotated by two independent linguists following guidelines written by our linguist expert before annotation began. These guidelines were refined throughout the annotation process, as the annotators were often confronted with problematic cases. The annotators were not allowed to communicate with each other about the annotation, but they could consult the expert when needed, and regular meetings were held between the annotators and the linguist expert to discuss recurring or frequent problematic issues. Once both annotations of a subcorpus were finalized, the linguist expert resolved the differences between them, yielding the gold standard labeling of that subcorpus.

We measured the consistency of the annotation using inter-annotator agreement analysis. The inter-annotator agreement rate is defined as the F-measure of one annotation, treating the other one as the gold standard. The evaluation has two levels: first, keyword F-measures are calculated; then left, right and full scope F-measures are computed over the true positive keyword matches (keywords marked by both annotations).
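The two-level agreement measure can be sketched in Python as follows. This is one reading of the procedure, not the official Java evaluation code, which remains authoritative. The encoding is hypothetical: each annotation is a dict mapping a keyword, identified as (sentence id, first token, last token), to its scope as a (first token, last token) span. Keyword agreement is the F-measure over keyword sets; the scope measures compare left boundaries, right boundaries, or whole spans, restricted to keywords present in both annotations.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding, not the official format:
Key = Tuple[str, int, int]  # (sentence id, first token of cue, last token of cue)
Scope = Tuple[int, int]     # (first token, last token) of the scope


def f_measure(predicted: Set, gold: Set) -> float:
    """F1 of `predicted` against `gold`; 0.0 when nothing matches."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def agreement(a: Dict[Key, Scope], b: Dict[Key, Scope]) -> Dict[str, float]:
    """Agreement of annotation `a`, treating annotation `b` as the gold standard."""
    scores = {"keyword": f_measure(set(a), set(b))}
    shared = set(a) & set(b)  # true positive keyword matches
    # Which part of a scope is compared for each scope-level measure.
    views = {
        "left scope": lambda s: s[0],
        "right scope": lambda s: s[1],
        "full scope": lambda s: s,
    }
    for name, view in views.items():
        scores[name] = f_measure({(k, view(a[k])) for k in shared},
                                 {(k, view(b[k])) for k in shared})
    return scores


annotator_1 = {("S1", 3, 3): (3, 9), ("S2", 0, 0): (0, 5), ("S3", 4, 4): (4, 7)}
annotator_2 = {("S1", 3, 3): (3, 9), ("S2", 0, 0): (0, 6), ("S4", 2, 2): (2, 5)}
print(agreement(annotator_1, annotator_2))
```

In this toy example, the annotators share two of their three keywords. They agree on both left boundaries but on only one right boundary, so the full scope score is limited by the right boundary. Multiplied by 100, the scores are on the same percentage scale as the agreement rates reported for the corpus.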

The Java code for evaluating scope annotations: zip, readme

Corpus statistics

In the table below, all agreement rates are F-measures in percent. The first number in each cell is the agreement rate between the two annotators; the second and third numbers each give the agreement rate between one of the two annotators and the chief annotator:

Type            Clinical records        Abstracts               Full articles
NEGATION
  keyword       90.70 / 94.56 / 95.81   91.46 / 91.71 / 98.05   79.42 / 86.77 / 91.71
  left scope    86.27 / 86.86 / 97.95   97.78 / 97.90 / 100     83.44 / 82.42 / 95.87
  right scope   88.88 / 91.26 / 97.39   94.56 / 95.17 / 99.42   84.36 / 88.19 / 95.09
  full scope    76.29 / 79.32 / 95.35   92.46 / 93.07 / 99.42   70.86 / 73.35 / 91.21
SPECULATION
  keyword       84.01 / 89.86 / 92.37   79.12 / 83.92 / 92.05   77.60 / 81.49 / 90.81
  left scope    89.36 / 88.90 / 97.60   87.52 / 88.37 / 97.58   75.49 / 80.13 / 92.15
  right scope   91.28 / 92.64 / 97.90   87.13 / 89.92 / 96.16   82.40 / 83.28 / 96.97
  full scope    81.90 / 82.88 / 95.54   76.72 / 80.07 / 94.04   62.50 / 66.72 / 89.67