Szeged Parallel Corpus
Introduction
The English-Hungarian parallel corpus contains texts selected on the basis of grammatical and translational criteria. It includes both sentences that illustrate the grammar of each language (mostly taken from language books) and authentic texts, thus maintaining a balance between artificially constructed and natural language structures.
Both paragraph and sentence alignment were checked and corrected manually.
The number of sentence alignment units in each text type is shown below:
text type | sentence alignment units |
---|---|
language book sentences | 3,496 |
texts on the European Union | 1,518 |
Horizon Magazine | 3,980 |
Resource Ingatlan Info | 1,340 |
literature | 88,716 |
miscellaneous | 695 |
TOTAL | 99,745 |
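The figures in the table can be checked with a little arithmetic: the six counts sum to the stated total of 99,745, and the literature subcorpus alone accounts for roughly 89% of all sentence alignment units. The minimal Python sketch below assumes nothing beyond the numbers in the table; the names `COUNTS` and `shares` are hypothetical and are not part of any corpus distribution.

```python
# Sentence alignment unit counts per text type, copied from the table.
COUNTS = {
    "language book sentences": 3_496,
    "texts on the European Union": 1_518,
    "Horizon Magazine": 3_980,
    "Resource Ingatlan Info": 1_340,
    "literature": 88_716,
    "miscellaneous": 695,
}


def shares(counts):
    """Return each text type's percentage of all alignment units."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    total = sum(COUNTS.values())
    print(f"TOTAL: {total:,}")  # should match the TOTAL row of the table
    # Print text types from the largest share to the smallest.
    for name, pct in sorted(shares(COUNTS).items(), key=lambda kv: -kv[1]):
        print(f"{name:30s} {pct:6.2f}%")
```

Running the script prints the total followed by the share of each text type, largest first; literature comes out at about 88.94%, and each of the other text types at under 4%.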
Reference
- Krisztina Tóth, Richárd Farkas, András Kocsor: Hybrid algorithm for sentence alignment of Hungarian-English parallel corpora. Acta Cybernetica 18(3):463-478. (2008)
For further information, please contact Veronika Vincze (vinczev AT inf.u-szeged.hu).