SzegedParalell Corpus


The English-Hungarian parallel corpus contains texts selected on the basis of grammatical and translational criteria. It includes both sentences that illustrate the grammar of the given language (usually taken from language books) and authentic texts, so that a balance is maintained between artificially constructed and natural language structures.

Both paragraph and sentence alignment were checked and corrected manually.

The number of sentence alignment units in each group of texts is shown here:

Texts                          Sentence alignment units
language book sentences                           3,496
texts on the European Union                       1,518
Horizon Magazine                                  3,980
Resource Ingatlan Info                            1,340
literature                                       88,716
miscellaneous                                       695
TOTAL                                            99,745


For further information please contact Veronika Vincze (vinczev AT


  • The corpus in "one alignment in a row" format.
  • The corpus with separate English and Hungarian files.
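
For the version with separate English and Hungarian files, the following is a minimal Python sketch of how the two files might be read back into sentence pairs. It rests on assumptions that this page does not confirm: that line i of the English file is aligned with line i of the Hungarian file, that the files are UTF-8 encoded, and that they are named as in the docstring (the file names are hypothetical). The function names read_aligned_pairs and load_corpus are illustrative only.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List, Tuple


def read_aligned_pairs(
    english_lines: Iterable[str], hungarian_lines: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (English, Hungarian) pairs from two line-aligned sources.

    Assumption: line i of one source is aligned with line i of the other.
    """
    missing = object()  # marks the point where one source ran out
    paired = zip_longest(english_lines, hungarian_lines, fillvalue=missing)
    for number, (en, hu) in enumerate(paired, start=1):
        if en is missing or hu is missing:
            # Unequal lengths mean the line-by-line assumption does not hold.
            raise ValueError(f"sources differ in length at line {number}")
        yield en.rstrip("\n"), hu.rstrip("\n")


def load_corpus(en_path: str, hu_path: str) -> List[Tuple[str, str]]:
    """Read two files into a list of aligned pairs.

    Hypothetical usage: load_corpus("corpus.en", "corpus.hu").
    The UTF-8 encoding is an assumption; adjust it to the actual files.
    """
    with open(en_path, encoding="utf-8") as en_file, \
            open(hu_path, encoding="utf-8") as hu_file:
        return list(read_aligned_pairs(en_file, hu_file))
```

Raising an error on unequal lengths, rather than silently truncating, makes a broken alignment visible as soon as the files are loaded.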