The HunLearner Corpus

Introduction

HunLearner is a learners' corpus of Hungarian. We collected written data from 35 students majoring in Hungarian studies at the University of Zagreb, Croatia. The texts were morphologically and syntactically analyzed with the magyarlanc tool.

Texts include essays on the difficulties of learning Hungarian, compositions on a friendly person, and fictitious letters written by a Hungarian immigrant from England.

Statistical data on the corpus

Text             Number of texts  Number of sentences  Number of tokens
Difficulties                  18                  559            10,433
Friendly person                6                  134             1,930
England                       11                  258             3,936
Total                         35                  951            16,299
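
As a rough illustration of how the figures above relate to each other, the following Python sketch checks that the per-type sentence and token counts add up to the reported totals and derives the mean sentence length for each text type. The names COUNTS, TOTAL and tokens_per_sentence are hypothetical and are not part of the corpus distribution; the numbers are copied from the table.

```python
# Sentence and token counts copied from the table above.
# All names here are illustrative, not part of the corpus distribution.
COUNTS = {
    "Difficulties": {"sentences": 559, "tokens": 10433},
    "Friendly person": {"sentences": 134, "tokens": 1930},
    "England": {"sentences": 258, "tokens": 3936},
}
TOTAL = {"sentences": 951, "tokens": 16299}


def tokens_per_sentence(row):
    """Return the mean number of tokens per sentence for one table row."""
    return row["tokens"] / row["sentences"]


# Check that the per-type rows add up to the reported totals.
for key in ("sentences", "tokens"):
    assert sum(row[key] for row in COUNTS.values()) == TOTAL[key]

# Print the derived mean sentence length for each text type and overall.
for name, row in COUNTS.items():
    print(f"{name}: {tokens_per_sentence(row):.2f} tokens per sentence")
print(f"Total: {tokens_per_sentence(TOTAL):.2f} tokens per sentence")
```

The mean sentence length is a derived figure only; it is not reported in the original statistics.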

Downloads

  • The morphologically and syntactically tagged HunLearner corpus.
  • The HunLearner corpus with corrected nouns and error codes.

Reference

For further information please contact Veronika Vincze (vinczev AT inf.u-szeged.hu).