The HunLearner Corpus
Introduction
HunLearner is a learner corpus of Hungarian. We collected written data from 35 students majoring in Hungarian studies at the University of Zagreb, Croatia. The texts were morphologically and syntactically analyzed with the magyarlanc tool.
The texts include essays on the difficulties of learning Hungarian, compositions on a friendly person, and fictitious letters written by a Hungarian immigrant from England.
Statistical data on the corpus
Text | Number of texts | Number of sentences | Number of tokens |
---|---|---|---|
Difficulties | 18 | 559 | 10,433 |
Friendly person | 6 | 134 | 1,930 |
England | 11 | 258 | 3,936 |
Total | 35 | 951 | 16,299 |
Downloads
- The morphologically and syntactically tagged HunLearner corpus.
- The HunLearner corpus with corrected nouns and error codes.
References
- Durst, Péter; Szabó, Martina Katalin; Vincze, Veronika; Zsibrita, János 2014: Using Automatic Morphological Tools to Process Data from a Learner Corpus of Hungarian. Apples: Journal of Applied Language Studies 8(3), pp. 39-54.
- Vincze, Veronika; Zsibrita, János; Durst, Péter; Szabó, Martina Katalin 2014: Automatic Error Detection concerning the Definite and Indefinite Conjugation in the HunLearner Corpus. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), ELRA, Reykjavik, Iceland, pp. 3958-3962.
For further information, please contact Veronika Vincze (vinczev AT inf.u-szeged.hu).