The HunLearner Corpus

Introduction

HunLearner is a learner corpus of Hungarian. We collected written texts from 35 students majoring in Hungarian studies at the University of Zagreb, Croatia. The texts were morphologically and syntactically analyzed with the magyarlanc tool.

The texts comprise essays on the difficulties of learning Hungarian, compositions about a friendly person, and fictitious letters written by a Hungarian immigrant from England.

Statistical data on the corpus

Text	Number of texts	Number of sentences	Number of tokens
Difficulties	18	559	10,433
Friendly person	6	134	1,930
England	11	258	3,936
Total	35	951	16,299
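
The sentence and token columns above can be cross-checked, and an average sentence length derived for each text type, with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the table, while the names SUBCORPORA, REPORTED_TOTAL, column_sum and mean_sentence_length are made up for this example and are not part of the corpus distribution.

```python
# Sentence and token counts copied from the statistics table.
SUBCORPORA = {
    "Difficulties": {"sentences": 559, "tokens": 10433},
    "Friendly person": {"sentences": 134, "tokens": 1930},
    "England": {"sentences": 258, "tokens": 3936},
}
# The "Total" row as reported in the table.
REPORTED_TOTAL = {"sentences": 951, "tokens": 16299}


def column_sum(column):
    """Add up one column over the three text types."""
    return sum(row[column] for row in SUBCORPORA.values())


def mean_sentence_length(row):
    """Average number of tokens per sentence for one table row."""
    return row["tokens"] / row["sentences"]


# Check that each column adds up to the reported total.
for column, reported in REPORTED_TOTAL.items():
    status = "matches" if column_sum(column) == reported else "differs from"
    print(f"{column}: sum {column_sum(column)} {status} reported {reported}")

# Print the average sentence length per text type and overall.
for name, row in SUBCORPORA.items():
    print(f"{name}: {mean_sentence_length(row):.2f} tokens per sentence")
print(f"Total: {mean_sentence_length(REPORTED_TOTAL):.2f} tokens per sentence")
```

Token counts produced by an automatic tokenizer usually include punctuation marks, so the averages are best read as a rough indicator of sentence length rather than as word counts.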

Downloads

The morphologically and syntactically tagged HunLearner corpus.
The HunLearner corpus with corrected nouns and error codes.

References

Durst, Péter; Szabó, Martina Katalin; Vincze, Veronika; Zsibrita, János 2014: Using Automatic Morphological Tools to Process Data from a Learner Corpus of Hungarian. Apples: Journal of Applied Language Studies 8(3), pp. 39-54.
Vincze, Veronika; Zsibrita, János; Durst, Péter; Szabó, Martina Katalin 2014: Automatic Error Detection concerning the Definite and Indefinite Conjugation in the HunLearner Corpus. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), ELRA, Reykjavik, Iceland, pp. 3958-3962.

For further information, please contact Veronika Vincze (vinczev AT inf.u-szeged.hu).