Analyzing learner language: the case of the Hebrew Learner Essay Corpus

Chen Gafni, Livnat Herzig Sheinfux, Hadar Klunover, Anat Bar Siman Tov, Anat Prior, Shuly Wintner

Research output: Contribution to journal › Article › peer-review


We present the Hebrew Learner Essay Corpus (HELEECS): an annotated corpus of Hebrew-language argumentative essays authored by prospective higher-education students. The corpus includes essays by two main populations: (1) essays by native speakers of Hebrew, written as part of the psychometric exam that is used to assess their future success in academic studies; (2) essays by non-native speakers of Hebrew, with three different native languages (Arabic, French, and Russian), written as part of a language aptitude test. The corpus is uniformly encoded and stored. The non-native essays were annotated with target hypotheses (i.e., hypothesized intended formulations in standard written Hebrew). The corpus is available for research purposes upon request. We describe the corpus and the error correction and annotation schemes used in its analysis. In addition to introducing this new resource, we discuss the challenges of identifying and analyzing non-native language use. Among these challenges are determining whether the language used in a particular utterance is native-like, and determining the target hypothesis when language use is not native-like. We propose various ways of dealing with these challenges.

Original language: English
Journal: Language Resources and Evaluation
State: Accepted/In press - 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.


  • Crosslinguistic influence
  • Educational applications
  • Hebrew
  • Learner corpora
  • Non-native language

