Abstract
In this paper, we describe various language models (LMs) and combinations created to support word prediction and completion in Hebrew. We define and apply 5 general types of LMs: (1) basic LMs (unigrams, bigrams, trigrams, and quadgrams), (2) backoff LMs, (3) LMs integrated with tagged LMs, (4) interpolated LMs, and (5) interpolated LMs integrated with tagged LMs. 16 specific implementations of these LMs were compared using 3 types of Israeli web newspaper corpora. The best keystroke-saving results were achieved with the most complex variety, the interpolated LMs integrated with tagged LMs. We therefore conclude that combining the strengths of all four basic LMs with the tagged LMs yields the best results.
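As a rough illustration of the interpolation idea described in the abstract, the sketch below mixes unigram, bigram, and trigram estimates with fixed weights and ranks vocabulary words that match a typed prefix. The toy corpus, the interpolation weights, and the prefix-matching helper are assumptions for demonstration only; they are not the authors' implementation and omit the backoff and tagged-LM components.

```python
# Minimal sketch of an interpolated n-gram LM for word completion.
# The corpus, weights, and prefix-matching strategy are illustrative
# assumptions, not the paper's actual models or data.
from collections import Counter

corpus = "הוא הלך אל הבית הוא הלך אל הגן".split()  # toy Hebrew corpus (assumed)

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
total = sum(unigrams.values())

def interpolated_prob(w, context, lambdas=(0.2, 0.3, 0.5)):
    """P(w | context) as a weighted mix of unigram, bigram, and trigram estimates."""
    l1, l2, l3 = lambdas  # assumed weights; in practice these would be tuned
    p1 = unigrams[w] / total
    p2 = 0.0
    if context and unigrams[context[-1]]:
        p2 = bigrams[(context[-1], w)] / unigrams[context[-1]]
    p3 = 0.0
    if len(context) >= 2:
        big = bigrams[(context[-2], context[-1])]
        if big:
            p3 = trigrams[(context[-2], context[-1], w)] / big
    return l1 * p1 + l2 * p2 + l3 * p3

def complete(prefix, context, k=3):
    """Rank vocabulary words that start with the typed prefix by interpolated probability."""
    candidates = [w for w in unigrams if w.startswith(prefix)]
    return sorted(candidates, key=lambda w: interpolated_prob(w, context), reverse=True)[:k]

# Suggest completions for a typed 'ה' given the preceding context words.
print(complete("ה", ["הלך", "אל"]))
```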
| Original language | English |
|---|---|
| Pages (from-to) | 450-462 |
| Number of pages | 13 |
| Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
| Volume | 8686 |
| DOIs | |
| State | Published - 2014 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © Springer International Publishing Switzerland 2014.
Keywords
- Hebrew
- Keystroke savings
- Language models
- Word completion
- Word prediction