Experiments with Language Models for Word Completion and Prediction in Hebrew

Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman

Research output: Contribution to journal › Article › peer-review


Abstract

In this paper, we describe various language models (LMs) and combinations created to support word prediction and completion in Hebrew. We define and apply 5 general types of LMs: (1) basic LMs (unigrams, bigrams, trigrams, and quadgrams), (2) backoff LMs, (3) LMs integrated with tagged LMs, (4) interpolated LMs, and (5) interpolated LMs integrated with tagged LMs. 16 specific implementations of these LMs were compared using 3 types of Israeli web newspaper corpora. The best keystroke-saving results were achieved with LMs of the most complex variety, the interpolated LMs integrated with tagged LMs. Therefore, we conclude that combining all strengths by creating a synthesis of all four basic LMs and the tagged LMs leads to the best results.
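To illustrate the interpolation idea from the abstract, the sketch below linearly combines unigram, bigram, trigram, and quadgram probabilities to rank candidate next words. It is a minimal sketch only: the toy English corpus, the fixed interpolation weights, and the function names are illustrative assumptions, not the paper's actual Hebrew corpora, weight-estimation procedure, or tagged-LM integration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; the paper used Israeli web newspaper corpora.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

def ngram_counts(tokens, n):
    """Count n-grams as (context, next_word) pairs."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

# Basic LMs: unigram through quadgram counts.
models = {n: ngram_counts(corpus, n) for n in (1, 2, 3, 4)}
total = len(corpus)

def interpolated_prob(context, word, lambdas=(0.1, 0.2, 0.3, 0.4)):
    """Linearly interpolate unigram..quadgram probabilities.
    The weights (lambdas) are illustrative and sum to 1; the paper
    does not specify its weights here."""
    p = lambdas[0] * models[1][()][word] / total
    for n, lam in zip((2, 3, 4), lambdas[1:]):
        ctx = tuple(context[-(n - 1):])
        counter = models[n].get(ctx)
        if counter:  # skip orders with an unseen or too-short context
            p += lam * counter[word] / sum(counter.values())
    return p

def predict(context, k=3):
    """Rank candidate next words by interpolated probability."""
    vocab = set(corpus)
    ranked = sorted(vocab, key=lambda w: interpolated_prob(context, w),
                    reverse=True)
    return ranked[:k]

print(predict(["the", "cat"]))
```

A word-completion front end would apply the same ranking restricted to vocabulary items that extend the user's typed prefix, which is how keystroke savings are obtained.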

Original language: English
Pages (from-to): 450-462
Number of pages: 13
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8686
DOIs
State: Published - 2014
Externally published: Yes

Bibliographical note

Publisher Copyright:
© Springer International Publishing Switzerland 2014.

Keywords

  • Hebrew
  • Keystroke savings
  • Language models
  • Word completion
  • Word prediction

