Identification of transliterated foreign words in Hebrew script

Yoav Goldberg, Michael Elhadad

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

We present a loosely supervised method for context-free identification of transliterated foreign names and borrowed words in Hebrew text. The method is purely statistical and does not require any lexicons or linguistic analysis tools for the source language (Hebrew, in our case). It also requires no manually annotated training data: we learn from noisy data acquired by over-generation. We report precision and recall of 80% and 82%, respectively, on a corpus of 4,044 unique words containing 368 foreign words.
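The abstract does not spell out the model, so the following is only a minimal sketch of one generic way to realize the idea it describes: a character n-gram likelihood-ratio classifier whose "foreign" class is trained on noisy spellings obtained by over-generating transliterations. It is not the authors' method. The transliteration table `TRANSLIT`, the helper names `over_generate`, `CharNgramModel` and `classify`, the trigram order, the add-one smoothing and the word lists are all hypothetical illustrations; the Hebrew-side word list merely stands in for unannotated text.

```python
import math
from collections import Counter
from itertools import product

# HYPOTHETICAL toy Latin-to-Hebrew mapping (not from the paper). Several
# letters have more than one plausible Hebrew rendering, and vowels may be
# written or dropped, so each English word yields many candidate spellings.
# Final letter forms and digraphs such as "sh" are ignored for brevity.
TRANSLIT = {
    "a": ["א", ""], "b": ["ב"], "d": ["ד"], "e": ["", "א"], "g": ["ג"],
    "i": ["י", ""], "k": ["ק", "כ"], "l": ["ל"], "m": ["מ"], "n": ["נ"],
    "o": ["ו"], "p": ["פ"], "r": ["ר"], "s": ["ס", "ש"], "t": ["ט", "ת"],
    "u": ["ו"], "v": ["ו"], "z": ["ז"],
}


def over_generate(word):
    """Return every Hebrew-script spelling the toy mapping allows for `word`."""
    options = [TRANSLIT.get(ch, [""]) for ch in word.lower()]
    spellings = {"".join(parts) for parts in product(*options)}
    spellings.discard("")  # drop the degenerate all-empty spelling, if any
    return spellings


class CharNgramModel:
    """Character n-gram model with add-one (Laplace) smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of full n-grams
        self.contexts = Counter()  # counts of their (n-1)-character prefixes
        self.alphabet = set()

    def _pad(self, word):
        # "^" marks the word start, "$" marks the word end.
        return "^" * (self.n - 1) + word + "$"

    def train(self, words):
        for w in words:
            padded = self._pad(w)
            self.alphabet.update(padded)
            for i in range(self.n - 1, len(padded)):
                gram = padded[i - self.n + 1 : i + 1]
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1
        return self

    def log_prob(self, word, vocab_size):
        """Smoothed log-probability of `word` under this model."""
        padded = self._pad(word)
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            gram = padded[i - self.n + 1 : i + 1]
            total += math.log((self.ngrams[gram] + 1)
                              / (self.contexts[gram[:-1]] + vocab_size))
        return total


def classify(word, foreign_model, hebrew_model):
    """Label `word` "foreign" if the foreign model gives it a higher likelihood."""
    # A shared vocabulary size keeps the two smoothed scores comparable.
    vocab = foreign_model.alphabet | hebrew_model.alphabet | set(word) | {"$"}
    v = len(vocab)
    if foreign_model.log_prob(word, v) > hebrew_model.log_prob(word, v):
        return "foreign"
    return "hebrew"


if __name__ == "__main__":
    # Noisy positive examples: all over-generated spellings of a few English words.
    english = ["internet", "modem", "disk", "studio", "pilot"]
    foreign = CharNgramModel().train(
        s for w in english for s in over_generate(w))
    # Hypothetical native word list; words taken from raw text would be noisier.
    native = ["שלום", "ספר", "בית", "ילד", "מים", "לחם", "עבודה", "משפחה"]
    hebrew = CharNgramModel().train(native)
    for w in ["דיסק", "מודמ", "ספר", "משפחה"]:
        print(w, classify(w, foreign, hebrew))
```

The design choice worth noting is that no word is ever labeled by hand: the foreign class sees only over-generated, partly implausible spellings, and smoothing keeps letter sequences unseen in either training set from receiving zero probability.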

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 9th International Conference, CICLing 2008, Proceedings
Pages: 466-477
Number of pages: 12
State: Published - 2008
Externally published: Yes
Event: 9th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2008 - Haifa, Israel
Duration: 17 Feb 2008 → 23 Feb 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4919 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2008
Country/Territory: Israel
City: Haifa
Period: 17/02/08 → 23/02/08
