Learning an expert from human annotations in Statistical Machine Translation: The case of Out-Of-Vocabulary words

Wilker Aziz, Marc Dymetman, Shachar Mirkin, Lucia Specia, Nicola Cancedda, Ido Dagan

Research output: Contribution to conference, Paper, peer-review

12 Scopus citations

Abstract

We present a general method for incorporating an "expert" model into a Statistical Machine Translation (SMT) system, in order to improve its performance on a particular "area of expertise", and apply this method to the specific task of finding adequate replacements for Out-of-Vocabulary (OOV) words. Candidate replacements are paraphrases and entailed phrases, obtained using monolingual resources. These candidate replacements are transformed into "dynamic biphrases", generated at decoding time based on the context of each source sentence. Standard SMT features are enhanced with a number of new features aimed at scoring translations produced by using different replacements. Active learning is used to discriminatively train the model parameters from human assessments of the quality of translations. The learning framework yields an SMT system which is able to deal with sentences containing OOV words but also guarantees that the performance is not degraded for input sentences without OOV words. Results of experiments on English-French translation show that this method outperforms previous work addressing OOV words in terms of acceptability.

Original language: English
State: Published - 2010
Event: 14th Annual Conference of the European Association for Machine Translation, EAMT 2010 - Saint-Raphael, France
Duration: 27 May 2010 to 28 May 2010

Conference

Conference: 14th Annual Conference of the European Association for Machine Translation, EAMT 2010
Country/Territory: France
City: Saint-Raphael
Period: 27/05/10 to 28/05/10
