Identifying Universal Laws of Text Translation

I. Kanter, H. Kfir, B. Malkiel, M. Shlesinger

Research output: Contribution to journal, Article, peer-review


Straightforward quantitative analyses of authentic texts have allowed linguists and translation scholars to discern patterns in individual languages as well as features which set translations apart from originals [1, 2]. A language can also be studied statistically, an approach epitomized by the application of Zipf's Law [3], which states that word-frequency distributions follow an almost identical curve regardless of language. To date, no universal law governing the joint probability distribution of words in two or more languages has been either proposed or observed. This study identifies new universal behaviours which characterize the mutual overlaps between a corpus of original English and three corpora of translated English. Specifically, it suggests a remarkable similarity in (a) the number of types unique to each translated corpus, and (b) the number of types common to the original-English corpus and each of the translated corpora. We argue that these universal behaviours can be used both to determine the ontological status of an unidentified language (whether it is an original or a translation) and to identify the source language of a translation.
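
The two quantities named in (a) and (b) can be illustrated with plain set operations. The following is only a minimal sketch, not the authors' procedure: the tokenizer, the toy sentences, the corpus labels "A" and "B", and the function names `types` and `overlap_counts` are all hypothetical, and it assumes that "unique to a translated corpus" means absent from both the original corpus and the other translated corpora.

```python
import re

def types(text):
    """Return the set of distinct word forms (types) in a text, lowercased."""
    # Assumption: a crude regex tokenizer; the abstract does not specify one.
    return set(re.findall(r"[a-z']+", text.lower()))

# Toy stand-ins for the corpora; these sentences are invented for illustration.
original = types("the cat sat on the mat")
translated = {
    "A": types("the cat was sitting upon the mat"),
    "B": types("the cat sat down on a rug"),
}

def overlap_counts(original, translated):
    """Count, per translated corpus, its unique types and its types shared with the original."""
    counts = {}
    for name, vocab in translated.items():
        # Union of the vocabularies of all other translated corpora.
        others = set().union(*(v for n, v in translated.items() if n != name))
        counts[name] = {
            # Assumed reading of (a): types found in this corpus only.
            "unique_to_translation": len(vocab - original - others),
            # (b): types shared by this corpus and the original corpus.
            "common_with_original": len(vocab & original),
        }
    return counts

counts = overlap_counts(original, translated)
print(counts)
```

On real corpora the same counts would be computed over full vocabularies; the similarity the abstract reports concerns how these numbers compare across the three translated corpora.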
Original language: American English
Journal: Journal of Quantitative Linguistics
Issue number: 13(1)
State: Published - 2006

