Determining an author's native language by mining a text for errors

Moshe Koppel, Jonathan Schler, Kfir Zigdon

Research output: Contribution to conference, Paper, peer-review

134 Scopus citations

Abstract

In this paper, we show that stylistic text features can be exploited to determine an anonymous author's native language with high accuracy. Specifically, we first use automatic tools to ascertain frequencies of various stylistic idiosyncrasies in a text. These frequencies then serve as features for support vector machines that learn to classify texts according to author native language.
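The first step described above, turning a text into a vector of feature frequencies, can be illustrated with a minimal sketch. It is not the authors' implementation: the function-word list, the two regular-expression "idiosyncrasy" patterns, and the name `feature_vector` are all hypothetical choices made for illustration, and the abstract does not specify the actual feature set or tools. Vectors of this kind would then be passed to a support vector machine trained on texts labeled by author native language.

```python
import re
from collections import Counter

# Hypothetical feature set (not taken from the paper): a handful of
# function words plus two illustrative surface-error patterns.
FUNCTION_WORDS = ["the", "a", "of", "in", "on"]
PATTERNS = {
    # The same word written twice in a row, e.g. "the the".
    "doubled_word": re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
    # A comma immediately followed by a non-space character.
    "missing_space_after_comma": re.compile(r",(?=\S)"),
}


def feature_vector(text):
    """Return per-token relative frequencies for each feature."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    counts = Counter(tokens)
    vec = [counts[w] / n for w in FUNCTION_WORDS]
    vec += [len(p.findall(text)) / n for p in PATTERNS.values()]
    return vec


if __name__ == "__main__":
    sample = "The the cat sat on the mat,and slept."
    print(feature_vector(sample))
```

Normalizing by token count keeps texts of different lengths comparable, which matters when the vectors are used as classifier input.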

Original language: English
Pages: 624-628
Number of pages: 5
State: Published - 2005
Event: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States
Duration: 21 Aug 2005 to 24 Aug 2005

Conference

Conference: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country/Territory: United States
City: Chicago, IL
Period: 21/08/05 to 24/08/05

Keywords

  • Author profiling
  • Text mining
