Authorship attribution in the wild

Moshe Koppel, Jonathan Schler, Shlomo Argamon

Research output: Contribution to journal › Article › peer-review

195 Scopus citations

Abstract

Most previous work on authorship attribution has focused on the case in which we need to attribute an anonymous document to one of a small set of candidate authors. In this paper, we consider authorship attribution as found in the wild: the set of known candidates is extremely large (possibly many thousands) and might not even include the actual author. Moreover, the known texts and the anonymous texts might be of limited length. We show that even in these difficult cases, we can use similarity-based methods along with multiple randomized feature sets to achieve high precision. Moreover, we show the precise relationship between attribution precision and four parameters: the size of the candidate set, the quantity of known text by the candidates, the length of the anonymous text, and a certain robustness score associated with an attribution.
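The general idea of combining a similarity measure with multiple randomized feature sets can be sketched as follows. This is only an illustrative sketch and not necessarily the authors' exact procedure: the abstract does not specify the feature type, the similarity measure, the number of iterations, the sampled fraction, or the acceptance threshold, so the character 4-grams, cosine similarity, 100 iterations, 40% sampling rate, and 0.9 threshold used here are all assumptions, as are the function names. The returned score, the share of random feature subsets on which one candidate is the closest match, plays the role of a robustness score; when it falls below the threshold the sketch declines to attribute, which is how an open candidate set can be handled.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=4):
    """Count overlapping character n-grams (assumed feature type)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b, features):
    """Cosine similarity of two count vectors restricted to `features`."""
    dot = sum(a[f] * b[f] for f in features)
    norm_a = math.sqrt(sum(a[f] ** 2 for f in features))
    norm_b = math.sqrt(sum(b[f] ** 2 for f in features))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(anonymous, known, iterations=100, fraction=0.4,
              threshold=0.9, seed=0):
    """Return (author or None, score).

    `known` maps each candidate author to their known text. All default
    parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    anon_vec = char_ngrams(anonymous)
    known_vecs = {author: char_ngrams(text) for author, text in known.items()}
    vocabulary = sorted(set(anon_vec).union(*known_vecs.values()))
    if not vocabulary or not known_vecs:
        return None, 0.0

    sample_size = max(1, int(fraction * len(vocabulary)))
    wins = Counter()
    for _ in range(iterations):
        # Draw a fresh random subset of the full feature set.
        features = rng.sample(vocabulary, sample_size)
        # Record which candidate is most similar under this subset.
        best = max(known_vecs,
                   key=lambda a: cosine(anon_vec, known_vecs[a], features))
        wins[best] += 1

    author, count = wins.most_common(1)[0]
    score = count / iterations
    # Below the threshold, answer "don't know" instead of forcing a match.
    return (author if score >= threshold else None), score
```

Raising the threshold trades recall for precision: fewer anonymous texts are attributed, but an attribution that survives most random feature subsets is less likely to depend on a handful of idiosyncratic features.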

Original language: English
Pages (from-to): 83-94
Number of pages: 12
Journal: Language Resources and Evaluation
Volume: 45
Issue number: 1
State: Published - Mar 2011

Keywords

  • Authorship attribution
  • Open candidate set
  • Randomized feature set
