Automatically profiling the author of an anonymous text

Shlomo Argamon, Moshe Koppel, James W. Pennebaker, Jonathan Schler

Research output: Contribution to journal › Article › peer-review

317 Scopus citations

Abstract

Authorship profiling is of growing importance in the global information environment; for example, it can help police identify characteristics of the perpetrator of a crime when there are too few (or too many) specific suspects to consider. The approach applies machine learning to text categorization: a corpus of training documents is taken, each labeled according to its category for a particular profiling dimension. The study outlines the kinds of text features found most useful for authorship profiling. The two basic types of features are content-based and style-based, reflecting the fact that different populations might tend to write about different topics as well as express themselves differently about the same topic. The experimental setup covers four profiling problems: determining the author's gender, age, native language, and neuroticism level. The right combination of linguistic features and machine learning methods enables an automated system to effectively determine such aspects of an anonymous author.

Original language: English
Pages (from-to): 119-123
Number of pages: 5
Journal: Communications of the ACM
Volume: 52
Issue number: 2
State: Published - 1 Feb 2009
