TY - JOUR
T1 - Automatically profiling the author of an anonymous text
AU - Argamon, Shlomo
AU - Koppel, Moshe
AU - Pennebaker, James W.
AU - Schler, Jonathan
PY - 2009/2/1
Y1 - 2009/2/1
N2 - The authorship profiling problem is of growing importance in the global information environment; for example, profiling can help police identify characteristics of the perpetrator of a crime when there are specific suspects to consider. The approach applies machine learning to text categorization: a corpus of training documents is taken, each labeled according to its category for a particular profiling dimension. The study outlines the kinds of text features found most useful for authorship profiling. The two basic types of features are content-based features and style-based features, reflecting the fact that different populations might tend to write about different topics as well as express themselves differently about the same topic. The experimental setup covers four profiling problems: determining the author's gender, age, native language, and neuroticism level. The right combination of linguistic features and machine learning methods enables an automated system to effectively determine such aspects of an anonymous author.
AB - The authorship profiling problem is of growing importance in the global information environment; for example, profiling can help police identify characteristics of the perpetrator of a crime when there are specific suspects to consider. The approach applies machine learning to text categorization: a corpus of training documents is taken, each labeled according to its category for a particular profiling dimension. The study outlines the kinds of text features found most useful for authorship profiling. The two basic types of features are content-based features and style-based features, reflecting the fact that different populations might tend to write about different topics as well as express themselves differently about the same topic. The experimental setup covers four profiling problems: determining the author's gender, age, native language, and neuroticism level. The right combination of linguistic features and machine learning methods enables an automated system to effectively determine such aspects of an anonymous author.
UR - http://www.scopus.com/inward/record.url?scp=58849089737&partnerID=8YFLogxK
U2 - 10.1145/1461928.1461959
DO - 10.1145/1461928.1461959
M3 - Article
AN - SCOPUS:58849089737
SN - 0001-0782
VL - 52
SP - 119
EP - 123
JO - Communications of the ACM
JF - Communications of the ACM
IS - 2
ER -