Automatically Categorizing Written Texts by Author Gender

Moshe Koppel, Shlomo Argamon, Anat Rachel Shimoni

Research output: Contribution to journal › Article › peer-review

476 Scopus citations

Abstract

The problem of automatically determining the gender of a document’s author would appear to be a more subtle problem than those of categorization by topic or authorship attribution. Nevertheless, it is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with approximately 80 per cent accuracy. The same techniques can be used to determine if a document is fiction or non-fiction with approximately 98 per cent accuracy.
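The abstract does not name the learning algorithm or list the features. As a rough illustration of the general approach it describes (relative frequencies of simple lexical features fed to a linear classifier), here is a minimal sketch in Python. The function-word list, the perceptron learner, the toy texts and every function name are hypothetical choices made for illustration. They are not the authors' feature set or method. Syntactic features such as part-of-speech n-grams are left out because they require a tagger.

```python
import re
from collections import Counter

# Hypothetical, tiny function-word list; the paper's actual feature set is not reproduced here.
FUNCTION_WORDS = ["the", "a", "of", "and", "i", "you", "it", "not", "with", "for"]


def features(text):
    """Relative frequency of each function word among all word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def train_perceptron(docs, labels, epochs=50, lr=1.0):
    """Train a binary linear classifier (classic perceptron); labels must be +1 or -1."""
    w = [0.0] * len(FUNCTION_WORDS)
    b = 0.0
    for _ in range(epochs):
        for text, y in zip(docs, labels):
            x = features(text)
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary: move weights toward y
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(model, text):
    """Return +1 or -1 depending on which side of the learned hyperplane the text falls."""
    w, b = model
    x = features(text)
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


# Toy usage with invented texts and abstract class labels (+1 / -1), not real data.
docs = [
    "I think you and I should not go with it",
    "the history of the region and the rise of a state",
]
labels = [1, -1]
model = train_perceptron(docs, labels)
print(predict(model, "you and I will not"))    # +1 side
print(predict(model, "the state of the art"))  # -1 side
```

The same two-class setup applies whether the labels stand for author gender or for fiction versus non-fiction; only the training labels change. A realistic system would use a much larger feature set and labelled corpus.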

Original language: English
Pages (from-to): 401-412
Number of pages: 12
Journal: Literary and Linguistic Computing
Volume: 17
Issue number: 4
State: Published - Nov 2002
