TY - JOUR
T1 - Automatically Categorizing Written Texts by Author Gender
AU - Koppel, Moshe
AU - Argamon, Shlomo
AU - Shimoni, Anat Rachel
PY - 2002
Y1 - 2002/11
AB - The problem of automatically determining the gender of a document’s author would appear to be a more subtle problem than those of categorization by topic or authorship attribution. Nevertheless, it is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with approximately 80 per cent accuracy. The same techniques can be used to determine if a document is fiction or non-fiction with approximately 98 per cent accuracy.
UR - http://www.scopus.com/inward/record.url?scp=84985033441&partnerID=8YFLogxK
U2 - 10.1093/llc/17.4.401
DO - 10.1093/llc/17.4.401
M3 - Article
AN - SCOPUS:84985033441
SN - 0268-1145
VL - 17
SP - 401
EP - 412
JO - Literary and Linguistic Computing
JF - Literary and Linguistic Computing
IS - 4
ER -