Abstract
The identification of pseudepigraphic texts (texts not written by the authors to whom they are attributed) has important historical, forensic and commercial applications. We introduce an unsupervised technique for identifying pseudepigrapha. The idea is to identify textual outliers in a corpus based on the pairwise similarities of all documents in the corpus. The crucial point is that document similarity should be measured not in any of the standard ways but on the basis of the output of a recently introduced algorithm for authorship verification. The proposed method strongly outperforms existing techniques in systematic experiments on a blog corpus.
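The outlier-detection idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not name the verification algorithm, the features or the outlier criterion, so everything below is an assumption. The hypothetical `impostor_score` stands in for the verification step with a simplified impostors-style test: over random feature subsets, it counts how often document y is more similar to document x than every text in an external impostor pool. It uses bag-of-words cosine similarity, 50 trials and a 50% feature-sampling rate, all chosen purely for illustration. The hypothetical `rank_outliers` symmetrizes these scores into a pairwise similarity matrix and ranks each document by its mean similarity to the rest of the corpus, lowest (most outlying) first.

```python
import math
import random
from collections import Counter


def bag_of_words(text):
    # Word-frequency vector (assumed feature set, for illustration only).
    return Counter(text.lower().split())


def cosine(a, b, feats):
    # Cosine similarity restricted to the sampled feature subset `feats`.
    dot = sum(a[f] * b[f] for f in feats)
    na = math.sqrt(sum(a[f] ** 2 for f in feats))
    nb = math.sqrt(sum(b[f] ** 2 for f in feats))
    return dot / (na * nb) if na and nb else 0.0


def impostor_score(x, y, impostors, vocab, rng, trials=50, frac=0.5):
    # Hypothetical verification score: fraction of trials in which y is
    # strictly more similar to x than every impostor, on a random feature subset.
    k = max(1, int(len(vocab) * frac))
    wins = 0
    for _ in range(trials):
        feats = rng.sample(vocab, k)
        s = cosine(x, y, feats)
        if all(s > cosine(x, imp, feats) for imp in impostors):
            wins += 1
    return wins / trials


def rank_outliers(docs, impostor_docs, seed=0):
    # Returns (index, mean similarity to the other documents), lowest first.
    vecs = [bag_of_words(d) for d in docs]
    imps = [bag_of_words(d) for d in impostor_docs]
    vocab = sorted(set().union(*vecs, *imps))
    rng = random.Random(seed)
    n = len(vecs)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Symmetrize, since the verification score is not symmetric.
            s = (impostor_score(vecs[i], vecs[j], imps, vocab, rng)
                 + impostor_score(vecs[j], vecs[i], imps, vocab, rng)) / 2
            sim[i][j] = sim[j][i] = s
    # The diagonal is 0.0, so each row sum covers only the other documents.
    means = [(i, sum(sim[i]) / (n - 1)) for i in range(n)]
    return sorted(means, key=lambda p: p[1])


docs = [
    "the cat sat on the mat and the dog sat too",
    "the cat and the dog sat on the old mat",
    "on the mat the cat sat while the dog slept",
    "the dog sat on the mat and the cat slept",
    "quantum flux capacitors require plutonium cores",
]
impostors = [
    "stock markets rallied after strong earnings reports",
    "rain is expected across northern regions tomorrow",
]
for idx, score in rank_outliers(docs, impostors):
    print(idx, round(score, 3))  # the first line printed is "4 0.0"
```

In this toy corpus the four cat-and-dog texts share vocabulary, while the fifth shares none with them, so it never beats the impostors and is ranked first with a mean similarity of 0. Ranking by mean similarity is only one simple outlier criterion. A real system would need richer stylistic features and a realistic impostor pool.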
Field | Value |
---|---|
Original language | English |
Title of host publication | EMNLP 2013 - 2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1449-1454 |
Number of pages | 6 |
ISBN (Electronic) | 9781937284978 |
State | Published - 2013 |
Event | 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, Seattle, United States. Duration: 18 Oct 2013 → 21 Oct 2013 |
Publication series
Field | Value |
---|---|
Name | EMNLP 2013 - 2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference |
Conference
Field | Value |
---|---|
Conference | 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013 |
Country/Territory | United States |
City | Seattle |
Period | 18/10/13 → 21/10/13 |
Bibliographical note
Publisher copyright: © 2013 Association for Computational Linguistics.