Automatically identifying pseudepigraphic texts

Moshe Koppel, Shachar Seidman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

23 Scopus citations

Abstract

The identification of pseudepigraphic texts - texts not written by the authors to which they are attributed - has important historical, forensic and commercial applications. We introduce an unsupervised technique for identifying pseudepigrapha. The idea is to identify textual outliers in a corpus based on the pairwise similarities of all documents in the corpus. The crucial point is that document similarity not be measured in any of the standard ways but rather be based on the output of a recently introduced algorithm for authorship verification. The proposed method strongly outperforms existing techniques in systematic experiments on a blog corpus.
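The outlier idea in the abstract can be illustrated with a minimal Python sketch: score each document by its mean pairwise similarity to every other document in the corpus and flag the lowest-scoring one. The abstract does not specify the similarity measure beyond saying that it is derived from an authorship verification algorithm and is not one of the standard measures, so the character-trigram cosine used here is only a stand-in assumption. It is exactly the kind of standard measure the authors replace. The function names (`outlier_scores`, `most_outlying`) and the mean-similarity aggregation are likewise hypothetical choices made for illustration, not the authors' procedure.

```python
from collections import Counter
from math import sqrt
from typing import Callable, List, Sequence


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: str, b: str) -> float:
    """Stand-in similarity (an assumption): cosine over character trigram counts.

    The paper instead derives similarity from an authorship verification
    algorithm; any function with this signature can be plugged in below.
    """
    ca, cb = char_ngrams(a), char_ngrams(b)
    dot = sum(ca[g] * cb[g] for g in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def outlier_scores(
    docs: Sequence[str],
    similarity: Callable[[str, str], float] = cosine_similarity,
) -> List[float]:
    """Return each document's mean similarity to all other documents.

    Lower scores mean the document looks less like the rest of the corpus.
    """
    n = len(docs)
    if n < 2:
        raise ValueError("need at least two documents to compare")
    scores = []
    for i in range(n):
        total = sum(similarity(docs[i], docs[j]) for j in range(n) if j != i)
        scores.append(total / (n - 1))
    return scores


def most_outlying(
    docs: Sequence[str],
    similarity: Callable[[str, str], float] = cosine_similarity,
) -> int:
    """Return the index of the document with the lowest mean similarity."""
    scores = outlier_scores(docs, similarity)
    return min(range(len(docs)), key=scores.__getitem__)


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "the cat sat on the rug",
        "the dog sat on the mat",
        "zzzz qqqq xxxx zzzz",
    ]
    print(most_outlying(corpus))
```

Because the similarity function is passed in as a parameter, a verification-based score could replace `cosine_similarity` without changing the outlier-ranking step.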

Original language: English
Title of host publication: EMNLP 2013 - 2013 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1449-1454
Number of pages: 6
ISBN (Electronic): 9781937284978
State: Published - 2013
Event: 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013 - Seattle, United States
Duration: 18 Oct 2013 to 21 Oct 2013

Bibliographical note

Publisher Copyright:
© 2013 Association for Computational Linguistics.
