Detection of simple plagiarism in computer science papers

Yaakov Hacohen-Kerner, Natan Ben-Dror, Aharon Tayeb

Research output: Contribution to conference › Paper › peer-review

14 Scopus citations

Abstract

Plagiarism is the use of the language and thoughts of another work and the representation of them as one's own original work. Various levels of plagiarism exist in many domains in general and in academic papers in particular. Therefore, diverse efforts have been made to identify plagiarism automatically. In this research, we developed software capable of simple plagiarism detection. We built a corpus (C) containing 10,100 academic papers in computer science written in English, and two test sets of papers randomly chosen from C. A wide variety of baseline methods was developed to identify identical or similar papers; several of these methods are novel. The experimental results and their analysis show interesting findings. Some of the novel methods are among the best predictive methods.

Original language: English
Pages: 421-429
Number of pages: 9
State: Published - 2010
Externally published: Yes
Event: 23rd International Conference on Computational Linguistics, Coling 2010 - Beijing, China
Duration: 23 Aug 2010 to 27 Aug 2010

Conference

Conference: 23rd International Conference on Computational Linguistics, Coling 2010
Country/Territory: China
City: Beijing
Period: 23/08/10 to 27/08/10
