Abstract
Plagiarism is the use of the language and thoughts of another work and the representation of them as one's own original work. Various levels of plagiarism exist in many domains in general and in academic papers in particular. Therefore, diverse efforts have been made to identify plagiarism automatically. In this research, we developed software capable of simple plagiarism detection. We built a corpus (C) containing 10,100 academic papers in computer science written in English, and two test sets including papers that were randomly chosen from C. A wide variety of baseline methods were developed to identify identical or similar papers. Several of these methods are novel. The experimental results and their analysis show interesting findings. Some of the novel methods are among the best predictive methods.
Original language | English |
---|---|
Pages | 421-429 |
Number of pages | 9 |
State | Published - 2010 |
Externally published | Yes |
Event | 23rd International Conference on Computational Linguistics, Coling 2010 - Beijing, China; Duration: 23 Aug 2010 → 27 Aug 2010 |
Conference
Conference | 23rd International Conference on Computational Linguistics, Coling 2010 |
---|---|
Country/Territory | China |
City | Beijing |
Period | 23/08/10 → 27/08/10 |