A Hybrid Algorithm for Identifying and Categorizing Plagiarised Text Documents

V.U. Thompson, Christo Panchev

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

1 Citation (Scopus)

Abstract

Advances in internet technology have made information resources more readily available and plagiarism much easier to carry out. Detecting plagiarism is by no means a trivial task because of the sophisticated tactics by which plagiarists disguise their sources. In this paper we present a hybrid algorithm for identifying and categorizing plagiarised text documents. We built our algorithm by combining the strengths of three standard textual similarity measures used in information retrieval (IR). We used a back propagation neural network (BPNN) to combine the measures and the PAN@CLEF 2012 text alignment corpus for our experiments. We experimented with four categories of plagiarism, each representing a degree of textual similarity. We measured performance in terms of precision, recall and F-measure. Comparative analysis using the same corpus revealed that our hybrid algorithm (HA) outperformed each of the base similarity measures (BSM) in detecting three out of the four categories of plagiarism, and stood at a virtual tie in the remaining category: [highly similar: HA-96.6183%, BSM-96.5517%; lightly reviewed: HA-84.1321%, BSM-80.9636%; heavily reviewed: HA-68.1188%, BSM-67.1255%; highly dissimilar: HA-70.6280%, BSM-69.7%].
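
As an illustration of the evaluation measures named in the abstract, the following minimal sketch computes per-category precision, recall and F-measure for a four-way categorization. It is not the authors' evaluation code: the function name `per_category_scores` and the toy labels are assumptions made for illustration, and only the four category names come from the abstract.

```python
from collections import Counter

# Category names are taken from the abstract.
CATEGORIES = [
    "highly similar",
    "lightly reviewed",
    "heavily reviewed",
    "highly dissimilar",
]


def per_category_scores(true_labels, predicted_labels, categories=CATEGORIES):
    """Return {category: (precision, recall, f_measure)} for a multi-class task.

    Hypothetical helper, not from the paper.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            tp[truth] += 1   # correctly assigned to its category
        else:
            fp[guess] += 1   # wrongly assigned to the guessed category
            fn[truth] += 1   # missed for the true category

    scores = {}
    for c in categories:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        total = precision + recall
        # F-measure is the harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / total if total else 0.0
        scores[c] = (precision, recall, f_measure)
    return scores


# Invented toy labels, used only to exercise the function.
true_labels = [
    "highly similar", "highly similar", "lightly reviewed",
    "heavily reviewed", "highly dissimilar", "highly dissimilar",
]
predicted_labels = [
    "highly similar", "lightly reviewed", "lightly reviewed",
    "heavily reviewed", "highly dissimilar", "heavily reviewed",
]

scores = per_category_scores(true_labels, predicted_labels)
for category, (p, r, f) in scores.items():
    print(f"{category}: precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Scoring each category separately, as sketched here, is what makes a per-category comparison between a hybrid classifier and its base measures possible.
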
Original language: English
Title of host publication: Proceedings of the World Congress on Engineering 2015
Publisher: International Association of Engineers
Pages: 297-302
Volume: 1
ISBN (Print): 978-988-19253-4-3
Publication status: Published - 2015

Bibliographical note

This paper is available at http://www.iaeng.org/publication/WCE2015/WCE2015_pp297-302.pdf

Keywords

  • Information Retrieval
  • Plagiarism Detection
  • Similarity Measures
  • Artificial Neural Networks

  • Cite this

    Thompson, V. U., & Panchev, C. (2015). A Hybrid Algorithm for Identifying and Categorizing Plagiarised Text Documents. In Proceedings of the World Congress on Engineering 2015 (Vol. 1, pp. 297-302). International Association of Engineers.