A Hybrid Algorithm for Identifying and Categorizing Plagiarised Text Documents

V.U. Thompson, Christo Panchev

    Research output: Chapter in Book/Report/Conference proceeding › Chapter



    Advances in internet technology have made information resources more readily available and have also made plagiarism much easier to commit. Detecting plagiarism is by no means a trivial task because of the sophisticated tactics plagiarists use to disguise their sources. In this paper we present a hybrid algorithm for identifying and categorizing plagiarised text documents. We built our algorithm by combining the strengths of three standard textual similarity measures used in information retrieval (IR). We used a back-propagation neural network (BPNN) to combine the measures, and the PAN@CLEF 2012 text alignment corpus for our experiments. We experimented with four categories of plagiarism, each representing a degree of textual similarity, and measured performance in terms of precision, recall and F-measure. Comparative analysis on the same corpus revealed that our hybrid algorithm (HA) outperformed each of the base similarity measures (BSM) in detecting three of the four categories of plagiarism and was virtually tied in the remaining category: [highly similar: HA 96.6183%, BSM 96.5517%; lightly reviewed: HA 84.1321%, BSM 80.9636%; heavily reviewed: HA 68.1188%, BSM 67.1255%; highly dissimilar: HA 70.6280%, BSM 69.7%].
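    The abstract names neither the three similarity measures nor the network configuration, so what follows is only a minimal Python sketch of the general idea under stated assumptions. Cosine, Jaccard and Dice coefficients over lowercase whitespace tokens stand in for the three IR measures. A one-hidden-layer back-propagation network with sigmoid units and squared-error loss combines them. The label encoding 0 to 3 (highly similar, lightly reviewed, heavily reviewed, highly dissimilar), the hypothetical `TinyBPNN` class, its layer sizes, learning rate and epoch count, and the toy text pairs are all illustrative choices, not the authors' setup. Per-category precision, recall and F-measure follow the usual definitions, with F-measure as the harmonic mean of precision and recall.

```python
import math
import random
from collections import Counter

# Assumed encoding of the four plagiarism categories as labels 0..3.
CATEGORIES = ["highly similar", "lightly reviewed", "heavily reviewed", "highly dissimilar"]

def tokens(text):
    # Lowercase whitespace tokenisation (a simplifying assumption).
    return text.lower().split()

# Assumption: cosine, Jaccard and Dice stand in for the paper's three (unnamed) IR measures.
def cosine_sim(a, b):
    ta, tb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ta[w] * tb[w] for w in ta)
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0

def jaccard_sim(a, b):
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def dice_sim(a, b):
    sa, sb = set(tokens(a)), set(tokens(b))
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0

def features(a, b):
    # One score per base measure; this vector is the network's input.
    return [cosine_sim(a, b), jaccard_sim(a, b), dice_sim(a, b)]

class TinyBPNN:
    """Hypothetical 3-input, one-hidden-layer network trained by back-propagation."""

    def __init__(self, n_in=3, n_hidden=5, n_out=4, lr=0.5, seed=0):
        rng = random.Random(seed)
        # Each weight row has one extra entry for a constant bias input of 1.0.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        self.lr = lr

    @staticmethod
    def _sig(z):
        z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def _forward(self, x):
        xb = list(x) + [1.0]
        hb = [self._sig(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        out = [self._sig(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, out

    def train(self, X, y, epochs=2000):
        # Online gradient descent on squared error against one-hot targets.
        n_out, n_hid = len(self.w2), len(self.w1)
        for _ in range(epochs):
            for x, label in zip(X, y):
                xb, hb, out = self._forward(x)
                target = [1.0 if k == label else 0.0 for k in range(n_out)]
                d_out = [(out[k] - target[k]) * out[k] * (1 - out[k]) for k in range(n_out)]
                # Hidden deltas use the output weights before they are updated.
                d_hid = [hb[j] * (1 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(n_out))
                         for j in range(n_hid)]
                for k in range(n_out):
                    for j in range(n_hid + 1):
                        self.w2[k][j] -= self.lr * d_out[k] * hb[j]
                for j in range(n_hid):
                    for i in range(len(xb)):
                        self.w1[j][i] -= self.lr * d_hid[j] * xb[i]

    def predict(self, x):
        # Category = index of the most active output unit.
        out = self._forward(x)[2]
        return max(range(len(out)), key=out.__getitem__)

def prf(y_true, y_pred, positive):
    # Precision, recall and F-measure for a single category.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

if __name__ == "__main__":
    src = "the cat sat on the mat"
    # Made-up toy pairs, one per assumed category.
    pairs = [(src, "the cat sat on the mat", 0), (src, "the cat sat on a mat", 1),
             (src, "a cat was sitting on a rug", 2), (src, "stock prices fell sharply today", 3)]
    X = [features(a, b) for a, b, _ in pairs]
    y = [label for _, _, label in pairs]
    net = TinyBPNN(seed=42)
    net.train(X, y)
    preds = [net.predict(x) for x in X]
    for c, name in enumerate(CATEGORIES):
        print(name, prf(y, preds, c))
```

    Each document pair becomes a three-value feature vector, so the network can learn how much weight to give each measure for each category instead of relying on a single hand-tuned threshold per measure. A realistic setup would train on pairs drawn from the PAN@CLEF 2012 corpus and evaluate on a held-out split, not on the training pairs as this toy example does.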
    Original language: English
    Title of host publication: Proceedings of the World Congress on Engineering 2015
    Publisher: International Association of Engineers
    ISBN (Print): 978-988-19253-4-3
    Publication status: Published - 2015

    Bibliographical note

    This paper is available at http://www.iaeng.org/publication/WCE2015/WCE2015_pp297-302.pdf


    Keywords

    • Information Retrieval
    • Plagiarism Detection
    • Similarity Measures
    • Artificial Neural Networks

