A Hybrid Algorithm for Identifying and Categorizing Plagiarised Text Documents

V.U. Thompson, Christo Panchev

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Advances in internet technology have made information resources more readily available and plagiarism much easier to carry out. Detecting plagiarism is by no means a trivial task because of the sophisticated tactics plagiarists use to disguise their sources. In this paper we present a hybrid algorithm for identifying and categorizing plagiarised text documents. We built the algorithm by combining the strengths of three standard textual similarity measures used in information retrieval (IR). We used a back-propagation neural network (BPNN) to combine the measures and the PAN@Clef 2012 text alignment corpus for the experiments. We experimented with four categories of plagiarism, each representing a degree of textual similarity, and measured performance in terms of precision, recall and F-measure. Comparative analysis on the same corpus showed that our hybrid algorithm (HA) outperformed each of the base similarity measures (BSM) in detecting three of the four categories of plagiarism and was virtually tied in the fourth: [highly similar: HA 96.6183%, BSM 96.5517%; lightly reviewed: HA 84.1321%, BSM 80.9636%; heavily reviewed: HA 68.1188%, BSM 67.1255%; highly dissimilar: HA 70.6280%, BSM 69.7%].
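
The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not name the three base similarity measures, so cosine, Jaccard and Dice are used here purely as assumptions, and every function name is hypothetical. The sketch shows how a document pair might be turned into a three-value feature vector (the kind of input a BPNN could combine) and how precision, recall and F-measure can be computed for one plagiarism category. The neural network itself is not shown.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity over raw term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Jaccard coefficient over the sets of distinct tokens."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient over the sets of distinct tokens."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def similarity_features(suspicious, source):
    """Feature vector for one document pair.

    Assumption: these three measures stand in for the paper's
    unnamed base similarity measures.
    """
    a, b = tokens(suspicious), tokens(source)
    return [cosine(a, b), jaccard(a, b), dice(a, b)]


def precision_recall_f(gold, predicted, category):
    """Precision, recall and F-measure for a single category label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == category and p == category)
    fp = sum(1 for g, p in pairs if g != category and p == category)
    fn = sum(1 for g, p in pairs if g == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure
```

In a setup like the one the abstract describes, a vector from `similarity_features` would be the classifier input for each document pair, and `precision_recall_f` would be called once per category (for example "highly similar") on the gold and predicted labels.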
Original language: English
Title of host publication: Proceedings of the World Congress on Engineering 2015
Publisher: International Association of Engineers
Pages: 297-302
Volume: 1
ISBN (Print): 978-988-19253-4-3
Publication status: Published - 2015

Bibliographical note

This paper is available at http://www.iaeng.org/publication/WCE2015/WCE2015_pp297-302.pdf

Keywords

  • Information Retrieval
  • Plagiarism Detection
  • Similarity Measures
  • Artificial Neural Networks

Cite this

Thompson, V. U., & Panchev, C. (2015). A Hybrid Algorithm for Identifying and Categorizing Plagiarised Text Documents. In Proceedings of the World Congress on Engineering 2015 (Vol. 1, pp. 297-302). International Association of Engineers.

@inbook{9e57b7c223a545cba9922b357b82dfe8,
title = "A Hybrid Algorithm for Identifying and Categorizing Plagiarised Text Documents",
abstract = "Advances in internet technology have made information resources more readily available and plagiarism much easier to carry out. Detecting plagiarism is by no means a trivial task because of the sophisticated tactics plagiarists use to disguise their sources. In this paper we present a hybrid algorithm for identifying and categorizing plagiarised text documents. We built the algorithm by combining the strengths of three standard textual similarity measures used in information retrieval (IR). We used a back-propagation neural network (BPNN) to combine the measures and the PAN@Clef 2012 text alignment corpus for the experiments. We experimented with four categories of plagiarism, each representing a degree of textual similarity, and measured performance in terms of precision, recall and F-measure. Comparative analysis on the same corpus showed that our hybrid algorithm (HA) outperformed each of the base similarity measures (BSM) in detecting three of the four categories of plagiarism and was virtually tied in the fourth: [highly similar: HA 96.6183{\%}, BSM 96.5517{\%}; lightly reviewed: HA 84.1321{\%}, BSM 80.9636{\%}; heavily reviewed: HA 68.1188{\%}, BSM 67.1255{\%}; highly dissimilar: HA 70.6280{\%}, BSM 69.7{\%}].",
keywords = "Information Retrieval, Plagiarism Detection, Similarity Measures, Artificial Neural Networks",
author = "V.U. Thompson and Christo Panchev",
note = "This paper is available at http://www.iaeng.org/publication/WCE2015/WCE2015_pp297-302.pdf",
year = "2015",
language = "English",
isbn = "978-988-19253-4-3",
volume = "1",
pages = "297--302",
booktitle = "Proceedings of the World Congress on Engineering 2015",
publisher = "International Association of Engineers",
address = "China",

}
