Abstract
Advancement in internet technology has made
information resources more readily available and plagiarism
much easier to carry out. Detecting plagiarism is by no
means a trivial task because of the sophisticated tactics
plagiarists use to disguise their sources. In this paper we present
a hybrid algorithm for identifying and categorizing plagiarised
text documents. We built our algorithm by combining the
strengths of three standard textual similarity measures used in
information retrieval (IR). We used a back-propagation
neural network (BPNN) to combine the measures, and the
PAN@Clef 2012 text alignment corpus for our experiments.
We experimented with four categories of plagiarism,
each category representing a degree of textual similarity.
We measured performance in terms of precision, recall and F-measure.
Comparative analysis on the same corpus revealed
that our hybrid algorithm (HA) outperformed each of the base
similarity measures (BSM) in detecting three of the four
categories of plagiarism and was virtually tied in the fourth.
Results (HA vs. BSM): highly similar, 96.6183% vs. 96.5517%;
lightly reviewed, 84.1321% vs. 80.9636%; heavily
reviewed, 68.1188% vs. 67.1255%; highly dissimilar,
70.6280% vs. 69.7%.
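The abstract does not name the three similarity measures, the network topology or the output encoding, so the following is only a minimal sketch of the general idea of feeding several IR similarity scores into a back-propagation network, not the authors' implementation. It assumes cosine similarity on term frequencies, the Jaccard coefficient on word sets and the Dice coefficient on word bigrams as the base measures, a single hidden layer of four sigmoid units trained by stochastic gradient descent on squared error, and a binary plagiarised/not-plagiarised target instead of the paper's four categories. All names (`features`, `TinyBPNN`, `precision_recall_f`) and the toy text pairs are hypothetical. Precision is TP/(TP+FP), recall is TP/(TP+FN), and the F-measure is their harmonic mean.

```python
import math
import random
from collections import Counter


def tokens(text):
    # Assumption: lower-case whitespace tokenisation, no stemming or stop-word removal.
    return text.lower().split()


def cosine_sim(a, b):
    # Cosine similarity of the two term-frequency vectors.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def jaccard_sim(a, b):
    # Jaccard coefficient: |A ∩ B| / |A ∪ B| over word sets.
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice_sim(a, b):
    # Dice coefficient: 2|A ∩ B| / (|A| + |B|) over word-bigram sets.
    def bigrams(text):
        w = tokens(text)
        return set(zip(w, w[1:]))
    ba, bb = bigrams(a), bigrams(b)
    total = len(ba) + len(bb)
    return 2 * len(ba & bb) / total if total else 0.0


def features(a, b):
    # One input vector per (suspicious, source) pair: the three base similarity scores.
    return [cosine_sim(a, b), jaccard_sim(a, b), dice_sim(a, b)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPNN:
    """Hypothetical 3-input, one-hidden-layer network trained by back-propagation."""

    def __init__(self, n_in=3, n_hidden=4, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train(self, data, epochs=2000):
        for _ in range(epochs):
            for x, target in data:
                h, y = self.forward(x)
                # Output delta for squared error with a sigmoid output unit.
                d_out = (y - target) * y * (1 - y)
                # Hidden deltas, computed before the output weights change.
                d_hid = [d_out * w * hj * (1 - hj) for w, hj in zip(self.w2, h)]
                for j, hj in enumerate(h):
                    self.w2[j] -= self.lr * d_out * hj
                self.b2 -= self.lr * d_out
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= self.lr * dj * xi
                    self.b1[j] -= self.lr * dj

    def predict(self, x, threshold=0.5):
        return int(self.forward(x)[1] >= threshold)


def precision_recall_f(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    src = "the quick brown fox jumps over the lazy dog"
    # Hypothetical toy pairs: (suspicious text, source text, label).
    pairs = [(src, src, 1), ("a quick brown fox jumped over a lazy dog", src, 1),
             ("stock prices fell sharply on monday", src, 0),
             ("rain is expected later this week", src, 0)]
    data = [(features(a, b), label) for a, b, label in pairs]
    net = TinyBPNN()
    net.train(data)
    print(precision_recall_f([t for _, t in data], [net.predict(x) for x, _ in data]))
```

The network sees only the three similarity scores, so it learns a weighting of the measures rather than anything about the raw text; evaluating on the training pairs, as the toy demo does, is for illustration only and says nothing about performance on a real corpus.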
Original language | English |
---|---|
Title of host publication | Proceedings of the World Congress on Engineering 2015 |
Publisher | International Association of Engineers |
Pages | 297-302 |
Volume | 1 |
ISBN (Print) | 978-988-19253-4-3 |
Publication status | Published - 2015 |
Bibliographical note
This paper is available at http://www.iaeng.org/publication/WCE2015/WCE2015_pp297-302.pdf
Keywords
- Information Retrieval
- Plagiarism Detection
- Similarity Measures
- Artificial Neural Networks