Intrinsic plagiarism detection using latent semantic indexing and stylometry

Muna Alsallal, Rahat Iqbal, Saad Amin, Anne James

    Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding > peer-review

    17 Citations (Scopus)

    Abstract

    Plagiarism has grown steadily over the last few years, driven by the rapid proliferation of information through the World Wide Web (WWW). In this paper, we present an integrated approach to intrinsic plagiarism detection based on Latent Semantic Indexing (LSI) and stylometry. LSI is applied to the term-document matrix of the dataset, whereas stylometry is used to approximate an author's intrinsic writing style. We conducted a series of experiments on a small corpus to investigate the effect of the dimensionality reduction (DR) parameter, which is the core of the LSI technique. We then carried out a comparative evaluation of our approach, applying LSI and stylometry separately and then together. Our results show that performance improved when LSI and stylometry were applied together.
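
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the three stylometric features, the z-score scaling, and the distance-from-centroid score are all assumptions chosen for illustration, and every function name is hypothetical. The parameter `k` plays the role of the dimensionality reduction (DR) parameter.

```python
import re
import numpy as np

def tokenize(text):
    # Assumed tokenizer: lowercase alphabetic words only.
    return re.findall(r"[a-z']+", text.lower())

def term_document_matrix(segments):
    """Raw-count term-document matrix with shape (terms, segments)."""
    vocab = sorted({w for s in segments for w in tokenize(s)})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(segments)))
    for j, s in enumerate(segments):
        for w in tokenize(s):
            A[index[w], j] += 1
    return A

def lsi_embed(A, k):
    """Truncated SVD: keep the k largest singular values (the DR parameter)."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(S))
    return (S[:k, None] * Vt[:k]).T  # one k-dimensional row per segment

def style_features(segment):
    """Assumed toy features: mean word length, type-token ratio,
    mean sentence length in words."""
    words = tokenize(segment)
    sentences = [x for x in re.split(r"[.!?]+", segment) if x.strip()]
    n = max(len(words), 1)
    return (
        sum(len(w) for w in words) / n,
        len(set(words)) / n,
        len(words) / max(len(sentences), 1),
    )

def zscore(X):
    # Standardize columns; constant columns are left at zero.
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)

def outlier_scores(segments, k=2):
    """Distance of each segment from the document centroid in the
    combined LSI + stylometry feature space (higher = more unusual)."""
    lsi = lsi_embed(term_document_matrix(segments), k)
    sty = np.array([style_features(s) for s in segments])
    feats = np.hstack([zscore(lsi), zscore(sty)])
    return np.linalg.norm(feats - feats.mean(axis=0), axis=1)
```

Calling `outlier_scores(segments, k=2)` on a list of text segments from a single document returns one score per segment. Segments with unusually high scores differ most from the rest of the document and would be the candidates for closer inspection. Varying `k` shows how the DR parameter changes the LSI part of the score.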

    Original language: English
    Title of host publication: Proceedings - 2013 6th International Conference on Developments in eSystems Engineering, DeSE 2013
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 145-150
    Number of pages: 6
    ISBN (Electronic): 9781479952649
    Publication status: Published - 11 Feb 2013
    Event: 2013 6th International Conference on Developments in eSystems Engineering - Abu Dhabi, United Arab Emirates
    Duration: 16 Dec 2013 to 18 Dec 2013

    Conference

    Conference: 2013 6th International Conference on Developments in eSystems Engineering
    Abbreviated title: DeSE 2013
    Country/Territory: United Arab Emirates
    City: Abu Dhabi
    Period: 16/12/13 to 18/12/13

    Keywords

    • Extrinsic plagiarism
    • Intrinsic plagiarism
    • Latent semantic indexing (LSI)
    • Plagiarism
    • Stylometry technique
    • Text misuse

    ASJC Scopus subject areas

    • Control and Systems Engineering
    • Computer Networks and Communications
    • Computer Science Applications
    • Software
