
Latent Dirichlet Allocation and POS Tags Based Method for External Plagiarism Detection: LDA and POS Tags Based Plagiarism Detection

Author

Listed:
  • Ali Daud

    (King Abdulaziz University, Jeddah, Saudi Arabia & International Islamic University, Islamabad, Pakistan)

  • Jamal Ahmad Khan

    (International Islamic University, Islamabad, Pakistan)

  • Jamal Abdul Nasir

    (International Islamic University, Islamabad, Pakistan)

  • Rabeeh Ayaz Abbasi

    (King Abdulaziz University, Jeddah, Saudi Arabia & Quaid-i-Azam University, Islamabad, Pakistan)

  • Naif Radi Aljohani

    (Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia)

  • Jalal S. Alowibdi

    (Faculty of Computing and Information Technology, University of Jeddah, Jeddah, Saudi Arabia)

Abstract

In this article, we present a new method for external plagiarism detection that combines semantic and syntactic information. The proposed approach uses latent Dirichlet allocation (LDA) together with part-of-speech (POS) tags to detect plagiarism between the sample document and a number of source documents. The basic hypothesis is that considering both semantic and syntactic information about two text documents may improve the performance of the plagiarism detection task. Our method has two steps: a pre-processing step, in which we detect the topics of the sentences in the documents using LDA and convert each sentence into an array of POS tags, and a post-processing step, in which the suspicious cases are verified purely on the basis of semantic rules. For two types of external plagiarism (copy and random obfuscation), we empirically compare our approach to the state-of-the-art N-gram based and stop-word N-gram based methods and observe significant improvements.
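To make the comparison baselines named in the abstract more concrete, the following is a minimal sketch of word N-gram and stop-word N-gram overlap between a suspicious text and a source text. It is not the authors' implementation or evaluation setup: the `containment` score, the choice of `n=3`, the `STOP_WORDS` list, and the example sentences are all assumptions made for illustration.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the one used in the article.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and",
              "is", "was", "by", "for", "with", "that", "it", "as"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return the set of contiguous n-token tuples in `tokens`."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspicious, source, n=3, stop_words_only=False):
    """Fraction of the suspicious text's n-grams that also occur in the source.

    With stop_words_only=True, all tokens except stop words are discarded
    before the n-grams are built (the stop-word N-gram variant).
    """
    sus, src = tokenize(suspicious), tokenize(source)
    if stop_words_only:
        sus = [t for t in sus if t in STOP_WORDS]
        src = [t for t in src if t in STOP_WORDS]
    sus_grams = ngrams(sus, n)
    if not sus_grams:
        return 0.0  # text too short to form any n-gram
    return len(sus_grams & ngrams(src, n)) / len(sus_grams)


# Hypothetical example texts: a verbatim copy and a word-substituted variant.
source = "The cat sat on the mat in the corner of the room."
copied = "The cat sat on the mat in the corner of the room."
obfuscated = "The dog slept on the rug in the middle of the hall."

print(containment(copied, source))
print(containment(obfuscated, source))
print(containment(obfuscated, source, stop_words_only=True))
```

In this toy example the verbatim copy scores 1.0 on word trigrams, while the word-substituted variant scores 0.0 on word trigrams but 1.0 on stop-word trigrams, because its sequence of function words is unchanged. This is only meant to show why the two baselines behave differently under obfuscation; it says nothing about the results reported in the article.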

Suggested Citation

  • Ali Daud & Jamal Ahmad Khan & Jamal Abdul Nasir & Rabeeh Ayaz Abbasi & Naif Radi Aljohani & Jalal S. Alowibdi, 2018. "Latent Dirichlet Allocation and POS Tags Based Method for External Plagiarism Detection: LDA and POS Tags Based Plagiarism Detection," International Journal on Semantic Web and Information Systems (IJSWIS), IGI Global, vol. 14(3), pages 53-69, July.
  • Handle: RePEc:igg:jswis0:v:14:y:2018:i:3:p:53-69

    Download full text from publisher

    File URL: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/IJSWIS.2018070103
    Download Restriction: no

    Citations



    Cited by:

    1. Hong, Ming & Wang, Heyong, 2021. "Research on customer opinion summarization using topic mining and deep neural network," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 185(C), pages 88-114.

