
Distributional Semantic Model Based on Convolutional Neural Network for Arabic Textual Similarity

Author

Listed:
  • Adnen Mahmoud

    (Higher Institute of Computer Science and Communication Techniques, Monastir, Tunisia)

  • Mounir Zrigui

    (Faculty of Science Monastir, Monastir, Tunisia)

Abstract

The problem addressed is to develop a model that can reliably identify whether a previously unseen pair of documents is paraphrased or not. Paraphrase detection in Arabic documents is challenging because of the variability of features and the lack of publicly available corpora. To address these problems, the authors propose a semantic approach. At the feature extraction level, the authors use a global vectors representation, which combines global co-occurrence counts with a contextual skip-gram model. At the paraphrase identification level, the authors apply a convolutional neural network model to learn more contextual and semantic information between documents. For the experiments, the authors use the Open Source Arabic Corpora as the source corpus. They then collect different datasets to create a vocabulary model. To construct the paraphrased corpus, the authors replace each word in the source corpus with its most similar word of the same grammatical class, using the word2vec algorithm and part-of-speech annotation. Experiments show that the model achieves promising results in terms of precision and recall compared to existing approaches in the literature.
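
The corpus construction step described in the abstract (replace each word with its most similar word of the same grammatical class) can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding vectors, the part-of-speech tags, the English toy vocabulary, and the function names `cosine`, `most_similar_same_pos`, and `paraphrase` are all hypothetical stand-ins for a trained word2vec model and a real Arabic part-of-speech tagger.

```python
import math

# Hypothetical toy embeddings; a real pipeline would load vectors
# from a trained word2vec model instead.
EMBEDDINGS = {
    "big": [0.90, 0.10, 0.00],
    "large": [0.85, 0.15, 0.05],
    "house": [0.10, 0.90, 0.20],
    "home": [0.12, 0.88, 0.25],
    "run": [0.00, 0.20, 0.95],
    "quickly": [0.30, 0.10, 0.80],
}

# Hypothetical part-of-speech tags; a real pipeline would obtain
# these from a part-of-speech tagger.
POS = {
    "big": "ADJ",
    "large": "ADJ",
    "house": "NOUN",
    "home": "NOUN",
    "run": "VERB",
    "quickly": "ADV",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_same_pos(word):
    """Return the nearest vocabulary word that shares the word's POS tag.

    Words outside the vocabulary, or with no same-class candidate,
    are returned unchanged.
    """
    if word not in EMBEDDINGS:
        return word
    candidates = [
        other
        for other in EMBEDDINGS
        if other != word and POS[other] == POS[word]
    ]
    if not candidates:
        return word
    return max(candidates, key=lambda other: cosine(EMBEDDINGS[word], EMBEDDINGS[other]))


def paraphrase(tokens):
    """Build a paraphrased token sequence by word-level substitution."""
    return [most_similar_same_pos(token) for token in tokens]


print(paraphrase(["big", "house"]))
```

Restricting candidates to the same part-of-speech tag is what keeps the substituted sentence grammatical: in the toy data above, "run" is never replaced by "quickly" even though their vectors are close, because they belong to different grammatical classes.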

Suggested Citation

  • Adnen Mahmoud & Mounir Zrigui, 2020. "Distributional Semantic Model Based on Convolutional Neural Network for Arabic Textual Similarity," International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), IGI Global, vol. 14(1), pages 35-50, January.
  • Handle: RePEc:igg:jcini0:v:14:y:2020:i:1:p:35-50

    Download full text from publisher

    File URL: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/IJCINI.2020010103
    Download Restriction: no

    Citations



    Cited by:

    1. Boan Ji & Huabin Wang & Mengxin Zhang & Borun Mao & Xuejun Li, 2022. "An Efficient Lightweight Network Based on Magnetic Resonance Images for Predicting Alzheimer's Disease," International Journal on Semantic Web and Information Systems (IJSWIS), IGI Global, vol. 18(1), pages 1-18, January.
    2. Ons Meddeb & Mohsen Maraoui & Mounir Zrigui, 2021. "Personalized Smart Learning Recommendation System for Arabic Users in Smart Campus," International Journal of Web-Based Learning and Teaching Technologies (IJWLTT), IGI Global, vol. 16(6), pages 1-21, November.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.