
Multi-Ideology ISIS/Jihadist White Supremacist (MIWS) Dataset for Multi-Class Extremism Text Classification

Author

Listed:
  • Mayur Gaikwad

    (Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune MH 412115, India)

  • Swati Ahirrao

    (Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune MH 412115, India)

  • Shraddha Phansalkar

    (MIT Art, Design and Technology University, Pune MH 412201, India)

  • Ketan Kotecha

    (Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune MH 412115, India)

Abstract

Social media platforms are a popular channel for extremist organizations to disseminate their perceptions, beliefs, and ideologies. This information is generally based on selective reporting and is subjective in content. The radical presentation of this disinformation and its reach on social media increase the number of susceptible audiences. Hence, detecting extremist text on social media platforms is a significant area of research. Online extremism research faces three challenges: extremism text datasets are unavailable, little emphasis is placed on classifying extremism text into propaganda, radicalization, and recruitment classes, and the lack of data validation methods limits the accuracy of extremism detection. This research addresses these challenges by presenting a multi-ideology, multi-class seed dataset of extremism text: the multi-ideology ISIS/Jihadist White Supremacist (MIWS) dataset, constructed from recent tweets collected from Twitter. The dataset can be used to classify extremist text into common types such as propaganda, radicalization, and recruitment. Additionally, the seed dataset is statistically validated with the coherence score of Latent Dirichlet Allocation (LDA) and with word mover’s distance computed using pretrained Google News vectors. Good coherence scores within topics and appropriate distances between topics indicate that the dataset is well constructed. This dataset is the first publicly accessible multi-ideology, multi-class extremism text dataset to reinforce research on extremism text detection on social media platforms.
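To illustrate what a within-topic coherence score measures, here is a minimal sketch in plain Python. The abstract does not say which coherence measure the authors used, so the UMass-style formula below is an assumption, and the function name `umass_coherence`, the smoothing parameter `eps`, and the toy corpus are all hypothetical. It scores a topic's top words by how often they co-occur in the same document.

```python
import math


def umass_coherence(top_words, documents, eps=1.0):
    """Mean UMass-style coherence of a topic's top words.

    top_words: words ordered from most to least probable in the topic.
    documents: tokenized documents (lists of words).
    eps: smoothing constant that keeps the logarithm finite.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score, pairs = 0.0, 0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # word absent from the corpus; skip the pair
            co = doc_freq(top_words[m], top_words[l])
            score += math.log((co + eps) / d_l)
            pairs += 1
    return score / pairs if pairs else float("nan")


# Hypothetical toy corpus, not data from the MIWS dataset.
docs = [
    ["join", "group", "today"],
    ["join", "group", "now"],
    ["news", "report", "event"],
    ["news", "report", "today"],
]
coherent_topic = ["join", "group"]  # words that always co-occur
mixed_topic = ["join", "news"]      # words that never co-occur

print(umass_coherence(coherent_topic, docs))
print(umass_coherence(mixed_topic, docs))
```

A topic whose top words appear together in the same documents scores higher than one whose words never co-occur, which is the sense in which a "good" coherence score suggests a well-formed class.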

Suggested Citation

  • Mayur Gaikwad & Swati Ahirrao & Shraddha Phansalkar & Ketan Kotecha, 2021. "Multi-Ideology ISIS/Jihadist White Supremacist (MIWS) Dataset for Multi-Class Extremism Text Classification," Data, MDPI, vol. 6(11), pages 1-15, November.
  • Handle: RePEc:gam:jdataj:v:6:y:2021:i:11:p:117-:d:679373

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/6/11/117/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/6/11/117/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Terhorst, Andrew & Garrard, Robert, 2022. "How unified is the Australian agricultural sector when talking to policy makers about digitalization?," SocArXiv 4nge5, Center for Open Science.
    2. Rui Sun & Dayi He & Jingjing Yan & Li Tao, 2021. "Mechanism Analysis of Applying Blockchain Technology to Forestry Carbon Sink Projects Based on the Differential Game Model," Sustainability, MDPI, vol. 13(21), pages 1-18, October.
    3. Claudiu Coman & Felicia Andrioni & Roxana-Catalina Ghita & Maria Cristina Bularca, 2021. "Social and Emotional Intelligence as Factors in Terrorist Propaganda: An Analysis of the Way Mass Media Portrays the Behavior of Islamic Terrorist Groups," Sustainability, MDPI, vol. 13(21), pages 1-20, November.
