Instagram-Based Benchmark Dataset for Cyberbullying Detection in Arabic Text

Author

Listed:
  • Reem ALBayari

    (Higher College of Technology, Abu Dhabi P.O. Box 25026, United Arab Emirates
    Faculty of Engineering and IT, The British University in Dubai, Dubai P.O. Box 345015, United Arab Emirates)

  • Sherief Abdallah

    (Faculty of Engineering and IT, The British University in Dubai, Dubai P.O. Box 345015, United Arab Emirates)

Abstract

(1) Background: the ability to use social media to communicate without revealing one's real identity has created an attractive setting for cyberbullying. Several studies have collected datasets from social media with the aim of automatically detecting offensive language. However, the majority of these datasets are in English, not Arabic. Even among the few Arabic datasets that have been collected, none focuses on Instagram, despite its being a major social media platform in the Arab world. (2) Methods: we use the official Instagram APIs to collect our dataset. To establish the dataset as a benchmark, we use SPSS (Kappa statistic) to evaluate the inter-annotator agreement (IAA), and we examine and evaluate the performance of various learning models (LR, SVM, RFC, and MNB). (3) Results: we present the first Instagram Arabic corpus focusing on cyberbullying, with sub-class (multi-class) categorization. The dataset is primarily designed for detecting offensive language in text. The final dataset contains 200,000 comments, of which 46,898 were annotated by three human annotators. The results show that the SVM classifier outperforms the other classifiers, with an F1 score of 69% for bullying comments and 85% for positive comments.
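
The abstract reports a Kappa statistic for agreement among three annotators but does not say which variant was computed in SPSS. As a minimal sketch, assuming Fleiss' kappa (a common choice when there are more than two raters), the calculation could look like the following. The function name `fleiss_kappa`, the label strings, and the toy ratings are hypothetical and are not taken from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings: list of lists, one inner list of labels per item (assumed layout).
    categories: all labels that raters could assign.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    # How many raters chose each category, per item.
    counts = [Counter(item) for item in ratings]

    # Observed agreement for each item, then averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Agreement expected by chance, from the overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    if p_expected == 1.0:
        # Every rating falls in one category; kappa is undefined, so report
        # full agreement by convention (an assumption of this sketch).
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data only: three annotators labelling four comments.
toy_ratings = [
    ["bullying", "bullying", "bullying"],
    ["positive", "positive", "positive"],
    ["bullying", "bullying", "positive"],
    ["positive", "positive", "bullying"],
]
kappa = fleiss_kappa(toy_ratings, ["bullying", "positive"])
```

Values near 1 indicate strong agreement, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.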

Suggested Citation

  • Reem ALBayari & Sherief Abdallah, 2022. "Instagram-Based Benchmark Dataset for Cyberbullying Detection in Arabic Text," Data, MDPI, vol. 7(7), pages 1-11, June.
  • Handle: RePEc:gam:jdataj:v:7:y:2022:i:7:p:83-:d:845163

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/7/7/83/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/7/7/83/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Elena G. Popkova & Aleksei V. Bogoviz & Svetlana V. Lobova & Abdula M. Chililov & Anastasia A. Sozinova & Bruno S. Sergi, 2022. "Changing entrepreneurial attitudes for mitigating the global pandemic’s social drama," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-12, December.
    2. Griffith, David A. & Lee, Hannah S. & Yalcinkaya, Goksel, 2023. "Understanding the relationship between the use of social media and the prevalence of anxiety at the country level: a multi-country examination," International Business Review, Elsevier, vol. 32(4).
    3. Shisei Tei & Junya Fujino, 2022. "Social ties, fears and bias during the COVID-19 pandemic: Fragile and flexible mindsets," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-7, December.
    4. Siyao Liu & Bin Yu & Chan Xu & Min Zhao & Jing Guo, 2022. "Characteristics of Collective Resilience and Its Influencing Factors from the Perspective of Psychological Emotion: A Case Study of COVID-19 in China," IJERPH, MDPI, vol. 19(22), pages 1-19, November.
    5. Divine Q. Agozie & Muesser Nat, 2022. "Do communication content functions drive engagement among interest group audiences? An analysis of organizational communication on Twitter," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-9, December.
    6. Stephanie Rodriguez-Besteiro & Ana Isabel Beltran-Velasco & José Francisco Tornero-Aguilera & Marina Begoña Martínez-González & Eduardo Navarro-Jiménez & Rodrigo Yáñez-Sepúlveda & Vicente Javier Cleme, 2023. "Social Media, Anxiety and COVID-19 Lockdown Measurement Compliance," IJERPH, MDPI, vol. 20(5), pages 1-13, March.
    7. Pollák František & Vavrek Roman & Váchal Jan & Markovič Peter & Konečný Michal, 2021. "Analysis of Digital Customer Communities in terms of their interactions during the first wave of the COVID-19 pandemic," Management & Marketing, Sciendo, vol. 16(2), pages 134-151, June.
    8. Dimitrios Kydros & Maria Argyropoulou & Vasiliki Vrana, 2021. "A Content and Sentiment Analysis of Greek Tweets during the Pandemic," Sustainability, MDPI, vol. 13(11), pages 1-21, May.
