Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i13p2123-d1430103.html

OLF-ML: An Offensive Language Framework for Detection, Categorization, and Offense Target Identification Using Text Processing and Machine Learning Algorithms

Authors
  • MD. Nahid Hasan

    (Department of Computer Science and Engineering, School of Data and Sciences, Brac University, Dhaka 1212, Bangladesh)

  • Kazi Shadman Sakib

    (Department of Computer Science and Engineering, University of Dhaka, Dhaka 1000, Bangladesh)

  • Taghrid Tahani Preeti

    (Department of Computer Science and Engineering, School of Data and Sciences, Brac University, Dhaka 1212, Bangladesh)

  • Jeza Allohibi

    (Department of Mathematics, Taibah University, Madinah 42353, Saudi Arabia)

  • Abdulmajeed Atiah Alharbi

    (Department of Mathematics, Taibah University, Madinah 42353, Saudi Arabia)

  • Jia Uddin

    (Artificial Intelligence and Big Data Department, Endicott College, Woosong University, Daejeon 34606, Republic of Korea)

Abstract

The pervasiveness of offensive language on social media underscores the need for automated systems that identify and categorize such content. Effective identification and categorization are essential to a more secure online environment and better communication. However, existing research faces challenges such as limited datasets and biased model performance, which hinder progress in this domain. To address these challenges, this research presents a comprehensive framework that simplifies the use of support vector machines (SVM), random forest (RF), and artificial neural networks (ANN). Using the Offensive Language Identification Dataset (OLID), the proposed methodology yields notable gains in three tasks: offensive language detection, automatic categorization of offensiveness, and offense target identification. The simulation results indicate that SVM performs exceptionally well, with accuracy scores of 77%, 88%, and 68%, precision scores of 76%, 87%, and 67%, F1 scores of 57%, 88%, and 68%, and recall rates of 45%, 88%, and 68%, making it practical for identifying and moderating offensive content on social media. With sophisticated preprocessing and meticulous hyperparameter tuning, our model outperforms some earlier research on offensive language detection and categorization tasks.
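The F1 score quoted in the abstract is conventionally the harmonic mean of precision and recall. As a minimal sketch (the function name `f1_score` and the variable names are illustrative, not taken from the paper), the first value of each reported triple can be checked against that definition. The remaining values may be averages across classes, in which case the harmonic-mean relation need not hold exactly for them.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# First value of the precision and recall triples quoted in the abstract.
# Treating them as single-class (non-averaged) figures is an assumption.
first_precision = 0.76
first_recall = 0.45

first_f1 = f1_score(first_precision, first_recall)
print(f"F1 = {first_f1:.2f}")  # prints "F1 = 0.57"
```

The result matches the 57% F1 score reported alongside 76% precision and 45% recall, which illustrates how a low recall pulls F1 well below the precision figure.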

Suggested Citation

  • MD. Nahid Hasan & Kazi Shadman Sakib & Taghrid Tahani Preeti & Jeza Allohibi & Abdulmajeed Atiah Alharbi & Jia Uddin, 2024. "OLF-ML: An Offensive Language Framework for Detection, Categorization, and Offense Target Identification Using Text Processing and Machine Learning Algorithms," Mathematics, MDPI, vol. 12(13), pages 1-18, July.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:13:p:2123-:d:1430103

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/13/2123/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/13/2123/
    Download Restriction: no
