
OLF-ML: An Offensive Language Framework for Detection, Categorization, and Offense Target Identification Using Text Processing and Machine Learning Algorithms

Authors
  • MD. Nahid Hasan

    (Department of Computer Science and Engineering, School of Data and Sciences, Brac University, Dhaka 1212, Bangladesh)

  • Kazi Shadman Sakib

    (Department of Computer Science and Engineering, University of Dhaka, Dhaka 1000, Bangladesh)

  • Taghrid Tahani Preeti

    (Department of Computer Science and Engineering, School of Data and Sciences, Brac University, Dhaka 1212, Bangladesh)

  • Jeza Allohibi

    (Department of Mathematics, Taibah University, Madinah 42353, Saudi Arabia)

  • Abdulmajeed Atiah Alharbi

    (Department of Mathematics, Taibah University, Madinah 42353, Saudi Arabia)

  • Jia Uddin

    (Artificial Intelligence and Big Data Department, Endicott College, Woosong University, Daejeon 34606, Republic of Korea)

Abstract

The pervasiveness of offensive language on social media emphasizes the need for automated systems that identify and categorize such content. Effective identification and categorization of this content are essential to ensure a more secure online environment and improve communication. However, existing research faces challenges such as limited datasets and biased model performance, which hinder progress in this domain. To address these challenges, this research presents a comprehensive framework that simplifies the use of support vector machines (SVM), random forests (RF), and artificial neural networks (ANN). Evaluated on the Offensive Language Identification Dataset (OLID), the proposed methodology yields notable gains on three tasks: offensive language detection, automatic categorization of offensiveness, and offense target identification. The simulation results indicate that SVM performs well, achieving accuracy of 77%, 88%, and 68%, precision of 76%, 87%, and 67%, recall of 45%, 88%, and 68%, and F1 scores of 57%, 88%, and 68% on the three tasks, respectively, which suggests it is practical for identifying and moderating offensive content on social media. Through careful preprocessing and hyperparameter tuning, our model outperforms some earlier studies on offensive language detection and categorization tasks.
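Because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), the per-task figures in the abstract can be cross-checked against each other. The sketch below is a minimal illustration, assuming the reported precision and recall are per-class (or micro-averaged) values; if the paper's scores are macro-averaged, the harmonic mean of averaged precision and recall is only an approximation of macro F1. The function name `f1_score` and the task labels are illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the SVM model as reported in the abstract.
# Assumption: the triples are listed in task order (detection,
# categorization, target identification).
reported = {
    "detection": (0.76, 0.45),
    "categorization": (0.87, 0.88),
    "target identification": (0.67, 0.68),
}

for task, (p, r) in reported.items():
    print(f"{task}: F1 = {f1_score(p, r):.3f}")
```

The recomputed values land within about one percentage point of the reported F1 scores (57%, 88%, 68%), which is the expected agreement given that the input percentages are themselves rounded. The low detection F1 is driven mainly by the 45% recall, since the harmonic mean is pulled toward the smaller of its two inputs.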

Suggested Citation

  • MD. Nahid Hasan & Kazi Shadman Sakib & Taghrid Tahani Preeti & Jeza Allohibi & Abdulmajeed Atiah Alharbi & Jia Uddin, 2024. "OLF-ML: An Offensive Language Framework for Detection, Categorization, and Offense Target Identification Using Text Processing and Machine Learning Algorithms," Mathematics, MDPI, vol. 12(13), pages 1-18, July.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:13:p:2123-:d:1430103

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/13/2123/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/13/2123/
    Download Restriction: no

    References listed on IDEAS

    1. Matthias Schonlau & Rosie Yuyan Zou, 2020. "The random forest algorithm for statistical learning," Stata Journal, StataCorp LP, vol. 20(1), pages 3-29, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Becker, Sascha O. & Voth, Hans-Joachim, 2023. "From the Death of God to the Rise of Hitler," The Warwick Economics Research Paper Series (TWERPS) 1478, University of Warwick, Department of Economics.
    2. Xiaxuan He & Qifeng Yuan & Yinghong Qin & Junwen Lu & Gang Li, 2024. "Analysis of Surface Urban Heat Island in the Guangzhou-Foshan Metropolitan Area Based on Local Climate Zones," Land, MDPI, vol. 13(10), pages 1-34, October.
    3. Sascha O. Becker & Hans-Joachim Voth, 2023. "From the Death of God to the Rise of Hitler," CESifo Working Paper Series 10730, CESifo.
    4. Sakiru Adebola Solarin & Muhammed Sehid Gorus & Onder Ozgur, 2024. "Modelling the economic effect of inbound birth tourism: a random forest algorithm approach," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(5), pages 4223-4240, October.
    5. Murat Aslan & Onder Ozgur, 2024. "Financial dollarization and its effects on inflation and output in Turkey: a machine learning approach," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(6), pages 5777-5804, December.
    6. Maria A. F. Silva Dias & Yania Molina Souto & Bruno Biazeto & Enzo Todesco & Jose A. Zuñiga Mora & Dylana Vargas Navarro & Melvin Pérez Chinchilla & Carlos Madrigal Araya & Dayanna Arce Fernández & Be, 2024. "Reduction of Wind Speed Forecast Error in Costa Rica Tejona Wind Farm with Artificial Intelligence," Energies, MDPI, vol. 17(22), pages 1-12, November.
    7. Tomasz Rymarczyk & Konrad Niderla & Edward Kozłowski & Krzysztof Król & Joanna Maria Wyrwisz & Sylwia Skrzypek-Ahmed & Piotr Gołąbek, 2021. "Logistic Regression with Wave Preprocessing to Solve Inverse Problem in Industrial Tomography for Technological Process Control," Energies, MDPI, vol. 14(23), pages 1-21, December.
    8. Lamperti, Fabio, 2024. "Unlocking machine learning for social sciences: The case for identifying Industry 4.0 adoption across business restructuring events," Technological Forecasting and Social Change, Elsevier, vol. 207(C).
    9. Jianghong Xu & Wei Lu & Weixin Wang, 2024. "From “fragile smallholders” to “resilient smallholders”: measuring rural household resilience in China," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-14, December.
    10. Forbes, Kevin F., 2023. "Demand for grid-supplied electricity in the presence of distributed solar energy resources: Evidence from New York City," Utilities Policy, Elsevier, vol. 80(C).
    11. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer & Thomas Wiemann, 2024. "ddml: Double/debiased machine learning in Stata," Stata Journal, StataCorp LP, vol. 24(1), pages 3-45, March.
    12. Xiangzhao Yan & Wei Yang & Zaohong Pu & Qilong Zhang & Yutong Chen & Jiaqi Chen & Weiqi Xiang & Hongyu Chen & Yuyang Cheng & Yanwei Zhao, 2025. "Responses of Typical Riparian Vegetation to Annual Variation of River Flow in a Semi-Arid Climate Region: Case Study of China’s Xiliao River," Land, MDPI, vol. 14(1), pages 1-19, January.
    13. Hillebrecht, Michael & Klonner, Stefan & Pacere, Noraogo A., 2020. "Dynamic Properties of Poverty Targeting," Working Papers 0696, University of Heidelberg, Department of Economics.
    14. Ivan Brandić & Alan Antonović & Lato Pezo & Božidar Matin & Tajana Krička & Vanja Jurišić & Karlo Špelić & Mislav Kontek & Juraj Kukuruzović & Mateja Grubor & Ana Matin, 2023. "Energy Potentials of Agricultural Biomass and the Possibility of Modelling Using RFR and SVM Models," Energies, MDPI, vol. 16(2), pages 1-10, January.
    15. David Simon & Aaron Sojourner & Jon Pedersen & Heidi Ombisa Skallet, 2024. "Financial Incentives for Adoption and Kin Guardianship Improve Achievement for Foster Children," Upjohn Working Papers 24-401, W.E. Upjohn Institute for Employment Research.
    16. Kang, Lili & Zhao, Guangchuan, 2022. "Financial support for unmet need for personal assistance with daily activities: Implications from China's long-term care insurance pilots," Finance Research Letters, Elsevier, vol. 45(C).
    17. Hong Pan & Jie Yang & Yang Yu & Yuan Zheng & Xiaonan Zheng & Chenyang Hang, 2024. "Intelligent Low-Consumption Optimization Strategies: Economic Operation of Hydropower Stations Based on Improved LSTM and Random Forest Machine Learning Algorithm," Mathematics, MDPI, vol. 12(9), pages 1-20, April.
    18. Merike Kukk & Jaanika Meriküll & Tairi Rõõm, 2023. "The Gender Wealth Gap in Europe: Application of Machine Learning to Predict Individual‐level Wealth," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 69(2), pages 289-317, June.
    19. van den Berg, Gerard J. & Stephan, Gesine & Uhlendorff, Arne, 2025. "Do Early Active Labor Market Policies Improve Outcomes of Not-Yet-Unemployed Workers? Findings from a Randomized Field Experiment," IZA Discussion Papers 17612, Institute of Labor Economics (IZA).
    20. Wang, Sicheng & Noland, Robert B., 2021. "What is the elasticity of sharing a ridesourcing trip?," Transportation Research Part A: Policy and Practice, Elsevier, vol. 153(C), pages 284-305.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.