
A Novel Text Classification Technique Using Improved Particle Swarm Optimization: A Case Study of Arabic Language

Authors
  • Yousif A. Alhaj

    (Sanaa Community College, Sanaa 5695, Yemen)

  • Abdelghani Dahou

    (Mathematics and Computer Science Department, Ahmed Draia University, Adrar 01000, Algeria)

  • Mohammed A. A. Al-qaness

    (State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China;
    Faculty of Engineering, Sana’a University, Sana’a 12544, Yemen)

  • Laith Abualigah

    (Faculty of Information Technology, Middle East University, Amman 11831, Jordan;
    Faculty of Computer Sciences and Informatics, Amman Arab University, Amman 11953, Jordan)

  • Aaqif Afzaal Abbasi

    (Department of Software Engineering, Foundation University Islamabad, Islamabad 44000, Pakistan)

  • Nasser Ahmed Obad Almaweri

    (Sanaa Community College, Sanaa 5695, Yemen)

  • Mohamed Abd Elaziz

    (Faculty of Computer Science and Engineering, Galala University, Suez 435611, Egypt;
    Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman P.O. Box 346, United Arab Emirates;
    Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt)

  • Robertas Damaševičius

    (Department of Applied Informatics, Vytautas Magnus University, 44404 Kaunas, Lithuania)

Abstract

We propose a novel text classification model that aims to improve the performance of Arabic text classification using machine learning techniques. An effective approach to Arabic text classification is to pair a classifier with a suitable feature selection method and an optimal number of features. Although several text classification methods have been proposed for the Arabic language using different techniques, such as feature selection methods, ensembles of classifiers, and discriminative features, choosing the optimal method becomes an NP-hard problem given the huge search space. Therefore, we propose a method called Optimal Configuration Determination for Arabic text Classification (OCATC), which uses the Particle Swarm Optimization (PSO) algorithm to find the optimal solution (configuration) in this space. The proposed OCATC method first extracts features from the textual documents and converts them into numerical vectors using the Term Frequency-Inverse Document Frequency (TF–IDF) approach. The PSO then selects the best configuration, that is, a classifier, a feature selection method, and the number of features to keep. Extensive experiments were carried out to evaluate the performance of OCATC on six datasets: five publicly available datasets and our proposed dataset. The results demonstrate the superiority of OCATC over individual classifiers and other state-of-the-art methods.
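
The two steps named in the abstract, TF–IDF weighting and a PSO search over configurations, can be illustrated with a minimal sketch. This is not the authors' implementation. The candidate classifiers and selectors, the feature-count range, the PSO coefficients, and `toy_fitness` are all hypothetical placeholders chosen for illustration. A real run would replace `toy_fitness` with the cross-validated accuracy of the decoded configuration.

```python
import math
import random
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document, using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

# Hypothetical search space (assumption): the paper's actual candidates may differ.
CLASSIFIERS = ["svm", "naive_bayes", "logistic_regression"]
SELECTORS = ["chi2", "information_gain", "mutual_information"]
K_MIN, K_MAX = 100, 5000  # assumed range for the number of selected features

def decode(position):
    """Map a particle position in [0, 1]^3 to (classifier, selector, num_features)."""
    c, s, k = (min(max(x, 0.0), 1.0) for x in position)
    clf = CLASSIFIERS[min(int(c * len(CLASSIFIERS)), len(CLASSIFIERS) - 1)]
    sel = SELECTORS[min(int(s * len(SELECTORS)), len(SELECTORS) - 1)]
    return clf, sel, K_MIN + round(k * (K_MAX - K_MIN))

def pso(fitness, dim=3, particles=12, iters=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that maximizes fitness(decode(position))."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]                      # personal bests
    best_fit = [fitness(decode(p)) for p in pos]
    g = max(range(particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]          # global best
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the position so it always decodes to a valid configuration.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), 1.0)
            fit = fitness(decode(pos[i]))
            if fit > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], fit
                if fit > g_fit:
                    g_pos, g_fit = pos[i][:], fit
    return decode(g_pos), g_fit

def toy_fitness(config):
    """Placeholder score (assumption); stands in for cross-validated accuracy."""
    clf, sel, k = config
    score = 0.80 + 0.05 * (clf == "svm") + 0.03 * (sel == "chi2")
    return score - abs(k - 2000) / 100000

vectors = tfidf([["a", "b"], ["a", "c"]])
best_config, best_score = pso(toy_fitness)
print(best_config, round(best_score, 4))
```

Encoding each configuration as a point in the unit cube lets a standard continuous PSO update be reused unchanged, with `decode` handling the mapping to discrete choices. Other encodings, such as a binary PSO, would be equally reasonable.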

Suggested Citation

  • Yousif A. Alhaj & Abdelghani Dahou & Mohammed A. A. Al-qaness & Laith Abualigah & Aaqif Afzaal Abbasi & Nasser Ahmed Obad Almaweri & Mohamed Abd Elaziz & Robertas Damaševičius, 2022. "A Novel Text Classification Technique Using Improved Particle Swarm Optimization: A Case Study of Arabic Language," Future Internet, MDPI, vol. 14(7), pages 1-18, June.
  • Handle: RePEc:gam:jftint:v:14:y:2022:i:7:p:194-:d:848579

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/14/7/194/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/14/7/194/
    Download Restriction: no