Printed from https://ideas.repec.org/a/ajp/edwast/v9y2025i2p1374-1390id4650.html

A new intelligent system for malicious URLs detection

Author

Listed:
  • Hayder Raeed Hekmat AL-Shawk
  • Ibrahim M. El-Hasnony
  • Hazem M. El-Bakry

Abstract

In cybersecurity, detecting and mitigating malicious URLs is a major challenge because they enable a range of threats, including phishing, malware distribution, and fraud. This paper presents a URL detection system that employs machine learning and data mining methods. The proposed system comprises several steps: data acquisition, preprocessing, feature selection, URL tokenization, and classification. First, we acquire a recent dataset containing both malicious and normal URLs, described by 87 numerical features. The features are preprocessed by scaling them with a standard scaler to prevent the model from being biased towards certain features. Next, the Fick's Law metaheuristic optimization algorithm (FLA) is used for feature selection, with the accuracy of Light Gradient Boosting Machines (LGBM) as its fitness function, resulting in a 50% feature reduction. The URLs are tokenized using Bidirectional Encoder Representations from Transformers (BERT) and converted to a feature vector. The combined BERT feature vector and FLA-selected features are the input to the Categorical Boosting (CatBoost) classifier, which achieves 96.59% accuracy, 96.75% precision, 96.41% recall, and a 96.58% F1-score. In validation, the system surpasses all other machine learning and deep learning methods it was compared against. It also outperforms the results of previous studies that used the same dataset. The proposed system is an effective and efficient approach for detecting malicious URLs, safeguarding digital assets, and ensuring the integrity of online environments.
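
The preprocessing and wrapper-style feature selection described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: a nearest-centroid classifier stands in for LGBM as the fitness model, plain random search stands in for FLA, and the BERT and CatBoost stages are omitted. All function names here (`standard_scale`, `centroid_accuracy`, `random_mask_search`, `f1_score`) are hypothetical and chosen only for illustration. The last function checks that the reported F1-score is consistent with the reported precision and recall.

```python
import math
import random


def standard_scale(rows):
    """Z-score each column: subtract the column mean, divide by its
    population standard deviation (constant columns are divided by 1)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in rows]


def centroid_accuracy(train_X, train_y, test_X, test_y, mask):
    """Fitness of a feature mask: held-out accuracy of a nearest-centroid
    classifier (a stand-in for LGBM) that sees only the kept features."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty feature set cannot classify anything
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(x[i] for x in members) / len(members)
                            for i in idx]

    def predict(x):
        return min(centroids, key=lambda lab: sum(
            (x[i] - c) ** 2 for i, c in zip(idx, centroids[lab])))

    hits = sum(predict(x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def random_mask_search(fitness, n_features, n_trials=50, seed=0):
    """Keep the best-scoring random binary mask (a stand-in for FLA,
    which would update candidate masks with its own search rules)."""
    rng = random.Random(seed)
    best_mask, best_score = None, -1.0
    for _ in range(n_trials):
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        score = fitness(mask)
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check on the figures reported in the abstract.
print(round(f1_score(96.75, 96.41), 2))  # prints 96.58
```

In this sketch the fitness function is passed to the search as a callable, so a different classifier or optimizer could be swapped in without changing the rest of the code.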

Suggested Citation

  • Hayder Raeed Hekmat AL-Shawk & Ibrahim M. El-Hasnony & Hazem M. El-Bakry, 2025. "A new intelligent system for malicious URLs detection," Edelweiss Applied Science and Technology, Learning Gate, vol. 9(2), pages 1374-1390.
  • Handle: RePEc:ajp:edwast:v:9:y:2025:i:2:p:1374-1390:id:4650

    Download full text from publisher

    File URL: https://learning-gate.com/index.php/2576-8484/article/view/4650/1807
    Download Restriction: no
