Printed from https://ideas.repec.org/a/gam/jftint/v14y2022i8p229-d873104.html

CCrFS: Combine Correlation Features Selection for Detecting Phishing Websites Using Machine Learning

Author

Listed:
  • Jimmy Moedjahedy

    (Computer Science Department, Universitas Klabat, Minahasa Utara 95371, Indonesia)

  • Arief Setyanto

    (Magister of Informatics Engineering, Universitas AMIKOM Yogyakarta, Yogyakarta 55281, Indonesia)

  • Fawaz Khaled Alarfaj

    (Department of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia)

  • Mohammed Alreshoodi

    (Unit of Scientific Research, Applied College, Qassim University, Buraydah 52362, Saudi Arabia)

Abstract

Phishing is a form of cybercrime to which Internet users in the 21st century are continually exposed. The objective of phishing is to obtain sensitive information by deceiving a target and to use that information for financial gain. Such information may include login details, passwords, dates of birth, credit card numbers, bank account numbers, and family-related information. To acquire these details, attackers direct users, through emails, adverts, text messages, or website pop-ups, to enter the information on fake websites. Examining a website’s URL is one way to avoid this type of deception, but identifying the features of a phishing URL takes specialized knowledge and investigation. Machine learning uses existing data to teach machines to distinguish between legitimate and phishing website URLs. In this work, we propose a method that combines correlation and recursive feature elimination to determine which URL characteristics are useful for identifying phishing websites, gradually decreasing the number of features while maintaining accuracy. We use two datasets that contain 48 and 87 features, respectively. The first scenario combines predictive power score correlation with recursive feature elimination; the second combines maximal information coefficient correlation with recursive feature elimination; the third combines Spearman correlation with recursive feature elimination. In all three scenarios, the proposed method achieves high accuracy even with the smallest feature subset. With 10 features, the accuracy is 97.06% for dataset 1 and 95.88% for dataset 2.
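
The third scenario (Spearman correlation combined with recursive feature elimination) can be illustrated with a minimal sketch. The abstract does not say how the two steps are chained, so the two-stage design here is an assumption: features are first ranked by absolute Spearman correlation with the label, and recursive feature elimination then reduces the survivors to 10. The function name `spearman_then_rfe`, the pre-filter size, the random forest estimator, and the synthetic 48-feature data are all hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split


def spearman_then_rfe(X, y, n_prefilter=20, n_final=10, random_state=0):
    """Return sorted column indices kept after both stages (hypothetical helper)."""
    # Stage 1 (assumed design): score each feature by |Spearman rho| with the label.
    scores = []
    for j in range(X.shape[1]):
        rho, _ = spearmanr(X[:, j], y)
        scores.append(abs(rho))
    scores = np.nan_to_num(np.array(scores))  # constant columns yield NaN; score them 0
    prefilter_idx = np.argsort(scores)[::-1][:n_prefilter]

    # Stage 2: recursive feature elimination, dropping one feature per round.
    estimator = RandomForestClassifier(n_estimators=50, random_state=random_state)
    rfe = RFE(estimator, n_features_to_select=n_final, step=1)
    rfe.fit(X[:, prefilter_idx], y)
    return np.sort(prefilter_idx[rfe.support_])


# Synthetic stand-in data with 48 features; NOT the paper's phishing datasets.
X, y = make_classification(
    n_samples=600, n_features=48, n_informative=8, n_redundant=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Select features on the training split only, then score on held-out data.
selected = spearman_then_rfe(X_train, y_train)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train[:, selected], y_train)
accuracy = clf.score(X_test[:, selected], y_test)
print("selected feature indices:", selected.tolist())
print("held-out accuracy:", round(accuracy, 3))
```

Selecting features on the training split alone keeps the held-out accuracy from being inflated by information leaking from the test rows. The accuracy printed here describes only the synthetic data and is not comparable to the figures reported in the abstract.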

Suggested Citation

  • Jimmy Moedjahedy & Arief Setyanto & Fawaz Khaled Alarfaj & Mohammed Alreshoodi, 2022. "CCrFS: Combine Correlation Features Selection for Detecting Phishing Websites Using Machine Learning," Future Internet, MDPI, vol. 14(8), pages 1-18, July.
  • Handle: RePEc:gam:jftint:v:14:y:2022:i:8:p:229-:d:873104

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/14/8/229/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/14/8/229/
    Download Restriction: no

    References listed on IDEAS

    1. Nikita Pilnenskiy & Ivan Smetannikov, 2020. "Feature Selection Algorithms as One of the Python Data Analytical Tools," Future Internet, MDPI, vol. 12(3), pages 1-14, March.
    2. Rana Alabdan, 2020. "Phishing Attacks Survey: Types, Vectors, and Technical Approaches," Future Internet, MDPI, vol. 12(10), pages 1-37, September.

    Citations

    Cited by:

    1. Padmalochan Panda & Alekha Kumar Mishra & Deepak Puthal, 2022. "A Novel Logo Identification Technique for Logo-Based Phishing Detection in Cyber-Physical Systems," Future Internet, MDPI, vol. 14(8), pages 1-17, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Joakim Kävrestad & Allex Hagberg & Marcus Nohlberg & Jana Rambusch & Robert Roos & Steven Furnell, 2022. "Evaluation of Contextual and Game-Based Training for Phishing Detection," Future Internet, MDPI, vol. 14(4), pages 1-16, March.
    2. Muhammad Waqas & Alishba Hania & Farzan Yahya & Iqra Malik, 2023. "Enhancing Cybersecurity: The Crucial Role of Self-Regulation, Information Processing, and Financial Knowledge in Combating Phishing Attacks," SAGE Open, vol. 13(4), pages 21582440231, December.
    3. Padmalochan Panda & Alekha Kumar Mishra & Deepak Puthal, 2022. "A Novel Logo Identification Technique for Logo-Based Phishing Detection in Cyber-Physical Systems," Future Internet, MDPI, vol. 14(8), pages 1-17, August.
    4. Ravi Kashyap, 2023. "DeFi Security: Turning The Weakest Link Into The Strongest Attraction," Papers 2312.00033, arXiv.org.
    5. Kausar Yasmeen & Muhammad Adnan, 2023. "Zero-day and zero-click attacks on digital banking: a comprehensive review of double trouble," Risk Management, Palgrave Macmillan, vol. 25(4), pages 1-24, December.
