
Application of word embedding and machine learning in detecting phishing websites

Authors

  • Routhu Srinivasa Rao (GMR Institute of Technology)
  • Amey Umarekar (National Institute of Technology)
  • Alwyn Roshan Pais (National Institute of Technology)

Abstract

Phishing is an attack that aims to obtain personal information, such as passwords and credit card details, from online users by deceiving them through fake websites, emails, or any legitimate internet service. Many techniques exist to detect phishing sites, such as third-party based techniques, source code based methods, and URL based methods, yet users are still tricked into revealing their sensitive information. In this paper, we propose a new technique that detects phishing sites with word embeddings, using plain text and domain specific text extracted from the source code. We applied various word embeddings to evaluate our model using ensemble and multimodal approaches. In the experimental evaluation, we observed that the multimodal approach with domain specific text achieved an accuracy of 99.34%, with a TPR of 99.59%, an FPR of 0.93%, and an MCC of 98.68%.
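
The four figures reported in the abstract (accuracy, TPR, FPR, and MCC) are standard binary classification metrics that can all be derived from a confusion matrix. The sketch below shows the usual textbook definitions, assuming phishing is the positive class. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper, and the MCC is returned in the range [-1, 1], whereas the abstract appears to quote it as a percentage.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, TPR, FPR, and MCC from confusion-matrix counts.

    Assumption: phishing pages are the positive class, so
    tp = phishing flagged as phishing, fp = legitimate flagged as phishing,
    tn = legitimate passed as legitimate, fn = phishing passed as legitimate.
    """

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    tpr = ratio(tp, tp + fn)  # true positive rate (recall)
    fpr = ratio(fp, fp + tn)  # false positive rate
    # Matthews correlation coefficient.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_denominator)
    return {"accuracy": accuracy, "tpr": tpr, "fpr": fpr, "mcc": mcc}


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```

Multiplying each value by 100 gives percentages in the style the abstract uses. MCC is often reported alongside accuracy because it takes all four confusion-matrix cells into account and therefore stays informative when the two classes are imbalanced.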

Suggested Citation

  • Routhu Srinivasa Rao & Amey Umarekar & Alwyn Roshan Pais, 2022. "Application of word embedding and machine learning in detecting phishing websites," Telecommunication Systems: Modelling, Analysis, Design and Management, Springer, vol. 79(1), pages 33-45, January.
  • Handle: RePEc:spr:telsys:v:79:y:2022:i:1:d:10.1007_s11235-021-00850-6
    DOI: 10.1007/s11235-021-00850-6


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Emtethal K. Alamri & Abdullah M. Alnajim & Suliman A. Alsuhibany, 2022. "Investigation of Using CAPTCHA Keystroke Dynamics to Enhance the Prevention of Phishing Attacks," Future Internet, MDPI, vol. 14(3), pages 1-21, March.
    2. Scott Robbins & Aimee van Wynsberghe, 2022. "Our New Artificial Intelligence Infrastructure: Becoming Locked into an Unsustainable Future," Sustainability, MDPI, vol. 14(8), pages 1-11, April.
    3. Kumar Prateek & Nitish Kumar Ojha & Fahiem Altaf & Soumyadev Maity, 2023. "Quantum secured 6G technology-based applications in Internet of Everything," Telecommunication Systems: Modelling, Analysis, Design and Management, Springer, vol. 82(2), pages 315-344, February.
    4. Nikola Anđelić & Sandi Baressi Šegota & Ivan Lorencin & Matko Glučina, 2022. "Detection of Malicious Websites Using Symbolic Classifier," Future Internet, MDPI, vol. 14(12), pages 1-30, November.
    5. Tepede Dipo & Akpa Michael Onyedikachi, 2024. "Developing a Biblical Solution Model for Mitigating Phishing Risks Among Internet Banking Users in Nigeria: The Initial Investigation," International Journal of Latest Technology in Engineering, Management & Applied Science, International Journal of Latest Technology in Engineering, Management & Applied Science (IJLTEMAS), vol. 13(4), pages 61-75, April.
    6. Hernández-Rivera, Ariadna, 2023. "Brecha de género en la confianza de productos y servicios financieros desde la perspectiva del comportamiento," Revista Finanzas y Politica Economica, Universidad Católica de Colombia, vol. 15(1), pages 245-273, January.
