IDEAS home Printed from https://ideas.repec.org/a/gam/jftint/v15y2023i10p326-d1250408.html
   My bibliography  Save this article

A New Approach to Web Application Security: Utilizing GPT Language Models for Source Code Inspection

Author

Listed:
  • Zoltán Szabó

    (Department of Software Engineering, University of Szeged, Dugonics Square 13., 6720 Szeged, Hungary)

  • Vilmos Bilicki

    (Department of Software Engineering, University of Szeged, Dugonics Square 13., 6720 Szeged, Hungary)

Abstract

Due to the proliferation of large language models (LLMs) and their widespread use in applications such as ChatGPT, there has been a significant increase in interest in AI over the past year. Multiple researchers have raised the question: how will AI be applied and in what areas? Programming, including the generation, interpretation, analysis, and documentation of static program code based on promptsis one of the most promising fields. With the GPT API, we have explored a new aspect of this: static analysis of the source code of front-end applications at the endpoints of the data path. Our focus was the detection of the CWE-653 vulnerability—inadequately isolated sensitive code segments that could lead to unauthorized access or data leakage. This type of vulnerability detection consists of the detection of code segments dealing with sensitive data and the categorization of the isolation and protection levels of those segments that were previously not feasible without human intervention. However, we believed that the interpretive capabilities of GPT models could be explored to create a set of prompts to detect these cases on a file-by-file basis for the applications under study, and the efficiency of the method could pave the way for additional analysis tasks that were previously unavailable for automation. In the introduction to our paper, we characterize in detail the problem space of vulnerability and weakness detection, the challenges of the domain, and the advances that have been achieved in similarly complex areas using GPT or other LLMs. Then, we present our methodology, which includes our classification of sensitive data and protection levels. This is followed by the process of preprocessing, analyzing, and evaluating static code. This was achieved through a series of GPT prompts containing parts of static source code, utilizing few-shot examples and chain-of-thought techniques that detected sensitive code segments and mapped the complex code base into manageable JSON structures.Finally, we present our findings and evaluation of the open source project analysis, comparing the results of the GPT-based pipelines with manual evaluations, highlighting that the field yields a high research value. The results show a vulnerability detection rate for this particular type of model of 88.76%, among others.

Suggested Citation

  • Zoltán Szabó & Vilmos Bilicki, 2023. "A New Approach to Web Application Security: Utilizing GPT Language Models for Source Code Inspection," Future Internet, MDPI, vol. 15(10), pages 1-27, September.
  • Handle: RePEc:gam:jftint:v:15:y:2023:i:10:p:326-:d:1250408
    as

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/15/10/326/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/15/10/326/
    Download Restriction: no
    ---><---

    References listed on IDEAS

    as
    1. Katharine Sanderson, 2023. "GPT-4 is here: what scientists think," Nature, Nature, vol. 615(7954), pages 773-773, March.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Christopher J. Lynch & Erik J. Jensen & Virginia Zamponi & Kevin O’Brien & Erika Frydenlund & Ross Gore, 2023. "A Structured Narrative Prompt for Prompting Narratives from Large Language Models: Sentiment Assessment of ChatGPT-Generated Narratives and Real Tweets," Future Internet, MDPI, vol. 15(12), pages 1-36, November.
    2. Ching-Nam Hang & Pei-Duo Yu & Roberto Morabito & Chee-Wei Tan, 2024. "Large Language Models Meet Next-Generation Networking Technologies: A Review," Future Internet, MDPI, vol. 16(10), pages 1-29, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chiarello, Filippo & Giordano, Vito & Spada, Irene & Barandoni, Simone & Fantoni, Gualtiero, 2024. "Future applications of generative large language models: A data-driven case study on ChatGPT," Technovation, Elsevier, vol. 133(C).
    2. Bauer, Kevin & Liebich, Lena & Hinz, Oliver & Kosfeld, Michael, 2023. "Decoding GPT's hidden "rationality" of cooperation," SAFE Working Paper Series 401, Leibniz Institute for Financial Research SAFE.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jftint:v:15:y:2023:i:10:p:326-:d:1250408. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.