IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i13p2128-d1430346.html

SCC-GPT: Source Code Classification Based on Generative Pre-Trained Transformers

Author

Listed:
  • Mohammad D. Alahmadi

    (Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah 23890, Saudi Arabia)

  • Moayad Alshangiti

    (Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah 23890, Saudi Arabia)

  • Jumana Alsubhi

    (School of Computing, University of Georgia, Athens, GA 30602, USA)

Abstract

Developers often rely on online resources such as Stack Overflow (SO) for help with programming tasks. To support effective search and resource discovery, questions and posts must be tagged manually with the appropriate programming language. However, tagging is not consistently accurate, which creates a need for automated classification of code snippets by programming language. In this study, we introduce a novel approach that uses generative pre-trained transformers (GPT) to classify code snippets from SO posts into programming languages. Our method, which requires neither additional training on labeled data nor reliance on pre-existing labels, classifies 224,107 code snippets into 19 programming languages. We employ the text-davinci-003 model of ChatGPT-3.5 and postprocess its responses to accurately identify the programming language. Our empirical evaluation demonstrates that our GPT-based model (SCC-GPT) significantly outperforms existing methods, achieving a median F1-score improvement that ranges from +6% to +31%. These findings underscore the effectiveness of SCC-GPT in enhancing code snippet classification, offering a cost-effective and efficient solution for developers who rely on SO for programming assistance.
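
The abstract mentions postprocessing the model's responses to identify the programming language but gives no details. The following is only a minimal sketch of what such a step might look like, not the authors' implementation. The function name `extract_language`, the alias table `CANONICAL`, and the small label set are all hypothetical; the 19 languages used in the paper are not listed in this record.

```python
import re
from typing import Optional

# Hypothetical label set: the paper covers 19 languages, which are not listed
# in this record, so this small subset is for illustration only.
CANONICAL = {
    "python": "python",
    "java": "java",
    "javascript": "javascript",
    "js": "javascript",
    "c++": "c++",
    "cpp": "c++",
    "c#": "c#",
    "csharp": "c#",
}

# Try longer aliases first so "javascript" is checked before "java".
_ALIASES = sorted(CANONICAL, key=len, reverse=True)


def extract_language(response: str) -> Optional[str]:
    """Map a free-form model response to a canonical label, or None."""
    text = response.lower()
    for alias in _ALIASES:
        # Use explicit boundary checks because \b does not work around
        # the "+" and "#" characters in names such as "c++" and "c#".
        pattern = r"(?<![a-z0-9+#])" + re.escape(alias) + r"(?![a-z0-9+#])"
        if re.search(pattern, text):
            return CANONICAL[alias]
    # No known language name appears in the response.
    return None
```

For example, `extract_language("This snippet is written in JavaScript.")` yields `"javascript"`, while a response that names no known language yields `None`. A step like this is one way to turn verbose answers into a fixed set of tags; how the paper handles ambiguous or multi-language answers is not stated here.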

Suggested Citation

  • Mohammad D. Alahmadi & Moayad Alshangiti & Jumana Alsubhi, 2024. "SCC-GPT: Source Code Classification Based on Generative Pre-Trained Transformers," Mathematics, MDPI, vol. 12(13), pages 1-12, July.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:13:p:2128-:d:1430346

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/13/2128/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/13/2128/
    Download Restriction: no

    References listed on IDEAS

    1. Mohammad D. Alahmadi, 2022. "VID2META: Complementing Android Programming Screencasts with Code Elements and GUIs," Mathematics, MDPI, vol. 10(17), pages 1-22, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mohammad D. Alahmadi & Moayad Alshangiti, 2024. "Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models," Mathematics, MDPI, vol. 12(7), pages 1-19, March.
