
Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models

Author

Listed:
  • Mohammad D. Alahmadi

    (Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah 23890, Saudi Arabia)

  • Moayad Alshangiti

    (Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah 23890, Saudi Arabia)

Abstract

The rapid evolution of video programming tutorials as a key educational resource has highlighted the need for effective code extraction methods. These tutorials, varying widely in video quality, present a challenge for accurately transcribing the embedded source code, crucial for learning and software development. This study investigates the impact of video quality on the performance of optical character recognition (OCR) engines and the potential of large language models (LLMs) to enhance code extraction accuracy. Our comprehensive empirical analysis utilizes a rich dataset of programming screencasts, involving manual transcription of source code and the application of both traditional OCR engines, like Tesseract and Google Vision, and advanced LLMs, including GPT-4V and Gemini. We investigate the efficacy of image super-resolution (SR) techniques, namely, enhanced deep super-resolution (EDSR) and multi-scale deep super-resolution (MDSR), in improving the quality of low-resolution video frames. The findings reveal significant improvements in OCR accuracy with the use of SR, particularly at lower resolutions such as 360p. LLMs demonstrate superior performance across all video qualities, indicating their robustness and advanced capabilities in diverse scenarios. This research contributes to the field of software engineering by offering a benchmark for code extraction from video tutorials and demonstrating the substantial impact of SR techniques and LLMs in enhancing the readability and reusability of code from these educational resources.
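
As a rough illustration of how OCR output might be scored against a manually transcribed ground truth, the sketch below computes a character-level accuracy from the Levenshtein edit distance. This is an assumption made for illustration only: the abstract does not state which accuracy metric the study uses, and the function names `levenshtein` and `ocr_accuracy` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit row length.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def ocr_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed metric: 1 - (edit distance / length of the longer string)."""
    if not ground_truth and not ocr_output:
        return 1.0
    distance = levenshtein(ground_truth, ocr_output)
    return 1.0 - distance / max(len(ground_truth), len(ocr_output))


if __name__ == "__main__":
    # Hypothetical example: the OCR engine misreads "i" as "1".
    print(ocr_accuracy("print(x)", "pr1nt(x)"))
```

Under this assumed metric, the same score could be computed for each engine and each video resolution, with and without super-resolution, so that results are comparable across conditions.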

Suggested Citation

  • Mohammad D. Alahmadi & Moayad Alshangiti, 2024. "Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models," Mathematics, MDPI, vol. 12(7), pages 1-19, March.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:7:p:1036-:d:1367415

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/7/1036/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/7/1036/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mohammad D. Alahmadi & Moayad Alshangiti & Jumana Alsubhi, 2024. "SCC-GPT: Source Code Classification Based on Generative Pre-Trained Transformers," Mathematics, MDPI, vol. 12(13), pages 1-12, July.
