IDEAS home Printed from https://ideas.repec.org/a/gam/jdataj/v10y2025i3p27-d1595396.html

SAPEx-D: A Comprehensive Dataset for Predictive Analytics in Personalized Education Using Machine Learning

Author

Listed:
  • Muhammad Adnan Aslam

    (Department of Computer Games Development, Faculty of Computing and Artificial Intelligence (FCAI), Air University, Islamabad 44000, Pakistan
    These authors contributed equally to this work.)

  • Fiza Murtaza

    (Department of Computer Games Development, Faculty of Computing and Artificial Intelligence (FCAI), Air University, Islamabad 44000, Pakistan
    These authors contributed equally to this work.)

  • Muhammad Ehatisham Ul Haq

    (Department of Computer Games Development, Faculty of Computing and Artificial Intelligence (FCAI), Air University, Islamabad 44000, Pakistan
    These authors contributed equally to this work.)

  • Amanullah Yasin

    (Department of Computer Science, Bahria University, BSEAS, H11, Islamabad 44000, Pakistan
    These authors contributed equally to this work.)

  • Numan Ali

    (Department of Computer Games Development, Faculty of Computing and Artificial Intelligence (FCAI), Air University, Islamabad 44000, Pakistan
    These authors contributed equally to this work.)

Abstract

Education is crucial for leading a productive life and obtaining necessary resources. As a result of technological innovation, higher education institutions are progressively incorporating artificial intelligence into conventional teaching methods. Because a strong academic record raises a university’s ranking and improves students’ career prospects, predicting learning success has been a central focus in education. Modern schools face the dual challenge of analyzing performance and providing high-quality instruction, while students must maintain high academic standards, balance life and academics, and adjust to technology. In this study, we present a comprehensive dataset, SAPEx-D (Student Academic Performance Exploration), designed to predict student performance, encompassing a wide array of personal, familial, academic, and behavioral factors. Our data collection effort at Air University, Islamabad, Pakistan, involved both online and paper questionnaires completed by students across multiple departments, ensuring diverse representation. After preprocessing to remove duplicates and entries with significant missing values, we retained 494 valid responses. The dataset includes detailed attributes such as demographic information, parental education and occupation, study habits, reading frequencies, and transportation modes. To facilitate robust analysis, we encoded ordinal attributes using label encoding and nominal attributes using one-hot encoding, expanding our dataset from 38 to 88 attributes. Feature scaling was performed using a normalization technique to standardize the range and distribution of the data. Our analysis revealed that factors such as degree major, parental education, reading frequency, and scholarship type significantly influence student performance. The machine learning models applied to this dataset, including Gradient Boosting and Random Forest, demonstrated high accuracy and robustness, underscoring the dataset’s potential for insightful academic performance prediction. Gradient Boosting achieved an accuracy of 68.7% and an F1-score of 68% on the eight-class classification task. On the three-class classification task, Random Forest outperformed the other models, reaching an accuracy of 80.8% and an F1-score of 78%. These findings highlight the importance of comprehensive data in understanding and predicting academic outcomes, paving the way for more personalized and effective educational strategies.
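
The preprocessing steps named in the abstract (label encoding for ordinal attributes, one-hot encoding for nominal attributes, then feature scaling) can be illustrated with a minimal sketch in plain Python. This is not the authors' code. The column names, category values, and ordering below are hypothetical, and min-max scaling is an assumption, since the abstract says only that "a normalization technique" was used. In practice, libraries such as scikit-learn offer equivalents (`OrdinalEncoder`, `OneHotEncoder`, `MinMaxScaler`).

```python
# Toy records: column names and values are hypothetical, not taken from SAPEx-D.
records = [
    {"reading_frequency": "never", "transport": "bus", "study_hours": 2.0},
    {"reading_frequency": "sometimes", "transport": "car", "study_hours": 5.0},
    {"reading_frequency": "often", "transport": "bus", "study_hours": 8.0},
]

# Assumed ordering for the ordinal attribute (lowest to highest).
READING_ORDER = ["never", "sometimes", "often"]


def label_encode(value, ordered_categories):
    """Map an ordinal value to its rank in the ordered category list."""
    return ordered_categories.index(value)


def one_hot(value, prefix, categories):
    """Return one 0/1 indicator column per category of a nominal attribute."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


def min_max(values):
    """Rescale numeric values to [0, 1] (assumed normalization technique)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(rows):
    """Encode and scale every row; nominal columns expand into indicators."""
    transport_cats = sorted({r["transport"] for r in rows})
    scaled_hours = min_max([r["study_hours"] for r in rows])
    encoded_rows = []
    for row, hours in zip(rows, scaled_hours):
        out = {
            "reading_frequency": label_encode(
                row["reading_frequency"], READING_ORDER
            )
        }
        out.update(one_hot(row["transport"], "transport", transport_cats))
        out["study_hours"] = hours
        encoded_rows.append(out)
    return encoded_rows


encoded = preprocess(records)
for row in encoded:
    print(row)
```

In this toy case the three input columns become four, because the nominal `transport` column is replaced by one indicator per category. The same mechanism would account for an attribute count that grows after one-hot encoding, as in the 38 to 88 expansion the abstract reports.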

Suggested Citation

  • Muhammad Adnan Aslam & Fiza Murtaza & Muhammad Ehatisham Ul Haq & Amanullah Yasin & Numan Ali, 2025. "SAPEx-D: A Comprehensive Dataset for Predictive Analytics in Personalized Education Using Machine Learning," Data, MDPI, vol. 10(3), pages 1-29, February.
  • Handle: RePEc:gam:jdataj:v:10:y:2025:i:3:p:27-:d:1595396

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/10/3/27/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/10/3/27/
    Download Restriction: no