IDEAS home Printed from https://ideas.repec.org/a/gam/jdataj/v7y2022i11p165-d976562.html

Density-Based Unsupervised Learning Algorithm to Categorize College Students into Dropout Risk Levels

Author

Listed:
  • Miguel Angel Valles-Coral

    (Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru)

  • Luis Salazar-Ramírez

    (Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru)

  • Richard Injante

    (Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru)

  • Edwin Augusto Hernandez-Torres

    (Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru)

  • Juan Juárez-Díaz

    (Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru)

  • Jorge Raul Navarro-Cabrera

    (Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru)

  • Lloy Pinedo

    (Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru)

  • Pierre Vidaurre-Rojas

    (Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru)

Abstract

Meeting the basic quality conditions for higher education requires strategies to reduce student dropout, and Information and Communication Technologies (ICT) have made it possible to direct, reinforce, and consolidate professional academic training. We propose an academic and emotional tracking model that uses data mining and machine learning to group university students by dropout risk level. We worked with 670 students from a Peruvian public university, administered 5 valid and reliable psychological assessment questionnaires through a chatbot-based system, and then clustered the students with 3 unsupervised learning algorithms: DBSCAN and HDBSCAN, which are density-based, and K-Means, which is centroid-based. HDBSCAN was the most robust option, scoring best on two of the three internal validity indices evaluated: its Silhouette index was 0.6823, its Davies–Bouldin index was 0.6563, and its Calinski–Harabasz index was 369.6459. The internal indices indicated five as the best number of clusters. For external validation against answers from mental health professionals, we obtained high agreement: F-measure 90.9%, purity 94.5%, V-measure 86.9%, and ARI 86.5%. These results indicate the robustness of the proposed model, which categorizes university students into five levels of dropout risk.
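
As a rough illustration of two of the external indices named in the abstract, the sketch below computes purity and the Adjusted Rand Index (ARI) from scratch. It is not the authors' code: the function names, the toy label lists, and the choice of the standard pair-counting form of ARI are assumptions made for this example, and the data are invented rather than taken from the study.

```python
from collections import Counter
from math import comb


def purity(true_labels, cluster_labels):
    """Fraction of points that fall in the majority true class of their cluster."""
    per_cluster = {}
    for t, c in zip(true_labels, cluster_labels):
        per_cluster.setdefault(c, Counter())[t] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


def adjusted_rand_index(true_labels, cluster_labels):
    """Adjusted Rand Index computed from pair counts of the contingency table.

    Assumes at least two points; smaller inputs are not handled in this sketch.
    """
    n = len(true_labels)
    cells = Counter(zip(true_labels, cluster_labels))
    sum_cells = sum(comb(v, 2) for v in cells.values())
    sum_rows = sum(comb(v, 2) for v in Counter(true_labels).values())
    sum_cols = sum(comb(v, 2) for v in Counter(cluster_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. one group on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy data: expert-assigned risk levels vs. cluster ids.
expert = [0, 0, 0, 1, 1, 1, 2, 2, 2]
clusters = [5, 5, 5, 7, 7, 2, 2, 2, 2]  # one level-1 student lands in cluster 2

print(round(purity(expert, clusters), 4))
print(round(adjusted_rand_index(expert, clusters), 4))
```

Both measures ignore the actual cluster ids and depend only on how points are grouped, which is why cluster ids 5, 7, and 2 can be compared with risk levels 0, 1, and 2. Purity rewards clusters dominated by one expert label, while ARI also corrects for the agreement expected by chance.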

Suggested Citation

  • Miguel Angel Valles-Coral & Luis Salazar-Ramírez & Richard Injante & Edwin Augusto Hernandez-Torres & Juan Juárez-Díaz & Jorge Raul Navarro-Cabrera & Lloy Pinedo & Pierre Vidaurre-Rojas, 2022. "Density-Based Unsupervised Learning Algorithm to Categorize College Students into Dropout Risk Levels," Data, MDPI, vol. 7(11), pages 1-18, November.
  • Handle: RePEc:gam:jdataj:v:7:y:2022:i:11:p:165-:d:976562

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/7/11/165/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/7/11/165/
    Download Restriction: no

    References listed on IDEAS

    1. Montserrat Díaz-Méndez & Mario R. Paredes & Michael Saren, 2019. "Improving Society by Improving Education through Service-Dominant Logic: Reframing the Role of Students in Higher Education," Sustainability, MDPI, vol. 11(19), pages 1-14, September.
    2. Dario Sansone, 2019. "Beyond Early Warning Indicators: High School Dropout and Machine Learning," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 81(2), pages 456-485, April.
    3. Pavelea Alina Maria & Moldovan Octavian, 2020. "Why some Fail and others Succeed: Explaining the Academic Performance of PA Undergraduate Students," NISPAcee Journal of Public Administration and Policy, Sciendo, vol. 13(1), pages 109-132, June.
    4. Jorge Maluenda-Albornoz & Valeria Infante-Villagrán & Celia Galve-González & Gabriela Flores-Oyarzo & José Berríos-Riquelme, 2022. "Early and Dynamic Socio-Academic Variables Related to Dropout Intention: A Predictive Model Made during the Pandemic," Sustainability, MDPI, vol. 14(2), pages 1-18, January.
    5. Sergi Rovira & Eloi Puertas & Laura Igual, 2017. "Data-driven system to predict academic grades and dropout," PLOS ONE, Public Library of Science, vol. 12(2), pages 1-21, February.
    6. Alfredo Guzmán & Sandra Barragán & Favio Cala-Vitery, 2022. "Comparative Analysis of Dropout and Student Permanence in Rural Higher Education," Sustainability, MDPI, vol. 14(14), pages 1-25, July.

    Citations



    Cited by:

    1. Neema Mduma, 2023. "Data Balancing Techniques for Predicting Student Dropout Using Machine Learning," Data, MDPI, vol. 8(3), pages 1-14, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Delogu, Marco & Lagravinese, Raffaele & Paolini, Dimitri & Resce, Giuliano, 2024. "Predicting dropout from higher education: Evidence from Italy," Economic Modelling, Elsevier, vol. 130(C).
    2. Filmer, Deon P. & Nahata, Vatsal & Sabarwal, Shwetlena, 2021. "Preparation, Practice, and Beliefs: A Machine Learning Approach to Understanding Teacher Effectiveness," Policy Research Working Paper Series 9847, The World Bank.
    3. Hazal Colak Oz & Çiçek Güven & Gonzalo Nápoles, 2023. "School dropout prediction and feature importance exploration in Malawi using household panel data: machine learning approach," Journal of Computational Social Science, Springer, vol. 6(1), pages 245-287, April.
    4. Mehtab Alam & Fu-Ren Lin, 2022. "Internalizing Sustainability into Research Practices of Higher Education Institutions: Case of a Research University in Taiwan," Sustainability, MDPI, vol. 14(15), pages 1-30, August.
    5. Jorge Maluenda-Albornoz & José Berríos-Riquelme & Valeria Infante-Villagrán & Karla Lobos-Peña, 2022. "Perceived Social Support and Engagement in First-Year Students: The Mediating Role of Belonging during COVID-19," Sustainability, MDPI, vol. 15(1), pages 1-10, December.
    6. Isphording, Ingo E. & Raabe, Tobias, 2019. "Early Identification of College Dropouts Using Machine-Learning: Conceptual Considerations and an Empirical Example," IZA Research Reports 89, Institute of Labor Economics (IZA).
    7. Maria do Carmo Nicoletti & Osvaldo Luiz de Oliveira, 2020. "A Machine Learning-Based Computational System Proposal Aiming at Higher Education Dropout Prediction," Higher Education Studies, Canadian Center of Science and Education, vol. 10(4), pages 1-12, December.
    8. Zhe Cheng & Tong Xiao & Chen Chen & Xiong Xiong, 2022. "Evaluation of Scientific Research in Universities Based on the Idea of Education for Sustainable Development," Sustainability, MDPI, vol. 14(4), pages 1-18, February.
    9. Montorsi, Carlotta & Fusco, Alessio & Van Kerm, Philippe & Bordas, Stéphane P.A., 2024. "Predicting depression in old age: Combining life course data with machine learning," Economics & Human Biology, Elsevier, vol. 52(C).
    10. Neema Mduma, 2023. "Data Balancing Techniques for Predicting Student Dropout Using Machine Learning," Data, MDPI, vol. 8(3), pages 1-14, February.
    11. Sahin, Aleyna & Imamoglu, Gul & Murat, Mirac & Ayyildiz, Ertugrul, 2024. "A holistic decision-making approach to assessing service quality in higher education institutions," Socio-Economic Planning Sciences, Elsevier, vol. 92(C).
    12. Fedriani Martel, Eugenio M. & Romano Paguillo, Inmaculada, 2017. "Análisis cualitativo comparativo difuso para determinar influencias entre variables socio-económicas y el rendimiento académico de los universitarios || Fuzzy-Set Qualitative Comparative Analysis to D," Revista de Métodos Cuantitativos para la Economía y la Empresa = Journal of Quantitative Methods for Economics and Business Administration, Universidad Pablo de Olavide, Department of Quantitative Methods for Economics and Business Administration, vol. 24(1), pages 250-269, December.
    13. Dietrichson, Jens & Klokker, Rasmus H., 2024. "Predicting preschool problems," Children and Youth Services Review, Elsevier, vol. 161(C).
    14. Ashesh Rambachan & Amanda Coston & Edward Kennedy, 2022. "Robust Design and Evaluation of Predictive Algorithms under Unobserved Confounding," Papers 2212.09844, arXiv.org, revised May 2024.
    15. McKenzie, David & Sansone, Dario, 2019. "Predicting entrepreneurial success is hard: Evidence from a business plan competition in Nigeria," Journal of Development Economics, Elsevier, vol. 141(C).
    16. Liyang Tang, 2020. "Application of Nonlinear Autoregressive with Exogenous Input (NARX) neural network in macroeconomic forecasting, national goal setting and global competitiveness assessment," Papers 2005.08735, arXiv.org.
    17. Nuha Alruwais & Mohammed Zakariah, 2023. "Evaluating Student Knowledge Assessment Using Machine Learning Techniques," Sustainability, MDPI, vol. 15(7), pages 1-25, April.
    18. Bacon, Victoria R. & Kearney, Christopher A., 2020. "School climate and student-based contextual learning factors as predictors of school absenteeism severity at multiple levels via CHAID analysis," Children and Youth Services Review, Elsevier, vol. 118(C).
    19. Jack Vidal & Raquel Gilar-Corbi & Teresa Pozo-Rico & Juan-Luis Castejón & Tarquino Sánchez-Almeida, 2022. "Predictors of University Attrition: Looking for an Equitable and Sustainable Higher Education," Sustainability, MDPI, vol. 14(17), pages 1-27, September.
    20. Yuqing Geng & Nan Zhao, 2020. "Measurement of sustainable higher education development: Evidence from China," PLOS ONE, Public Library of Science, vol. 15(6), pages 1-18, June.
