Classification of Delayed Students Graduation Risk : A Comparative Analysis of Naive Bayes, XGBoost, and Random Forest

Fadillah, Khafka and Atmadja, Aldy Rialdy and Lukman, Atmadja (2026) Classification of Delayed Students Graduation Risk : A Comparative Analysis of Naive Bayes, XGBoost, and Random Forest. Classification of Delayed Students Graduation Risk : A Comparative Analysis of Naive Bayes, XGBoost, and Random Forest, 1 (1). pp. 1-13. ISSN 2621-7279

Official URL: https://jurnal.masoemuniversity.ac.id/index.php/ai...

Abstract

Delayed student graduation is a critical challenge for higher education systems: it lowers institutional performance and increases the financial and psychological burden on students. This study aims to classify the risk of delayed graduation by developing and evaluating machine learning models based on new student admission data. The dataset was obtained from the New Student Admission Center of UIN Sunan Gunung Djati Bandung and consists of students' biodata, including socioeconomic characteristics and the educational background of students and their parents. The research followed the CRISP-DM framework, encompassing business understanding, data understanding, data preparation, modeling, evaluation, and deployment planning. During the data preparation stage, preprocessing techniques such as data cleaning, encoding of categorical variables, and feature selection were applied to improve data quality. Three machine learning algorithms (Naïve Bayes, Random Forest, and XGBoost) were implemented and optimized using hyperparameter tuning. The models were compared on accuracy, precision, recall, F1-score, and ROC-AUC. The experimental results show that the Random Forest algorithm outperformed the other models, achieving an accuracy of 0.633, precision of 0.677, recall of 0.694, F1-score of 0.685, and ROC-AUC of 0.668. These findings indicate that machine learning models based on admission data alone can provide a reasonably effective early prediction of delayed graduation risk. Nevertheless, model performance could be further improved by incorporating academic performance variables collected during the study period. This study is expected to support higher education institutions in formulating data-driven strategies and early intervention programs for students at high risk of delayed graduation.
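
The F1-score reported in the abstract can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal Python sketch of the threshold-based evaluation metrics, not the authors' code: the `ConfusionCounts` class, the function names, and the toy counts are hypothetical and exist only for illustration. It assumes "delayed graduation" is the positive class and that no denominator is zero.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper).

    Assumption: the positive class is 'delayed graduation'.
    """
    tp: int  # delayed, predicted delayed
    fp: int  # on time, predicted delayed
    fn: int  # delayed, predicted on time
    tn: int  # on time, predicted on time


def accuracy(c: ConfusionCounts) -> float:
    # Share of all predictions that are correct.
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    # Share of predicted-delayed students who were actually delayed.
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Share of actually-delayed students the model caught.
    return c.tp / (c.tp + c.fn)


def f1_from(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)


# Consistency check using only the values stated in the abstract.
reported_precision = 0.677
reported_recall = 0.694
f1_check = f1_from(reported_precision, reported_recall)
print(round(f1_check, 3))  # 0.685, matching the reported F1-score

# Toy counts, invented for illustration; NOT the study's confusion matrix.
toy = ConfusionCounts(tp=50, fp=25, fn=25, tn=50)
toy_f1 = f1_from(precision(toy), recall(toy))
```

The check reproduces the reported F1-score of 0.685 to three decimal places, so the three figures are mutually consistent. ROC-AUC is left out of the sketch because it requires the model's predicted scores, which the abstract does not provide.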

Item Type: Article
Uncontrolled Keywords: graduation delay; new student admission data; machine learning; CRISP-DM; random forest; naïve bayes; xgboost
Subjects: Data Processing, Computer Science > Computer Science Education
Divisions: Fakultas Sains dan Teknologi > Program Studi Teknik Informatika
Depositing User: Khafka khafka khafka
Date Deposited: 20 Apr 2026 02:55
Last Modified: 20 Apr 2026 02:55
URI: https://digilib.uinsgd.ac.id/id/eprint/130275
