Stacking Ensemble Learning for University Student Dropout Prediction
DOI: https://doi.org/10.63158/journalisi.v8i1.1403
Keywords: Stacking Ensemble Learning, Student Dropout Prediction, STEM Education, SMOTE–Tomek Links, Educational Data Mining
Abstract
Student dropout in STEM programs remains a persistent challenge for higher education institutions, reducing educational quality, weakening retention outcomes, and increasing inefficiencies in resource utilization. This study develops an interpretable Stacking Ensemble Learning approach to predict STEM student dropout risk and identify key academic and socioeconomic determinants that can support data-driven early intervention. Following the CRISP-DM framework, we analyze 3,630 student records from the UCI Machine Learning Repository containing demographic, academic, and socioeconomic attributes. The proposed stacking architecture combines Random Forest, Gradient Boosting, and XGBoost as base learners with Logistic Regression as a meta-learner, while SMOTE–Tomek Links is employed to address class imbalance and reduce boundary noise. Experimental results show that the model achieves strong predictive performance with 90.91% accuracy and ROC–AUC of 95.72%, demonstrating stable discrimination and outperforming individual base models. Feature importance analysis indicates that early academic trajectory variables—especially first- and second-semester success rates, total approved units, and average grades—are the most influential predictors of dropout risk. The proposed framework contributes a practical, interpretable early warning model by integrating stacking ensemble learning with imbalance handling and trajectory-based feature engineering, supporting actionable intervention planning in higher education.
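For readers who want to see how the described components fit together, the following minimal sketch (not the authors' released code) combines SMOTE–Tomek Links resampling, a stacking ensemble of Random Forest, Gradient Boosting, and XGBoost base learners with a Logistic Regression meta-learner, and accuracy/ROC–AUC evaluation using scikit-learn, imbalanced-learn, and xgboost. The file name students_dropout.csv, the target encoding, and all hyperparameters are illustrative assumptions rather than the paper's exact configuration.

```python
# Illustrative sketch of the pipeline described in the abstract; assumptions
# (file name, target encoding, hyperparameters) are not taken from the paper.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from imblearn.combine import SMOTETomek
from xgboost import XGBClassifier

# Hypothetical loading step: a CSV export of the UCI student dropout dataset,
# with "Target" recoded to a binary label (1 = dropout, 0 = otherwise).
df = pd.read_csv("students_dropout.csv")
X = df.drop(columns=["Target"])
y = (df["Target"] == "Dropout").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Resample only the training split so synthetic samples do not leak
# into the evaluation data.
X_train_res, y_train_res = SMOTETomek(random_state=42).fit_resample(
    X_train, y_train
)

# Stacking architecture: three tree-based base learners feed their
# out-of-fold probability estimates to a Logistic Regression meta-learner.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ("gb", GradientBoostingClassifier(random_state=42)),
    ("xgb", XGBClassifier(eval_metric="logloss", random_state=42)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train_res, y_train_res)

# Evaluate on the untouched test split.
y_pred = stack.predict(X_test)
y_prob = stack.predict_proba(X_test)[:, 1]
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"ROC-AUC:  {roc_auc_score(y_test, y_prob):.4f}")
```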
License
Copyright (c) 2026 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors Declaration
- The Authors certify that they have read, understood, and agreed to the Journal of Information Systems and Informatics (JournalISI) submission guidelines, policies, and submission declaration. The submission has been prepared using the provided template.
- The Authors certify that all authors have approved the publication of this manuscript and that there is no conflict of interest.
- The Authors confirm that the manuscript is their original work, has not been previously published, and is not under consideration for publication elsewhere.
- The Authors confirm that all authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission.
- The Authors confirm that the manuscript is not copied or plagiarized from any other published work.
- The Authors declare that the manuscript will not be submitted for publication in any other journal or magazine until a decision is made by the journal editors.
- If the manuscript is finally accepted for publication, the Authors confirm that they will either proceed with publication immediately or withdraw the manuscript in accordance with the journal’s withdrawal policies.
- The Authors agree that, upon publication of the manuscript in this journal, they transfer copyright or assign exclusive rights, including commercial rights, to the publisher.