A comparative analysis of ensemble autoML machine learning prediction accuracy of STEM student grade prediction: a multi-class classification prospective

Yagya Nath Rimal, Navneet Sharma, Abeer Alsadoon, Sayyed Khawar Abbas

Research output: Contribution to journal › Article › peer-review

Abstract

In the early stages of research, statisticians aim to predict the relationship between dependent and independent variables for better classification. This association may show either a negative or a positive correlation with the target features, albeit with varying degrees of reliability. This paper addresses the research gap in selecting an appropriate AutoML model for multiclass classification of students' bachelor's degree grades. Consequently, this study assesses the predictive accuracy of an ensemble AutoML (Automated Machine Learning) model for letter grading of science, technology, engineering, and management students. The assessment is based on their subject grades from high school through to the internal evaluation of the bachelor's degree, and it predicts final bachelor's degree outcomes when the target variable is a multiclass letter grade in a modern grading system. The ensemble AutoML approach is employed to forecast upcoming grades. Nine of the 78 recommended AutoML models undergo fine-tuning and cross-validation, both to identify the best-optimized hyperparameters and to assess model performance once those hyperparameters are applied. The study analyzed the performance of the various models in classifying STEM students, focusing on their accuracy, their prediction error rates, and the misclassification between training and predicted values. The GBM_4_AutoML_1 model ranked highest at 0.28 (28%), followed by StackedEnsemble_BestOfFamily_5 at 0.31 (31%), DRF at 0.28 (28%), XRT at 0.30 (30%), DeepLearning_grid at 0.56 (56%), and GLM at 0.35 (35%). Furthermore, the confusion matrices show that the predicted grades of the optimized GBM model matched the true student grades 100%. The scoring history of each model was used to tune the recommended hyperparameters and arrive at the best model. Feature importance was analyzed comprehensively against the true, predicted, and internal multiclass grade classifications of STEM students, and the models were contrasted to provide a detailed explanation of their respective performance.
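
To illustrate how scores such as those above relate to a confusion matrix, the following is a minimal sketch in Python. It assumes the reported values are misclassification-type error rates, which the abstract does not state explicitly. The letter grades, the counts, and the function names are hypothetical and are not taken from the study.

```python
# Hypothetical illustration: error rates derived from a multiclass confusion matrix.
# Rows are true grades, columns are predicted grades. All counts are invented.

def overall_error(matrix):
    """Fraction of all samples whose predicted class differs from the true class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 1 - correct / total

def mean_per_class_error(matrix):
    """Average of the per-class error rates, skipping classes with no samples."""
    errors = []
    for i, row in enumerate(matrix):
        n = sum(row)
        if n:
            errors.append(1 - row[i] / n)
    return sum(errors) / len(errors)

labels = ["A", "B", "C"]      # hypothetical letter grades
confusion = [
    [8, 2, 0],                # true A
    [1, 6, 3],                # true B
    [0, 2, 8],                # true C
]
perfect = [
    [10, 0, 0],
    [0, 10, 0],
    [0, 0, 10],
]

if __name__ == "__main__":
    print("overall error:", round(overall_error(confusion), 4))
    print("mean per-class error:", round(mean_per_class_error(confusion), 4))
    print("error of a perfectly matching matrix:", overall_error(perfect))
```

A matrix whose off-diagonal cells are all zero, as in `perfect`, corresponds to predicted grades that match the true grades in every case.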

Original language: English
Number of pages: 27
Journal: Multimedia Tools and Applications
Publication status: E-pub ahead of print (In Press) - 2025

Keywords

  • Deep learning
  • Distributed random forest
  • Extremely randomized trees
  • Gradient linear models
  • Mean squared error
  • Root mean squared error
  • School leaving certificate
  • Stacked ensembled ALL gradient boosting
