TY - JOUR
T1 - Deep learning for predicting the onset of type 2 diabetes : enhanced ensemble classifier using modified t-SNE
AU - Pokharel, Monima
AU - Alsadoon, Abeer
AU - Nguyen, Tran Quoc Vinh
AU - Al-Dala’in, Thair
AU - Pham, Duong Thu Hang
AU - Prasad, P.W.C.
AU - Mai, Ha Thi
PY - 2022
Y1 - 2022
N2 - Several methods have been used for detecting Type 2 diabetes mellitus (T2DM), but deep learning has not been successfully used to predict T2DM because of low accuracy and performance. Using a traditional method like the synthetic minority over-sampling technique (SMOTE) affects the system’s accuracy. This study proposed an enhanced embedding technique that aims to increase the accuracy of predicting T2DM with minimum error. The proposed system uses t-distributed Stochastic Neighbor Embedding (t-SNE), which visualizes high-dimensional data that is imbalanced and insufficient, to improve the accuracy, sensitivity, and specificity of T2DM prediction. It consists of three components: pre-processing, feature extraction and selection, and classification. Three datasets are used for the proposed solution: Pima Indians Diabetes, Polarity, and Luzhou. The proposed system increased the overall performance of the model: compared to the state of the art, accuracy rose from 83.96% to 85.34%, sensitivity from 31.22% to 33.06%, and specificity from 96.00% to 97.26%. The proposed system reduced the overfitting problem, which affects the model’s accuracy. It also uses a non-linear dimension reduction technique, suited to visualizing high-dimensional datasets, to deal with large, insufficient, and inconsistent datasets.
AB - Several methods have been used for detecting Type 2 diabetes mellitus (T2DM), but deep learning has not been successfully used to predict T2DM because of low accuracy and performance. Using a traditional method like the synthetic minority over-sampling technique (SMOTE) affects the system’s accuracy. This study proposed an enhanced embedding technique that aims to increase the accuracy of predicting T2DM with minimum error. The proposed system uses t-distributed Stochastic Neighbor Embedding (t-SNE), which visualizes high-dimensional data that is imbalanced and insufficient, to improve the accuracy, sensitivity, and specificity of T2DM prediction. It consists of three components: pre-processing, feature extraction and selection, and classification. Three datasets are used for the proposed solution: Pima Indians Diabetes, Polarity, and Luzhou. The proposed system increased the overall performance of the model: compared to the state of the art, accuracy rose from 83.96% to 85.34%, sensitivity from 31.22% to 33.06%, and specificity from 96.00% to 97.26%. The proposed system reduced the overfitting problem, which affects the model’s accuracy. It also uses a non-linear dimension reduction technique, suited to visualizing high-dimensional datasets, to deal with large, insufficient, and inconsistent datasets.
UR - https://hdl.handle.net/1959.7/uws:68253
U2 - 10.1007/s11042-022-12950-9
DO - 10.1007/s11042-022-12950-9
M3 - Article
SN - 1380-7501
VL - 81
SP - 27837
EP - 27852
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 19
ER -