TY - JOUR
T1 - Exploring Bengali speech for gender classification
T2 - machine learning and deep learning approaches
AU - Arpita, Habiba Dewan
AU - Al Ryan, Abdullah
AU - Hossain, Md. Fahad
AU - Rahman, Md. Sadekur
AU - Sajjad, Md
AU - Prova, Nuzhat Noor Islam
PY - 2025/2
Y1 - 2025/2
N2 - Speech enables clear and powerful transmission of ideas. The human voice, rich in tone and emotion, holds unique beauty and significance in daily life. Vocal pitch varies by gender and is influenced by emotion and language. People perceive these nuances effortlessly, whereas machines often struggle to capture such subtle distinctions. This project applies several machine learning (ML) and deep learning (DL) techniques to determine a speaker's gender reliably from a corpus of Bengali conversations. Our dataset comprises 3185 Bengali speech samples: 1100 from men, 1035 from women, and 1050 from speakers who identify as third gender. We extracted six audio features: spectral roll-off, spectral centroid, chroma-stft, spectral bandwidth, zero crossing rate, and Mel-frequency cepstral coefficients (MFCC). Five ML algorithms were used to analyze the dataset: extreme gradient boosting (XGBoost), support vector machines (SVM), K-nearest neighbors (KNN), decision tree classifier (DTC), and random forest (RF). For completeness, we also included a 1D convolutional neural network (CNN) from the DL domain. The 1D CNN outperformed all other algorithms, reaching an accuracy of 99.37%.
AB - Speech enables clear and powerful transmission of ideas. The human voice, rich in tone and emotion, holds unique beauty and significance in daily life. Vocal pitch varies by gender and is influenced by emotion and language. People perceive these nuances effortlessly, whereas machines often struggle to capture such subtle distinctions. This project applies several machine learning (ML) and deep learning (DL) techniques to determine a speaker's gender reliably from a corpus of Bengali conversations. Our dataset comprises 3185 Bengali speech samples: 1100 from men, 1035 from women, and 1050 from speakers who identify as third gender. We extracted six audio features: spectral roll-off, spectral centroid, chroma-stft, spectral bandwidth, zero crossing rate, and Mel-frequency cepstral coefficients (MFCC). Five ML algorithms were used to analyze the dataset: extreme gradient boosting (XGBoost), support vector machines (SVM), K-nearest neighbors (KNN), decision tree classifier (DTC), and random forest (RF). For completeness, we also included a 1D convolutional neural network (CNN) from the DL domain. The 1D CNN outperformed all other algorithms, reaching an accuracy of 99.37%.
KW - Deep learning
KW - Gender classification
KW - Machine learning
KW - Mel-frequency cepstral coefficients
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85215588187&partnerID=8YFLogxK
U2 - 10.11591/eei.v14i1.8146
DO - 10.11591/eei.v14i1.8146
M3 - Article
AN - SCOPUS:85215588187
SN - 2089-3191
VL - 14
SP - 328
EP - 337
JO - Bulletin of Electrical Engineering and Informatics
JF - Bulletin of Electrical Engineering and Informatics
IS - 1
ER -