The effect of noise on speaker identification and finding a noise that improves accuracy

Md Atiqul Islam, Mohammed Abdul Kader

    Research output: Contribution to journal > Article > peer-review

    Abstract

    Conventional Speaker Identification (SID) systems accurately identify speakers if their speech is noiseless. However, their classification accuracy drops substantially when speech is corrupted by noise. SID systems would be more practical and applicable if they were more noise-robust. We introduce an SID system that can accurately classify speakers, even when their speech is corrupted by various types of noise at different noise levels. We investigate the impact of noisy training data on the performance of an SID system and look for a type of noise that may enhance that performance. In this paper, we compare two front-end feature extractors: a cochlea model called the Cascade of Asymmetric Resonators with Fast Acting Compression (CAR-FAC) and an FFT-based Gammatone Frequency Cepstral Coefficient (GFCC). We use the Gaussian Mixture Model with the Universal Background Model (GMM-UBM) and an Extreme Learning Machine (ELM) as classifiers to focus on the influence of the front-ends on performance. We train the GMM-UBM and the ELM neural network with noisy data under various conditions to investigate the impact of noise on the classifier. Our results suggest that noisy training data make an SID system noise-robust while the performance under clean conditions remains almost the same. More interestingly, training with speech-shaped noise (cocktail party) enhances SID accuracy more than training with white noise.
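
The abstract refers to corrupting speech with noise "at different noise levels" but does not say how the mixtures are produced. A common way to do this is to scale the noise so that the mixture reaches a target signal-to-noise ratio (SNR). The sketch below illustrates that general idea only; the function name `mix_at_snr`, the sine-wave stand-in for speech, and the 16 kHz sampling rate are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)

    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    # Average power of each signal.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)

    # Choose gain so that p_speech / (gain**2 * p_noise) == 10**(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


# Stand-in signals (assumed): a 220 Hz tone in place of speech, white noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2.0 * np.pi * 220.0 * t)
white = rng.standard_normal(16000)

noisy = mix_at_snr(speech, white, snr_db=10.0)
```

To build noisy training sets under several conditions, the same function could be called with different `snr_db` values and with a speech-shaped (cocktail party) noise recording in place of `white`.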

    Original language: English
    Pages (from-to): 814-829
    Number of pages: 16
    Journal: Indonesian Journal of Electrical Engineering and Informatics
    Volume: 13
    Issue number: 3
    Publication status: Published - Sept 2025

    Keywords

    • CAR-FAC
    • Cocktail Party
    • GFCC
    • GMM-UBM
    • Noise-robust
    • Speaker Identification
