
Estimating number of speakers via density-based clustering and classification decision

  • Junjie Yang
  • Yi Guo
  • Zuyuan Yang
  • Liu Yang
  • Shengli Xie

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)
1 Downloads (Pure)

Abstract

Robustly estimating the number of speakers (NoS) from audio mixtures recorded in a reverberant environment is crucial. Some popular time-frequency (TF) methods approach NoS estimation by assuming that only one speech component is active at each TF slot. However, this assumption is violated in many scenarios where the speech signals are convolved with long room impulse responses, which degrades NoS estimation performance. To tackle this problem, a density-based clustering strategy is proposed that estimates NoS under a local dominance assumption on the speech signals. The method proceeds in several steps, from clustering to a classification decision on the number of speakers, and is designed with robustness in mind. First, the leading eigenvectors are extracted from the local covariance matrices of the mixture TF components and ranked by a combination of local density and minimum distance to other leading eigenvectors of higher density. Second, a gap-based method determines the cluster centers from the ranked leading eigenvectors at each frequency bin. Third, a criterion based on the averaged volume of the cluster centers selects the frequency bins with reliable clustering results, which are then used for the classification decision of NoS. Experimental results demonstrate that the proposed algorithm outperforms existing methods under various reverberation conditions, both with and without noise.
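
The first two steps of the abstract (ranking by local density and minimum distance to a denser point, then picking cluster centers at a gap in the ranking) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes real-valued 2-D unit vectors as stand-ins for the leading eigenvectors, a sign-invariant distance 1 - |<u, v>|, a Gaussian-kernel density with a hypothetical `cutoff` parameter, and a "largest ratio between consecutive scores" rule as the gap criterion. All function names are illustrative.

```python
import math

def distance(u, v):
    """Sign-invariant distance between two unit vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - min(1.0, abs(dot))

def rank_scores(points, cutoff=0.05):
    """Score each point by (local density) * (distance to nearest denser point)."""
    n = len(points)
    dist = [[distance(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Local density: Gaussian-kernel sum over all other points (assumed kernel).
    rho = [sum(math.exp(-(dist[i][j] / cutoff) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance instead.
        delta.append(min(denser) if denser else max(dist[i]))
    return [r * d for r, d in zip(rho, delta)]

def count_centers_by_gap(scores, max_centers=6):
    """Number of centers = position of the largest ratio gap in the sorted scores."""
    top = sorted(scores, reverse=True)[:max_centers + 1]
    ratios = [top[k] / (top[k + 1] + 1e-12) for k in range(len(top) - 1)]
    return ratios.index(max(ratios)) + 1

def synthetic_directions(center_angles, offsets):
    """Hypothetical test data: unit vectors scattered around given angles (radians)."""
    return [(math.cos(a + o), math.sin(a + o)) for a in center_angles for o in offsets]

# Three tight groups of directions, i.e. three dominant "speakers" at this bin.
points = synthetic_directions([0.2, 1.0, 2.0], [-0.02, -0.01, 0.0, 0.01, 0.02])
print(count_centers_by_gap(rank_scores(points)))
```

In this toy setting each group's central direction has both high density and a large distance to any denser point, so its score stands well apart from the rest and the gap falls right after the third score. The abstract's third step, selecting reliable frequency bins by the averaged volume of cluster centers before the final decision, is not covered by this sketch.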
Original language: English
Article number: 8918027
Pages (from-to): 176541-176551
Number of pages: 11
Journal: IEEE Access
Volume: 7
Publication status: Published - 2019

Open Access - Access Right Statement

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/

Keywords

  • source separation (signal processing)
