Evaluation of Speaker Recognition System Using Different Distance Metrics

Sukhvinder Kaur; Monica; J. S. Sohal

doi:10.32628/CSEIT172562

Authors

Sukhvinder Kaur, SDDIET, Barwala, Golpura-134009, Haryana, India
Monica, SDDIET, Barwala, Golpura-134009, Haryana, India
J. S. Sohal, Director, LCET, Ludhiana-141113, Punjab, India

Keywords:

Bayesian Information Criterion (BIC); Kullback-Leibler Distance Metric (KL2); T-test Distance Metric; Mel Frequency Cepstral Coefficients (MFCC); Nonlinear Energy Operator (NEO); Detection Error Tradeoff (DET); Receiver Operating Characteristic (ROC); Area Under Curve (AUC).

Abstract

Speaker recognition systems are now widely used for voice-based identity verification and for controlling access to services. In this paper, speaker identification and verification are performed using feature extraction and several matching algorithms, and a new approach to speaker recognition is introduced. Speech signals are first divided into frames and then compressed using the discrete wavelet transform (DWT) for noise reduction and a better sampling frequency. Features of the compressed signal are then extracted using Mel Frequency Cepstral Coefficients (MFCC) and the nonlinear energy operator (NEO), and these features are used to identify and verify the speaker's voice. The distance metrics used are the delta Bayesian Information Criterion (delta BIC), the Kullback-Leibler distance metric (KL2), and the T-test metric. Finally, the results are evaluated with Detection Error Tradeoff (DET) and Receiver Operating Characteristic (ROC) curves by computing the area under the curve (AUC). The T-test metric with MFCC features gives the best result.
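
The abstract names the nonlinear energy operator and the three distance metrics without giving formulas. The sketch below shows one common way to compute them, and it is not taken from the paper. It assumes the Teager-Kaiser form of the NEO, single Gaussians fitted to each segment of feature frames (diagonal covariance for KL2, full covariance for delta BIC), and a Welch-style per-dimension t statistic combined by its Euclidean norm for the T-test distance. The function names, the penalty_weight parameter, and the synthetic 13-dimensional "features" are all hypothetical stand-ins; the authors' exact formulations and data may differ.

```python
import numpy as np


def teager_neo(x):
    """Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def kl2_distance(a, b):
    """Symmetric KL divergence between diagonal Gaussians fitted to a and b.

    a, b: arrays of shape (frames, dims). The diagonal-covariance
    simplification is an assumption made for this sketch.
    """
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    var_a, var_b = a.var(axis=0), b.var(axis=0)
    diff2 = (mu_a - mu_b) ** 2
    return 0.5 * np.sum(var_a / var_b + var_b / var_a - 2.0
                        + diff2 * (1.0 / var_a + 1.0 / var_b))


def delta_bic(a, b, penalty_weight=1.0):
    """Delta BIC with full-covariance Gaussians (requires dims >= 2).

    Positive values favour modelling a and b as two different sources.
    """
    n_a, n_b = len(a), len(b)
    n, d = n_a + n_b, a.shape[1]

    def logdet(x):
        # Log-determinant of the maximum-likelihood covariance estimate.
        return np.linalg.slogdet(np.cov(x, rowvar=False, bias=True))[1]

    n_params = d + d * (d + 1) / 2.0  # mean plus symmetric covariance
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    gain = 0.5 * (n * logdet(np.vstack([a, b]))
                  - n_a * logdet(a) - n_b * logdet(b))
    return gain - penalty


def t_test_distance(a, b):
    """Norm of the per-dimension Welch t statistics (assumed form)."""
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a)
                 + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / se
    return float(np.linalg.norm(t))


# Synthetic stand-ins for MFCC frames: two segments drawn from the same
# distribution and one drawn from a shifted distribution.
rng = np.random.default_rng(0)
enrolled = rng.normal(0.0, 1.0, size=(400, 13))
same = rng.normal(0.0, 1.0, size=(400, 13))
other = rng.normal(1.5, 1.0, size=(400, 13))

for name, dist in [("KL2", kl2_distance),
                   ("delta BIC", delta_bic),
                   ("T-test", t_test_distance)]:
    print(name, dist(enrolled, same), dist(enrolled, other))
```

In each printed pair, the first value compares two segments from the same synthetic source and the second compares segments from different sources. A verification decision would threshold such a score, and sweeping that threshold is what produces the DET and ROC curves mentioned in the abstract.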

