Optimizing Speaker Recognition in Complex Environments: An Enhanced Framework with Artificial Neural Networks for Multi-Speaker Settings
DOI: https://doi.org/10.32628/CSEIT24103116

Keywords: Speaker Recognition System, Convolutional Neural Networks, Mel Frequency Cepstral Coefficients, K Nearest Neighbor, Feature Extraction

Abstract
This study develops an advanced speaker recognition system that combines Convolutional Neural Networks (CNN) with Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and K Nearest Neighbor (KNN) for classification. The proposed system aims to improve accuracy by refining the fine-tuning layer within the CNN architecture. Leveraging the unique characteristics of the human voice as a biometric identifier, the system extracts features from voice data using MFCC, then employs a CNN trained with a triplet loss to generate 128-dimensional embeddings. These embeddings are subsequently classified with the KNN method. The system's performance was evaluated on 50 speakers from the TIMIT dataset and 60 speakers from live smartphone recordings, demonstrating high accuracy. The study highlights the potential of combining CNN and MFCC for robust speaker recognition and suggests that future work could further improve recognition accuracy by integrating multimodal biometric systems, which combine different types of biometric data for more comprehensive identification.
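A minimal sketch of the pipeline described above, assuming the librosa and scikit-learn libraries; the helper names, the placeholder `cnn_model`, and parameter values such as the number of MFCC coefficients are illustrative assumptions rather than the authors' exact configuration:

```python
# Minimal sketch of the MFCC -> CNN embedding -> KNN pipeline described above.
# Library choices (librosa, scikit-learn) and all parameter values are
# illustrative assumptions, not the authors' exact configuration.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier


def extract_mfcc(wav_path, sr=16000, n_mfcc=20):
    """Load an utterance and return its MFCC feature matrix."""
    signal, _ = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)


def embed(mfcc, cnn_model):
    """Map MFCC features to a 128-dimensional speaker embedding.
    `cnn_model` is a placeholder for the triplet-loss-trained CNN:
    any callable that takes a batch of MFCC matrices and returns
    one 128-dimensional vector per utterance."""
    return np.asarray(cnn_model(mfcc[np.newaxis, ...])).reshape(-1)


def enroll(train_paths, train_labels, cnn_model, k=5):
    """Enrollment: fit a KNN classifier on embeddings of labelled utterances."""
    embeddings = [embed(extract_mfcc(p), cnn_model) for p in train_paths]
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    knn.fit(np.vstack(embeddings), train_labels)
    return knn


def identify(wav_path, cnn_model, knn):
    """Identification: predict the speaker label of an unseen utterance."""
    emb = embed(extract_mfcc(wav_path), cnn_model)
    return knn.predict(emb.reshape(1, -1))[0]
```

In the setup described in the abstract, the embedding network is trained separately with a triplet loss so that utterances from the same speaker map to nearby points in the 128-dimensional space; the KNN step then only compares distances between those embeddings.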
License
Copyright (c) 2024 International Journal of Scientific Research in Computer Science, Engineering and Information Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.