Optimizing Speaker Recognition in Complex Environments: An Enhanced Framework with Artificial Neural Networks for Multi-Speaker Settings
DOI: https://doi.org/10.32628/CSEIT24103116

Keywords: Speaker Recognition System, Convolutional Neural Networks, Mel Frequency Cepstral Coefficients, K Nearest Neighbor, Feature Extraction

Abstract
This study develops an advanced speaker recognition system that combines Convolutional Neural Networks (CNN) with Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and K Nearest Neighbor (KNN) for classification. The proposed system aims to improve accuracy by refining the fine-tuning layer within the CNN architecture. Leveraging the unique characteristics of the human voice as a biometric identifier, the system extracts features from voice data using MFCC, then employs a CNN trained with a triplet loss to generate 128-dimensional embeddings. These embeddings are subsequently classified with the KNN method. The system's performance was evaluated on 50 speakers from the TIMIT dataset and 60 speakers from live smartphone recordings, demonstrating high accuracy. The study highlights the potential of combining CNN and MFCC for robust speaker recognition and suggests that future work could further improve recognition accuracy by integrating multimodal biometric systems, which combine different types of biometric data for more comprehensive identification.
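A minimal sketch of the pipeline described above, assuming the librosa and scikit-learn libraries; the helper names, the placeholder `cnn_model`, and parameter values such as the number of MFCC coefficients are illustrative assumptions rather than the authors' exact configuration:

```python
# Minimal sketch of the MFCC -> CNN embedding -> KNN pipeline described above.
# Library choices (librosa, scikit-learn) and all parameter values are
# illustrative assumptions, not the authors' exact configuration.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier


def extract_mfcc(wav_path, sr=16000, n_mfcc=20):
    """Load an utterance and return its MFCC feature matrix."""
    signal, _ = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)


def embed(mfcc, cnn_model):
    """Map MFCC features to a 128-dimensional speaker embedding.
    `cnn_model` is a placeholder for the triplet-loss-trained CNN:
    any callable that takes a batch of MFCC matrices and returns
    one 128-dimensional vector per utterance."""
    return np.asarray(cnn_model(mfcc[np.newaxis, ...])).reshape(-1)


def enroll(train_paths, train_labels, cnn_model, k=5):
    """Enrollment: fit a KNN classifier on embeddings of labelled utterances."""
    embeddings = [embed(extract_mfcc(p), cnn_model) for p in train_paths]
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    knn.fit(np.vstack(embeddings), train_labels)
    return knn


def identify(wav_path, cnn_model, knn):
    """Identification: predict the speaker label of an unseen utterance."""
    emb = embed(extract_mfcc(wav_path), cnn_model)
    return knn.predict(emb.reshape(1, -1))[0]
```

In the setup described in the abstract, the embedding network is trained separately with a triplet loss so that utterances from the same speaker map to nearby points in the 128-dimensional space; the KNN step then only compares distances between those embeddings.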
License
Copyright (c) 2024 International Journal of Scientific Research in Computer Science, Engineering and Information Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.