Wide Range Features-Based On Speech Emotion Recognition for Sensible Effective Services

Author : V. Ramesh

Speech emotion recognition from speech signals is an important research area with many applications, such as smart healthcare, automated voice response systems, assessing situational seriousness from a caller's emotional state in emergency call centers, and other smart affective services. In this paper, we present a study of speech emotion recognition based on features extracted from spectrograms using a deep convolutional neural network (CNN) with rectangular kernels. Typically, CNNs use square-shaped kernels and pooling operators at various layers, which are well suited to 2-D image data. In spectrograms, however, the information is encoded differently: time is represented on the x-axis, the y-axis shows the frequency content of the speech signal, and amplitude is indicated by the intensity value at a given position in the spectrogram. To analyze speech through spectrograms, we propose rectangular kernels of varying shapes and sizes, along with max pooling over rectangular neighborhoods, to extract discriminative features. The proposed scheme effectively learns discriminative features from speech spectrograms and outperforms several state-of-the-art techniques when evaluated on the Emo-DB and a sample speech dataset.
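The idea above, replacing square kernels and pooling windows with rectangular ones so that filters trade off time (x-axis) against frequency (y-axis) differently, can be sketched in PyTorch. This is a minimal illustration, not the paper's actual architecture: the layer counts, channel widths, kernel shapes, and the 7-class output (matching Emo-DB's seven emotion categories) are assumptions for the example.

```python
import torch
import torch.nn as nn

class RectKernelCNN(nn.Module):
    """Sketch of a CNN over spectrograms with rectangular kernels/pooling.

    Input shape: (batch, 1, freq_bins, time_frames). All layer sizes are
    illustrative assumptions; the paper does not specify them here.
    """

    def __init__(self, num_emotions: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            # Tall kernel: spans many frequency bins, few time frames.
            nn.Conv2d(1, 16, kernel_size=(12, 3), padding=(6, 1)),
            nn.ReLU(),
            # Rectangular max pooling: pool more aggressively along time.
            nn.MaxPool2d(kernel_size=(2, 4)),
            # Wide kernel: few frequency bins, many time frames.
            nn.Conv2d(16, 32, kernel_size=(3, 9), padding=(1, 4)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_emotions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

model = RectKernelCNN()
spec = torch.randn(2, 1, 128, 200)  # batch of 2 fake spectrograms
logits = model(spec)
print(logits.shape)  # one score per emotion class for each input
```

The tall kernel in the first layer captures harmonic structure across frequency at a single instant, while the wide kernel in the second layer captures temporal dynamics within a frequency band; the rectangular pooling window reduces the time dimension faster than frequency, which is the key departure from the square kernels standard in image CNNs.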

Authors and Affiliations

V. Ramesh
Assistant Professor, CSE Department, Sri Indu College of Engineering and Technology, Hyderabad, Telangana, India

Keywords : Speech Emotion Recognition, Convolutional Neural Network, Spectrogram, Rectangular Kernels


Publication Details

Published in : Volume 2 | Issue 6 | November-December 2017
Date of Publication : 2017-12-31
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 298-304
Manuscript Number : CSEIT172642
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

V. Ramesh, "Wide Range Features-Based On Speech Emotion Recognition for Sensible Effective Services", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 6, pp.298-304, November-December-2017.
