Wide Range Features-Based On Speech Emotion Recognition for Sensible Effective Services

Author : V. Ramesh

Speech emotion recognition from speech signals is an important research area with many applications, such as smart healthcare, automated voice response systems, assessing situational seriousness from a caller's emotional state in emergency call centers, and other smart affective services. In this paper, we present a study of speech emotion recognition based on features extracted from spectrograms using a deep convolutional neural network (CNN) with rectangular kernels. Typically, CNNs use square-shaped kernels and pooling operators at various layers, which are well suited to 2-D image data. In spectrograms, however, the information is encoded differently: time is represented on the x-axis, the y-axis shows the frequency content of the speech signal, and amplitude is indicated by the intensity value at a given position in the spectrogram. To analyze speech through spectrograms, we propose rectangular kernels of varying shapes and sizes, along with max pooling over rectangular neighborhoods, to extract discriminative features. The proposed scheme effectively learns discriminative features from speech spectrograms and outperforms several state-of-the-art techniques when evaluated on the Emo-DB and a sample speech dataset.
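The idea above, replacing square kernels and pooling windows with rectangular ones so that filters trade off time (x-axis) against frequency (y-axis) differently, can be sketched in PyTorch. This is a minimal illustration, not the paper's actual architecture: the layer counts, channel widths, kernel shapes, and the 7-class output (matching Emo-DB's seven emotion categories) are assumptions for the example.

```python
import torch
import torch.nn as nn

class RectKernelCNN(nn.Module):
    """Sketch of a CNN over spectrograms with rectangular kernels/pooling.

    Input shape: (batch, 1, freq_bins, time_frames). All layer sizes are
    illustrative assumptions; the paper does not specify them here.
    """

    def __init__(self, num_emotions: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            # Tall kernel: spans many frequency bins, few time frames.
            nn.Conv2d(1, 16, kernel_size=(12, 3), padding=(6, 1)),
            nn.ReLU(),
            # Rectangular max pooling: pool more aggressively along time.
            nn.MaxPool2d(kernel_size=(2, 4)),
            # Wide kernel: few frequency bins, many time frames.
            nn.Conv2d(16, 32, kernel_size=(3, 9), padding=(1, 4)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_emotions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

model = RectKernelCNN()
spec = torch.randn(2, 1, 128, 200)  # batch of 2 fake spectrograms
logits = model(spec)
print(logits.shape)  # one score per emotion class for each input
```

The tall kernel in the first layer captures harmonic structure across frequency at a single instant, while the wide kernel in the second layer captures temporal dynamics within a frequency band; the rectangular pooling window reduces the time dimension faster than frequency, which is the key departure from the square kernels standard in image CNNs.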

Authors and Affiliations

V. Ramesh
Assistant Professor, CSE Department, Sri Indu College of Engineering and Technology, Hyderabad, Telangana, India

Keywords : Speech Emotion Recognition, Convolutional Neural Network, Spectrogram, Rectangular Kernels


Publication Details

Published in : Volume 2 | Issue 6 | November-December 2017
Date of Publication : 2017-12-31
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 298-304
Manuscript Number : CSEIT172642
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

V. Ramesh, "Wide Range Features-Based On Speech Emotion Recognition for Sensible Effective Services", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 6, pp.298-304, November-December-2017.
