A new, efficient method for automatic emotion recognition from the speech signal is described. It is based on the four-dimensional spherical model of emotions and on principles of information encoding in the nervous system. On this basis, the principle of relative cross-frequency amplitude-variable encoding of emotions in the speech signal is proposed and experimentally tested. The hypothesis tested is that speech is a multichannel (frequency-diverse) signal in which each frequency band can undergo its own independent, fast micro-changes in amplitude. Agreement is shown between the extracted parameters of the speech signal and the subjective perception of the same samples (the short words «yes» and «no»), expressed in the formalized psychophysiological emotion parameters of the four-dimensional model. The obtained parameters (factors) may be characterized as bimodal spectral filters. Factor 1 has its main peak at 3000 Hz and a secondary peak at 500 Hz. It determines the change in the sound signal along the «character of emotion» axis: the greater the contribution of this component relative to the others, the more positive (better, more useful) the utterance is judged to be. Factor 2 has two extrema, near 1000 Hz and 1750 Hz. It determines the degree of information uncertainty as opposed to confidence (calm). Factor 3 characterizes affection (love). It corresponds to the most widely spaced peaks: a low frequency of about 150 Hz and a high frequency of about 3500 Hz. In the yes-no dichotomy, «no» is accompanied by the absence of active rejection, and «yes» is characterized as a positive assessment. Factor 4 covers a similar range, between 600 Hz and 1500 Hz. Its configuration is close to that of Factor 2, but it is shifted toward the low-frequency region, so that its peaks fall where Factor 2 has a local minimum. This component determines whether an aggressive (active) or a passive (fear, escape) reaction is provoked in the subject.
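The idea of treating each factor as a bimodal spectral filter applied to relative band amplitudes can be illustrated with a minimal Python sketch. It is not the author's implementation. The peak frequencies for Factors 1 to 3 are the ones quoted above; everything else is an assumption made for illustration: the Gaussian shape and 200 Hz width of each lobe, the equal weighting of the two lobes, the short-time Fourier analysis, and the function names `bimodal_filter`, `relative_band_amplitudes` and `factor_scores`.

```python
import numpy as np

# Peak frequencies (Hz) quoted in the abstract for Factors 1-3.
FACTOR_PEAKS_HZ = {1: (3000.0, 500.0), 2: (1000.0, 1750.0), 3: (150.0, 3500.0)}


def bimodal_filter(freqs_hz, peaks_hz, width_hz=200.0):
    """Hypothetical bimodal weighting: two equal Gaussian lobes, unit sum.

    The Gaussian shape and the width are assumptions, not taken from the paper.
    """
    f = np.asarray(freqs_hz, dtype=float)
    weights = sum(np.exp(-0.5 * ((f - p) / width_hz) ** 2) for p in peaks_hz)
    return weights / weights.sum()


def relative_band_amplitudes(signal, sample_rate, frame_len=512, hop=256):
    """Short-time amplitude spectrum with each frame normalized to sum to 1.

    Normalizing per frame keeps only the relative distribution of amplitude
    across frequency bands and discards overall loudness.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    amp = np.abs(np.fft.rfft(frames, axis=1))
    rel = amp / np.maximum(amp.sum(axis=1, keepdims=True), 1e-12)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return freqs, rel


def factor_scores(signal, sample_rate):
    """Per-frame score of each factor: relative spectrum weighted by its filter."""
    freqs, rel = relative_band_amplitudes(signal, sample_rate)
    return {k: rel @ bimodal_filter(freqs, peaks)
            for k, peaks in FACTOR_PEAKS_HZ.items()}


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 3000.0 * t)  # synthetic 3000 Hz test tone
    for k, s in factor_scores(tone, sr).items():
        print(f"factor {k}: mean score {s.mean():.4f}")
```

For a synthetic 3000 Hz tone, the Factor 1 filter, which has a lobe at that frequency, should give the largest mean score. The frame-by-frame trajectory of each score is one possible way to follow fast amplitude changes within a short word.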
The results obtained confirm the efficiency of the proposed general anthropomorphic approach to the development of technical systems, in particular of methods for speech signal processing and data presentation. They also confirm the identity of the previously identified parameters of the psychophysiological model, further justifying the preference for this classification of emotions over other well-known ones, in terms of both the number of dimensions and the orientation of the axes of the model space.
Vartanov A.V. (2013). Anthropomorphic method of emotion recognition in sounding speech. National Psychological Journal, 2(10), 69-79