Objective methods of formal analysis revealed a fundamental similarity between the audio signals of several species of apes and monkeys (chimpanzee, rhesus monkey, baboon, and siamang gibbon) and the manifestations of emotion in human speech.
It is shown that the developed system of formal parameters for assessing emotions in human speech (based on the principle of relative cross-frequency amplitude-variable encoding) is well suited as an experimental procedure for the objective evaluation and interpretation of monkey calls in terms of the human emotion system. This is confirmed by the agreement between the formal assessments and observations of animal behaviour in different situations.
The proposed anthropomorphic method for analysing animal audio signals is based on a four-dimensional spherical model of human emotions and on principles of information encoding in the nervous system. The model can serve as a common classification system for emotional phenomena, combining physiological concepts of the brain mechanisms of emotional control with well-known psychological classifications based on diverse experimental data. It also quantitatively accounts for all possible nuances and smooth transitions between emotions by representing each specific emotion as a linear combination of the selected basic physiological parameters. The agreement between the speech-signal parameters (in monkeys as well as in humans) and the psychophysiological parameters supports the theoretical principles of information encoding in the nervous system and the efficiency of the proposed anthropomorphic approach to the development of technical systems, in particular methods of speech signal processing. Conversely, this coincidence corroborates the previously identified psychophysiological parameters, further substantiating the preference for this emotion classification system (over others described in the literature) in terms of both the dimensionality and the orientation of the axes of the model space.
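The representation of an emotion as a linear combination of basic parameters on a four-dimensional sphere can be sketched as follows. This is a minimal illustration only: the axis names loosely follow the factors discussed later in the text, and the example coordinates are assumptions, not values from the study.

```python
import numpy as np

# Illustrative sketch of the four-dimensional spherical emotion model:
# each emotion is a point on the unit sphere in a 4-D space whose axes
# are the basic psychophysiological parameters. Axis labels and the
# coordinates below are hypothetical, for illustration only.
AXES = ("character", "certainty", "affection", "activity")

def normalize(v):
    """Project a coordinate vector onto the unit sphere."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def mix(e1, e2, w=0.5):
    """A nuance between two emotions: a renormalised linear combination."""
    return normalize(w * np.asarray(e1) + (1.0 - w) * np.asarray(e2))

joy = normalize([1.0, 0.8, 0.5, 0.6])      # hypothetical coordinates
fear = normalize([-0.8, -0.9, 0.0, -0.7])  # hypothetical coordinates

# A smooth transition between the two emotions stays on the sphere.
blend = mix(joy, fear, 0.5)
```

Because every point is renormalised, any weighted mixture of two emotions is again a valid point of the model, which is what gives the model its continuous "soft transitions" between emotions.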
On the whole, the results suggest that the emotional regulation system is very old and has been preserved in humans unchanged throughout history, co-existing with the system of feeling expression and with the independent speech sound system. Furthermore, it is shown that in the majority of the surveyed species (chimpanzee, rhesus monkey and baboon), the entire repertoire of sound signals reduces to the emotional regulation described above. However, we found that some species, e.g. the siamang gibbon, are able to diversify their repertoire and create additional sound-signal channels in a relatively free frequency domain, so as not to interfere with the signal system shared with the other apes and monkeys (and with humans). Apparently, this additional sound system is based on the same encoding principle as the general emotional system.
A new efficient method for automatic emotion recognition from the speech signal, based on the four-dimensional spherical model of emotions and on principles of information encoding in the nervous system, is described. As a result, the principle of relative cross-frequency amplitude-variable encoding of emotions in the speech signal is proposed and experimentally tested. The hypothesis that speech is a multichannel signal (frequency diversity), with each band capable of independent fast micro-amplitude changes, was also tested. Agreement is shown between the selected speech-signal parameters and the subjective perception of the same samples (the short words «yes» and «no») within the system of formalized psychophysiological emotion parameters of the four-dimensional model. The obtained parameters (factors) may be characterized as bimodal spectral filters. Factor 1 has a basic peak at 3000 Hz and a secondary peak at 500 Hz; it determines the change in the sound signal along the «character of emotion» axis: the larger the contribution of this component relative to the others, the more positive (more useful) the utterance is estimated to be. Factor 2 has two extremes near 1000 Hz and 1750 Hz; it determines the degree of informational uncertainty as opposed to confidence (calm). Factor 3 characterizes affection (love) and corresponds to the most widely spaced peaks: a low frequency of about 150 Hz and a high frequency of about 3500 Hz. In the yes-no dichotomy, «no» is accompanied by the absence of active rejection, while «yes» is characterised by a positive assessment. Factor 4 spans a similar range, between 600 Hz and 1500 Hz; its configuration is close to that of factor 2, but shifted towards the low-frequency region, with its peaks falling in the local minima of factor 2. This component determines whether an aggressive (active) or passive (fear, escape) reaction is provoked in the subject.
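The factor scoring described above can be sketched as a projection of a relative magnitude spectrum onto bimodal spectral filters. The centre frequencies below come from the text; the Gaussian band shapes, bandwidths, sampling rate, and factor labels are illustrative assumptions, not the authors' fitted filters.

```python
import numpy as np

FS = 16000  # assumed sampling rate, Hz

# (low peak Hz, high peak Hz) for factors 1-4, taken from the text;
# the short labels are assumptions for readability.
FACTOR_PEAKS = {
    "character (factor 1)":   (500, 3000),
    "uncertainty (factor 2)": (1000, 1750),
    "affection (factor 3)":   (150, 3500),
    "activity (factor 4)":    (600, 1500),
}

def bimodal_filter(freqs, f_low, f_high, bw=200.0):
    """Two Gaussian passbands centred on the factor's peak frequencies."""
    return (np.exp(-0.5 * ((freqs - f_low) / bw) ** 2)
            + np.exp(-0.5 * ((freqs - f_high) / bw) ** 2))

def factor_scores(signal, fs=FS):
    """Project the magnitude spectrum onto each bimodal filter.

    The spectrum is normalised by its total energy, so the scores are
    relative (cross-frequency) rather than absolute amplitudes.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    total = spectrum.sum() + 1e-12
    return {name: float((spectrum * bimodal_filter(freqs, lo, hi)).sum() / total)
            for name, (lo, hi) in FACTOR_PEAKS.items()}

# Toy usage: a pure tone near 3000 Hz loads mainly on factor 1,
# whose filter has a passband centred at that frequency.
t = np.arange(0, 0.5, 1.0 / FS)
tone = np.sin(2 * np.pi * 3000 * t)
scores = factor_scores(tone)
```

In a full system the projection would be computed per short analysis frame, so that the fast micro-amplitude changes in each band, not just the average spectrum, contribute to the factor trajectories.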
The results obtained confirm the efficiency of the proposed general anthropomorphic approach to the development of technical systems, in particular methods of speech signal processing and data presentation. They also corroborate the previously identified psychophysiological model parameters, further justifying the preference (over other well-known ones) for this classification of emotions, in terms of both the dimensionality and the orientation of the axes of the model space.