Developing a speech emotion recognition solution using ensemble learning for children with autism spectrum disorder to help identify human emotions
In this thesis work, a robust speech emotion recognition system has been developed to be used by children with autism spectrum disorder (ASD). Children with ASD have difficulty identifying human emotions during social interactions, and the goal of this work was to develop a tool that these children could use to better understand the emotions of the people around them. The speech emotion recognition solution was created using machine learning and deep learning techniques. A novel approach was taken, in which multiple machine learning algorithms are joined using ensemble learning to classify speech recordings in real time. A support vector machine (SVM), a multilayer perceptron (MLP), and a recurrent neural network model were trained on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Toronto Emotional Speech Set (TESS), the Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D), and a custom dataset containing utterances from these three datasets with added background noise. Two audio feature sets were used and their performance compared: a custom feature set created specifically for this study, and a set of features drawn from a popular speech emotion feature set. Furthermore, once the speech emotion recognizer was developed, it was combined with a facial expression recognition model to create a robust, multimodal emotion recognition system. The purpose was to obtain more accurate emotion predictions by processing data from both the audio and video modalities.
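The ensemble idea described above can be illustrated with a minimal sketch. This is not the thesis implementation: it uses synthetic feature vectors in place of extracted speech features, combines only an SVM and an MLP (the thesis also includes a recurrent neural network), and assumes simple soft voting (averaging per-class probabilities) as the joining strategy, since the exact ensemble method is not specified in the abstract.

```python
# Hypothetical soft-voting ensemble sketch: average the class
# probabilities of an SVM and an MLP and take the argmax. Any model
# that outputs class probabilities (e.g. an RNN) could be added to
# the average in the same way.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for extracted audio features (4 emotion classes).
X, y = make_classification(n_samples=400, n_features=40, n_informative=20,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm = SVC(probability=True, random_state=0).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)

# Soft voting: average per-class probabilities across the models.
proba = (svm.predict_proba(X_te) + mlp.predict_proba(X_te)) / 2
ensemble_pred = proba.argmax(axis=1)
accuracy = (ensemble_pred == y_te).mean()
print("ensemble accuracy:", accuracy)
```

The same probability-averaging step would also serve as a simple late-fusion strategy for the multimodal system, combining the audio model's predictions with those of the facial expression recognizer.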