TY - CONF
T1 - Deep learning for real time facial expression recognition in social robots
AU - Ruiz-Garcia, Ariel
AU - Webb, Nicola
AU - Palade, Vasile
AU - Eastwood, Mark
AU - Elshaw, Mark
PY - 2018/11/17
Y1 - 2018/11/17
N2 - Human-robot interaction is a rapidly growing topic of interest in today’s society. The development of real-time emotion recognition will further improve the relationship between humans and social robots. However, contemporary real-time emotion recognition in unconstrained environments has yet to reach the accuracy levels achieved on controlled static datasets. In this work, we propose a Deep Convolutional Neural Network (CNN), pre-trained as a Stacked Convolutional Autoencoder (SCAE) in a greedy layer-wise unsupervised manner, for emotion recognition from facial expression images taken by a NAO robot. The SCAE model is trained to learn an illumination-invariant down-sampled feature vector. The weights of the encoder element are then used to initialize the CNN model, which is fine-tuned for classification. We train the model on a corpus composed of gamma-corrected versions of the CK+, JAFFE, FEEDTUM and KDEF datasets. The emotion recognition model produces a state-of-the-art accuracy rate of 99.14% on this corpus. We also show that the proposed training approach significantly improves the CNN’s generalisation ability by over 30% on non-uniform data collected with the NAO robot in unconstrained environments.
KW - Deep convolutional neural networks
KW - Emotion recognition
KW - Greedy layer-wise training
KW - Social robots
KW - Stacked convolutional autoencoders
UR - http://www.scopus.com/inward/record.url?scp=85059051112&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-04221-9_35
DO - 10.1007/978-3-030-04221-9_35
M3 - Conference proceeding
AN - SCOPUS:85059051112
SN - 9783030042202
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 392
EP - 402
BT - Neural Information Processing - 25th International Conference, ICONIP 2018, Proceedings
PB - Springer-Verlag London Ltd
T2 - 25th International Conference on Neural Information Processing, ICONIP 2018
Y2 - 13 December 2018 through 16 December 2018
ER -