Human robot interaction is a rapidly growing topic of interest in today’s society. The development of real time emotion recognition will further improve the relationship between humans and social robots. However, contemporary real time emotion recognition in unconstrained environments has yet to reach the accuracy levels achieved on controlled static datasets. In this work, we propose a Deep Convolutional Neural Network (CNN), pre-trained as a Stacked Convolutional Autoencoder (SCAE) in a greedy layer-wise unsupervised manner, for emotion recognition from facial expression images taken by a NAO robot. The SCAE model is trained to learn an illumination invariant down-sampled feature vector. The weights of the encoder element are then used to initialize the CNN model, which is fine-tuned for classification. We train the model on a corpus composed of gamma corrected versions of the CK+, JAFFE, FEEDTUM and KDEF datasets. The emotion recognition model produces a state-of-the-art accuracy rate of 99.14% on this corpus. We also show that the proposed training approach significantly improves the CNN’s generalisation ability by over 30% on nonuniform data collected with the NAO robot in unconstrained environments.