Modeling Multimodal Cues in a Deep Learning-based Framework for Emotion Recognition in the Wild