Bridging vision and commonsense for multimodal situation recognition in pervasive systems