Recognizing and Presenting the Storytelling Video Structure with Deep Multimodal Networks