Multimodal Attention Networks for Low-Level Vision-and-Language Navigation