ViGIL Spotlight Talk @ NeurIPS 2019

Abstract

This talk introduces our recent paper, VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering. Here, we investigate the feasibility of EQA-type tasks by building a novel benchmark, which contains pairs of questions and videos generated in the House3D environment. While removing the navigation and action selection requirements from EQA, we increase the difficulty of the visual reasoning component via a much larger question space, tackling the sort of complex reasoning questions that make QA tasks challenging. By designing and evaluating several VQA-style models on the dataset, we establish a novel way of evaluating EQA feasibility given existing methods, while highlighting the difficulty of the problem even in the most ideal setting.

Date
Dec 13, 2019 10:30 AM
Location
Vancouver Convention Center
1055 Canada Pl, Vancouver, BC, V6C 0C3, Canada
Dr Cătălina Cangea
Senior Research Scientist

Senior Research Scientist at Google DeepMind, with a PhD in ML from the University of Cambridge, and inhaler of music :) Focus on generative music models, finding signals in data and human evaluation. Motivated by contributing ML-based knowledge and improvements to real-world systems!