Abstract: Video Question Answering (VideoQA) requires models to comprehend video content and generate answers to natural-language questions. VideoQA models must reason over both spatial and temporal ...