A Cross-Modal Spatio-Temporal Interaction Network for Video Question Answering

Abstract: Video Question Answering (VideoQA) requires models to comprehend video content and generate answers to natural language questions. VideoQA must reason over both spatial and temporal ...
