A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions
1 month ago · arXiv