SciVQA 2026: Scientific Visual Question Answering

Scientific papers communicate information through unstructured text as well as (semi-)structured figures and tables. Reasoning jointly over both modalities benefits downstream applications such as visual question answering (VQA). SciVQA 2026 builds on insights from SciVQA 2025 (held at SDP 2025, co-located with ACL 2025) and shifts the focus toward evaluating how well multimodal LLMs reason over combined modalities (figures, tables, and text).

SciVQA 2026 will include a new set of papers and entirely new annotations, featuring two tasks:

  1. Context retrieval: Given a question, a paper, and the paper's full set of paragraphs and images, retrieve the relevant context (tables, figures, and paragraphs from the main text) required to answer the question.
  2. Answer generation: Given a question and the context retrieved in the first task, generate an answer (a minimal pipeline sketch follows below).
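
As a rough illustration of how the two tasks fit together, the sketch below pairs a toy bag-of-words retriever (Task 1) with a placeholder answer generator (Task 2). Everything in it is hypothetical: the corpus layout, the element identifiers, and the functions retrieve_context and generate_answer are illustrative stand-ins; a real submission would also retrieve over figure and table images and call a multimodal LLM instead of returning a placeholder string.

```python
# Hypothetical two-stage sketch of the SciVQA 2026 setup (not an official
# baseline): a toy bag-of-words retriever for Task 1 and a stubbed-out
# generator for Task 2. All names and data formats are illustrative.
from collections import Counter
import math


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def retrieve_context(question: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Task 1: rank paper elements (paragraphs, figure/table captions) by similarity to the question."""
    ranked = sorted(corpus, key=lambda cid: bow_cosine(question, corpus[cid]), reverse=True)
    return ranked[:k]


def generate_answer(question: str, context: list[str]) -> str:
    """Task 2 stub: a real system would prompt a multimodal LLM with the retrieved elements."""
    return f"[answer conditioned on {len(context)} retrieved elements]"


if __name__ == "__main__":
    corpus = {
        "fig1_caption": "Figure 1: accuracy of the multimodal model across datasets.",
        "tab2_caption": "Table 2: ablation of the retrieval component.",
        "par_3": "We describe the training data and the preprocessing pipeline.",
    }
    question = "Which dataset gives the highest accuracy in Figure 1?"
    ctx_ids = retrieve_context(question, corpus)
    print(ctx_ids)  # e.g. ['fig1_caption', ...]
    print(generate_answer(question, [corpus[c] for c in ctx_ids]))
```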

Shared task organisers:

  • Ekaterina Borisova (DFKI)
  • Georg Rehm (DFKI)