ovr.news

Archaeology, rediscovered knowledge, the past opening up

Models Misidentify Historical Indian Artifacts

arxiv.org · 15 May 2026
Read on arxiv.org

Why this is here: The TAB-VLM benchmark includes questions about 1,600 Indian cultural artifacts, ranging from prehistoric times to the modern era.

Researchers evaluated ten vision-language models on 1,600 Indian cultural artifacts spanning prehistoric to modern times. They identified a failure mode they call cultural anachronism: a model interprets an object through concepts that belong to the wrong historical period.

The team built the Temporal Anachronism Benchmark for Vision-Language Models (TAB-VLM), which uses 600 questions across six categories to test how well models reason about time. Even the best-performing model, GPT-5.2, achieved only 58.7% accuracy on the benchmark.

The shortfall appears across model families and sizes. This suggests that current visual AI systems struggle with historical context, especially for visual cultures that are poorly represented in their training data. The researchers have released the dataset and code for further study.

Surfaced by the Discovery lens, one of the vital signs ovr.news reads.

How we evaluated this: AI summary. Read the original on arxiv.org for the full story.
