Why AI Still Struggles to Understand Your Culture: A New Benchmark Reveals the Gap
A new study finds that large language models and vision-language models struggle to understand traditional dance and cultural heritage across languages and regions. Researchers at multiple institutions developed NRITYAM, which they describe as the largest multilingual and multicultural benchmark dedicated to evaluating how well AI systems comprehend global dance traditions. The dataset includes 9,260 curated questions in 12 languages, covering traditional dances from 12 countries across five continents.
What Is NRITYAM and Why Does It Matter?
NRITYAM, a Sanskrit term with deep significance in classical Indian dance, represents a comprehensive effort to address a critical blind spot in AI development. While language models have become essential tools for modern workflows, their global effectiveness depends on understanding local socio-cultural contexts. The benchmark was developed through close collaboration with native dance artists and native speakers who authored and validated culturally relevant questions specific to their regions.
The dataset spans two modalities, text-based and image-based questions, and is organized into three question categories: history-based (origins and historical context), rule-based (performance rules and technique), and scenario-based (real-world situations). This structure lets researchers evaluate how different types of AI models reason about dance traditions from each of these angles.
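As a rough illustration of that structure, one benchmark item could be represented as in the sketch below. The class name, field names, label strings, and placeholder items are all assumptions made for this example; they are not the schema of the released dataset.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels, chosen for this sketch only.
MODALITIES = ("text", "image")
CATEGORIES = ("history", "rule", "scenario")


@dataclass(frozen=True)
class DanceQuestion:
    question: str
    answer: str
    language: str   # language the question is written in
    country: str    # country the dance tradition originates from
    modality: str   # "text" or "image"
    category: str   # "history", "rule", or "scenario"
    image_path: Optional[str] = None  # only used by image-based questions

    def __post_init__(self) -> None:
        # Reject records that do not fit the two-modality, three-category layout.
        if self.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {self.modality!r}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.modality == "image" and self.image_path is None:
            raise ValueError("image-based questions need an image_path")


def count_by(items, field):
    """Count questions per value of one attribute, e.g. 'category'."""
    return Counter(getattr(item, field) for item in items)


# Placeholder items, not taken from the dataset.
SAMPLE = [
    DanceQuestion("<question 1>", "<answer 1>", "lang-a", "country-a", "text", "history"),
    DanceQuestion("<question 2>", "<answer 2>", "lang-a", "country-a", "image", "rule",
                  "images/placeholder.jpg"),
    DanceQuestion("<question 3>", "<answer 3>", "lang-b", "country-b", "text", "scenario"),
]
```

Calling `count_by(SAMPLE, "category")` or `count_by(SAMPLE, "modality")` gives a quick check of how evenly a split covers each category or modality before any model is run on it.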
Which AI Models Were Tested and How Did They Perform?
The research team evaluated 13 state-of-the-art language models, including large language models (LLMs), small language models (SLMs) with 7 billion parameters or fewer, multimodal large language models (MLMs), and small multimodal language models (SMLMs). Multimodal models are AI systems that can process both text and images, making them particularly relevant for evaluating cultural understanding through visual question answering.
The evaluation uncovered critical gaps in how these models understand and reason about traditional dance. Most models performed poorly on questions about heritage dance forms, particularly those that required knowledge of the historical significance, regional variations, or performance rules of a specific tradition. The finding suggests that current AI systems are trained predominantly on mainstream global culture and popular dance styles such as hip-hop, and often neglect heritage traditions and culturally distinctive practices.
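Gaps like these are usually surfaced by breaking accuracy down by group rather than reporting one overall score. The sketch below shows one way to do that; the record layout (`category`, `language`, `correct`) and the sample values are hypothetical and are not results from the study.

```python
from collections import defaultdict


def accuracy_by_group(results, key):
    """Return {group: fraction of correct answers} for one grouping field.

    `results` is an iterable of dicts, each holding the grouping field
    named by `key` and a boolean 'correct' flag for one model answer.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in results:
        group = record[key]
        totals[group] += 1
        hits[group] += int(record["correct"])
    return {group: hits[group] / totals[group] for group in totals}


# Made-up graded answers for illustration only.
SAMPLE_RESULTS = [
    {"category": "history", "language": "lang-a", "correct": True},
    {"category": "history", "language": "lang-b", "correct": False},
    {"category": "rule", "language": "lang-a", "correct": True},
    {"category": "rule", "language": "lang-b", "correct": True},
]

per_category = accuracy_by_group(SAMPLE_RESULTS, "category")
per_language = accuracy_by_group(SAMPLE_RESULTS, "language")
```

Grouping the same graded answers by category and then by language makes it possible to see whether a model's weakness is tied to a type of question, to a language, or to both.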
How to Improve AI's Cultural Understanding: Key Strategies
- Multilingual Training Data: AI models need to be trained on culturally relevant content in native languages, not just English translations, to capture nuanced meanings and regional variations in how traditions are understood and described.
- Collaboration With Cultural Experts: Involving native artists, speakers, and cultural practitioners in dataset creation ensures that questions and answers reflect authentic knowledge rather than stereotypical or mainstream interpretations of traditions.
- Diverse Question Types: Benchmarks should include history-based questions about origins and evolution, rule-based questions about performance techniques, and scenario-based questions that test real-world reasoning about how traditions are practiced and adapted.
- Visual and Textual Evaluation: Testing both text-based and image-based understanding helps identify whether models can recognize cultural elements visually and connect them to their historical and social significance.
Why Does This Matter for AI Development?
The NRITYAM benchmark addresses a persistent challenge in AI: ensuring that models effectively recognize and reason about diverse linguistic and cultural contexts, particularly in underrepresented domains. Traditional and indigenous dance forms are deeply embedded in local histories, societal values, and cultural identities. When AI systems fail to understand these traditions, they risk perpetuating inaccuracies, stereotypes, and the marginalization of underrepresented communities.
Conversely, models that understand and respect cultural nuances can perform better while promoting greater inclusivity and equity in AI applications. This is especially important as AI systems increasingly influence education, governance, entertainment, and cultural preservation efforts worldwide. An AI system that misunderstands or dismisses a cultural tradition could undermine efforts to preserve and celebrate heritage practices.
The research team noted that existing dance-related benchmarks were largely monolingual and English-centric, and focused primarily on motion recognition rather than cultural reasoning. According to the team, NRITYAM is the first comprehensive benchmark to capture the cultural nuances of traditional dance reasoning across multiple languages, diverse cultural contexts, and visual question answering.
What's Next for Cultural AI Benchmarking?
The NRITYAM dataset covers traditional dances originating from 12 countries across five continents, which are now performed in over 100 countries spanning six continents. This global scope reflects the reality that cultural traditions are not static; they evolve through intercultural exchange and regional reinterpretation. For example, Bharatanatyam, a classical Indian dance rooted in Tamil Nadu, has been adapted in Sri Lanka through a distinctive fusion with Kandyan dance aesthetics, reflecting localized narratives and performance traditions.
The researchers have open-sourced the NRITYAM dataset to enable the broader AI research community to develop and test improved models. By establishing this benchmark, the team hopes to advance the intersection of natural language processing and culturally rich domains, contributing to greater inclusivity and equity in AI applications worldwide. The work demonstrates that building truly global AI systems requires more than scaling up training data; it requires intentional collaboration with communities whose cultures are being represented.