AI & ML interests

Indian STEM education AI: benchmarks, multimodal models, document understanding, and multilingual (EN/HI) datasets for tutoring and assessment.


Organization Card

Nalanda Data

Open data and models for Indian STEM education AI.

We build and publish datasets and fine-tuned models grounded in Indian academic content: JEE & NEET competitive exam questions, school and college textbooks, and multimodal science material. The goal is to make Indian education AI workable for researchers, builders, and edtech companies who can't easily source this data elsewhere.

🌐 nalandadata.ai
📧 info@nalandadata.ai (commercial) · tech@nalandadata.ai (technical)


What we publish

Datasets

| Dataset | Focus | License |
|---|---|---|
| NalandaJEENEETBench | JEE & NEET benchmark across Physics, Chemistry, Mathematics, Biology | CC-BY-NC-4.0 |
| DrishtiTable | Table structure recognition in Indian textbooks (EN + HI) | Apache-2.0 |
| nalanda-image-qa | 1,000 multimodal STEM Q&A pairs (image + text) | CC-BY-4.0 |
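
A minimal sketch of pulling one of the sample datasets with the Hugging Face `datasets` library. The repo IDs are assumed to live under the `Nalandadata/` namespace, and only two are listed because the full IDs of the other releases are not given here. Split names, configs, and column names are not documented on this page, so the helper passes none and leaves inspection to the caller; `load_nalanda_dataset` and `REPO_IDS` are illustrative names, not part of any published API.

```python
# Assumed repo IDs under the organization's namespace; verify them on the Hub.
REPO_IDS = {
    "NalandaJEENEETBench": "Nalandadata/NalandaJEENEETBench",
    "DrishtiTable": "Nalandadata/DrishtiTable",
}


def load_nalanda_dataset(name: str):
    """Load a public sample dataset by its short name (hypothetical helper)."""
    # Imported lazily so this module imports even without `datasets` installed.
    from datasets import load_dataset

    # No split or config is passed because none are documented here.
    # If a repo defines several configs, load_dataset will ask for one by name.
    return load_dataset(REPO_IDS[name])


if __name__ == "__main__":
    ds = load_nalanda_dataset("DrishtiTable")
    print(ds)  # inspect the splits and columns before relying on them
```

Printing the returned object first is deliberate: it shows the actual splits and features, which is safer than guessing field names.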

Models

| Model | Built on | Use case | License |
|---|---|---|---|
| nalanda-qwen-7b-grpo | Qwen 2.5 7B Instruct + GRPO | JEE / NEET problem solving | Apache-2.0 |
| nalanda-image-vl | Llama 3.2 11B Vision (LoRA) | Multimodal STEM Q&A | Llama 3.2 |
| DrishtiTable-Qwen2.5-VL-7B | Qwen 2.5 VL 7B (LoRA) | Table structure recognition | Apache-2.0 |


Who this is for

  • Researchers building or evaluating models on Indian-context STEM
  • Edtech companies training tutoring, grading, or content-generation systems
  • AI labs that need benchmarks reflecting non-Western curricula and multilingual (EN / HI) educational content

Data and licensing

The public artifacts on Hugging Face are samples of larger internal datasets. Each repo has its own license, listed in its dataset/model card.

  • Open-licensed releases (Apache-2.0, CC-BY-4.0) can be used commercially under the terms of those licenses.
  • Non-commercial releases (CC-BY-NC-4.0) require a separate commercial license for production or revenue-generating use.
  • Full-scale versions, custom slices, and licensed access to the parent corpora are available on request.
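
The licensing rules above can be sketched as a small lookup, for example to gate releases in a data pipeline. This is an illustrative helper, not an official tool: the function name and the status strings are assumptions, and any license that the rules above do not classify (such as the Llama 3.2 license) is deliberately routed to a manual check of the repo's own card.

```python
# License groups taken from the rules stated above.
OPEN_LICENSES = {"Apache-2.0", "CC-BY-4.0"}
NON_COMMERCIAL_LICENSES = {"CC-BY-NC-4.0"}


def commercial_use_status(license_id: str) -> str:
    """Classify a license ID for commercial use (hypothetical helper)."""
    if license_id in OPEN_LICENSES:
        # Usable commercially under the terms of the license itself.
        return "allowed"
    if license_id in NON_COMMERCIAL_LICENSES:
        # Production or revenue-generating use needs a separate license.
        return "needs-commercial-license"
    # Anything else is not covered by the rules above.
    return "check-repo-card"


if __name__ == "__main__":
    for lic in ("Apache-2.0", "CC-BY-NC-4.0", "Llama 3.2"):
        print(lic, "->", commercial_use_status(lic))
```

The lookup is only a convenience; the license text in each dataset or model card remains the authority.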

For commercial licensing, full dataset access, custom data work, or partnerships:

📧 info@nalandadata.ai

For technical questions, integration help, or fine-tuning support:

📧 tech@nalandadata.ai

🌐 nalandadata.ai