- IndicRxNorm Gemma 3 270M LoRA
- Intended use
- Why this model exists
- Model summary
- Base model attribution
- Dataset attribution
- Dataset composition
- Credits
- RxNorm / RxNav attribution
- What this model does
- Safety note
- Training setup
- Evaluation summary
- Installation
- Quick start: tested Unsloth loading path
- Quick start: Transformers + PEFT
- Compare base model vs fine-tuned adapter
- Example prompts
- Example outputs
- Recommended production architecture
- Optional RxNav validation example
- Loading the training dataset
- Kaggle Community Benchmark
- Suggested evaluation procedure
- Known limitations
- Out-of-scope use
- Suggested use in STT/TTS pipelines
- Files expected in this adapter repository
- Reproducibility notes
- Citation
- Acknowledgements
- License and terms
- NOTICE
IndicRxNorm Gemma 3 270M LoRA
IndicRxNorm-Gemma3-270M-LoRA is a PEFT/LoRA adapter fine-tuned from google/gemma-3-270m-it / unsloth/gemma-3-270m-it for multilingual Indic medicine terminology normalization.
The adapter is designed to convert Hindi, Bengali, Hinglish, and Banglish medicine mentions into structured RxNorm-style JSON candidates. It is intended for medicine terminology workflows, especially after STT/ASR transcription and before TTS or downstream structured clinical terminology processing.
This adapter is not a diagnosis, prescription, dosage, disease-treatment, or clinical decision model.
Repository:
AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA
Dataset:
AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K
Base model:
google/gemma-3-270m-it
Intended use
This adapter is intended for:
- Hindi, Bengali, Hinglish, and Banglish medicine mention normalization
- RxNorm-style structured JSON generation
- RxCUI candidate extraction
- medicine-name NER
- RxNorm entity-linking style outputs
- drug-field extraction
- ingredient, strength, dose-form, and terminology-field extraction
- terminology-only safety-boundary behavior
- STT/TTS pipeline normalization after speech transcription
- low-resource Indic clinical NLP experimentation
- edge-scale structured extraction with a compact model
This model is not intended for:
- diagnosis
- prescription generation
- dosage recommendation
- disease-treatment advice
- emergency triage
- clinical decision-making
- replacing a clinician or pharmacist
- autonomous medication decision-making
- inferring ICD-10-CM disease labels from RxNorm alone
Why this model exists
Many medicine-name normalization systems assume clean English text and enough compute to run larger medical language models. Real-world low-resource settings often have two constraints at the same time:
- Low-resource language setting: Hindi, Bengali, Hinglish, Banglish, transliterated medicine names, and code-mixed user text are underrepresented in standard clinical NLP resources.
- Low-resource compute setting: Deployment targets may include small clinics, mobile workflows, local/offline assistants, privacy-sensitive environments, or STT/TTS pipelines where a compact edge model is preferable.
This adapter explores whether a very small model, Gemma 3 270M, can learn structured medicine terminology normalization when fine-tuned on a curated multilingual RxNorm-style instruction dataset.
Model summary
| Field | Value |
|---|---|
| Model repo | AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA |
| Adapter type | PEFT LoRA adapter |
| Base model | google/gemma-3-270m-it |
| Training base used | unsloth/gemma-3-270m-it |
| Training backend | Unsloth + Transformers + TRL + PEFT |
| Dataset | AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K |
| Primary dataset config | multilingual_rxnorm_normalization |
| Languages/styles | Hindi, Bengali, Hinglish, Banglish |
| Primary output | Structured JSON |
| Primary task | RxNorm-style medicine terminology normalization |
| License | gemma, subject to Google Gemma Terms of Use |
| Dataset license | cc-by-nc-4.0; see dataset card |
| Safety scope | Terminology normalization only |
Base model attribution
This adapter is fine-tuned from Google's instruction-tuned Gemma 3 270M model.
- Base model: google/gemma-3-270m-it
- Base model page: https://huggingface.co/google/gemma-3-270m-it
- Gemma authors: Google DeepMind / Gemma Team
- Gemma Terms of Use: https://ai.google.dev/gemma/terms
- Gemma technical report citation is included below.
Google’s Gemma 3 model card describes Gemma as a family of lightweight open models from Google. The Gemma 3 270M and 1B models support text input/output and a 32K-token context window. The base model requires users to accept Google’s Gemma usage license on Hugging Face before accessing model files.
This repository contains the LoRA adapter, not a full standalone merged base model.
Dataset attribution
Fine-tuned using:
AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K
Dataset page:
https://huggingface.co/datasets/AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K
The dataset card describes IndicRxNorm-LexMap-15K as a multilingual Indic medicine terminology instruction dataset for:
- medicine-name understanding
- RxNorm normalization
- RxCUI entity linking
- structured drug-field extraction
- safe non-prescriptive clinical terminology tasks
The dataset includes two configurations:
| Config | Role |
|---|---|
| multilingual_rxnorm_normalization | Primary adapted dataset |
| curated_base | Original curated base dataset |
The primary adapted dataset file is:
multilingual_rxnorm_normalization.jsonl
The curated base dataset file is:
adaptive_upload_indicrxnorm_lexmap_15k.jsonl
The adapted dataset was created from the curated base dataset through the AXONVERTEX AI Research / Adaption / Adaptive Data curation and refinement workflow. The adaptation preserved structured RxNorm/RxCUI terminology facts while improving instruction clarity, formatting, and safety constraints.
Dataset composition
Primary adapted dataset: multilingual_rxnorm_normalization
| Metric | Value |
|---|---|
| Rows | 14,910 |
| JSON parse errors | 0 |
| Language/script styles | 4 |
| Task types | 6 |
Language/style distribution:
| Language/style | Rows |
|---|---|
| Banglish | 3,736 |
| Hindi | 3,727 |
| Hinglish | 3,725 |
| Bengali | 3,722 |
Task distribution:
| Task type | Rows |
|---|---|
| safety_boundary_refusal | 2,494 |
| terminology_summary | 2,490 |
| drug_field_extraction | 2,485 |
| medicine_ner | 2,482 |
| rxnorm_entity_linking | 2,481 |
| rxnorm_normalization | 2,478 |
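As a quick arithmetic check, both distributions above sum to the 14,910 rows reported for the primary config. The sketch below hard-codes the counts from the tables; the variable names are illustrative and not part of the dataset tooling.

```python
# Row counts copied from the language/style and task tables above.
language_rows = {"Banglish": 3736, "Hindi": 3727, "Hinglish": 3725, "Bengali": 3722}
task_rows = {
    "safety_boundary_refusal": 2494,
    "terminology_summary": 2490,
    "drug_field_extraction": 2485,
    "medicine_ner": 2482,
    "rxnorm_entity_linking": 2481,
    "rxnorm_normalization": 2478,
}
TOTAL_ROWS = 14910  # "Rows" value reported for the primary adapted dataset

language_total = sum(language_rows.values())
task_total = sum(task_rows.values())

# Share of each language/style as a percentage of all rows.
language_share = {k: round(100 * v / TOTAL_ROWS, 2) for k, v in language_rows.items()}

print(language_total, task_total)
print(language_share)
```

Both totals match, and each language/style holds roughly a quarter of the rows.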
Primary schema:
{
"prompt": "original prompt",
"completion": "original JSON completion",
"enhanced_prompt": "adapted prompt",
"enhanced_completion": "adapted JSON completion",
"context": "compact context metadata",
"id": "unique row id",
"language": "Hindi | Bengali | Hinglish | Banglish",
"language_code": "hin_Deva | ben_Beng | hi_Latn | bn_Latn",
"task_type": "medicine_ner | rxnorm_normalization | drug_field_extraction | rxnorm_entity_linking | terminology_summary | safety_boundary_refusal"
}
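A row can be checked against this schema with a few lines of standard-library Python. This is a minimal sketch, assuming each row is a plain dict with the fields listed above and that `enhanced_completion` holds a JSON string; `validate_row` is a hypothetical helper, not part of the dataset tooling.

```python
import json

REQUIRED_FIELDS = (
    "prompt", "completion", "enhanced_prompt", "enhanced_completion",
    "context", "id", "language", "language_code", "task_type",
)
ALLOWED_LANGUAGES = {"Hindi", "Bengali", "Hinglish", "Banglish"}
ALLOWED_TASK_TYPES = {
    "medicine_ner", "rxnorm_normalization", "drug_field_extraction",
    "rxnorm_entity_linking", "terminology_summary", "safety_boundary_refusal",
}

def validate_row(row: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the row looks valid."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in row]
    if row.get("language") not in ALLOWED_LANGUAGES:
        problems.append(f"unexpected language: {row.get('language')!r}")
    if row.get("task_type") not in ALLOWED_TASK_TYPES:
        problems.append(f"unexpected task_type: {row.get('task_type')!r}")
    # Assumption: the adapted completion is stored as a JSON string.
    if "enhanced_completion" in row:
        try:
            json.loads(row["enhanced_completion"])
        except (TypeError, json.JSONDecodeError):
            problems.append("enhanced_completion is not valid JSON")
    return problems
```

Running this over every row of a loaded split is one way to reproduce a "JSON parse errors: 0" style check locally.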
Credits
RxNorm / RxNav attribution
RxNorm is provided by the U.S. National Library of Medicine.
RxNorm provides normalized names and unique identifiers for medicines and drugs. It links drug names to vocabularies commonly used in pharmacy management and drug-interaction software.
References:
- RxNorm overview: https://www.nlm.nih.gov/research/umls/rxnorm/index.html
- RxNorm purpose: https://www.nlm.nih.gov/research/umls/rxnorm/overview.html
- RxNav: https://lhncbc.nlm.nih.gov/RxNav/
- RxNav APIs: https://lhncbc.nlm.nih.gov/RxNav/APIs/
- RxNorm APIs: https://lhncbc.nlm.nih.gov/RxNav/APIs/RxNormAPIs.html
Important: this adapter generates RxNorm-style candidates. Production systems should verify generated RxCUIs against RxNorm/RxNav.
What this model does
The adapter is trained for tasks such as:
- medicine-name named entity recognition
- RxNorm-style normalization
- RxCUI candidate extraction
- RxNorm entity linking
- drug-field extraction
- ingredient extraction
- strength extraction
- dose-form extraction
- safe terminology summaries
- safe refusal for diagnosis, dosage, prescription, or disease-treatment prompts
The intended high-level workflow is:
Speech / text input
↓
STT or user text
↓
IndicRxNorm-Gemma3-270M-LoRA
↓
Structured medicine terminology JSON
↓
Optional RxNav / RxNorm verification
↓
Downstream app logic or TTS response
Safety note
This adapter generates terminology-normalization candidates.
It should not be used as the final authority for:
- RxCUI values
- diagnosis
- prescription
- dosage
- disease indication
- contraindication
- treatment recommendation
- drug safety decisions
- clinical decision support
For production use, verify model-generated RxCUIs through RxNorm/RxNav or another authoritative terminology service.
Training setup
Training was performed with LoRA/PEFT using Unsloth.
| Setting | Value |
|---|---|
| Base model | unsloth/gemma-3-270m-it / google/gemma-3-270m-it |
| Fine-tuning method | LoRA / PEFT |
| Training backend | Unsloth |
| Train rows | 12,971 |
| Validation rows | 746 |
| Epochs | 2 |
| Total steps | 1,622 |
| Max sequence length | 2048 |
| Per-device batch size | 8 |
| Gradient accumulation | 2 |
| Effective batch size | 16 |
| Optimizer | adamw_8bit |
| Precision | BF16 |
| Hardware | NVIDIA A100 80GB |
| Trainable parameters | 3,796,992 / 271,895,168 |
| Trainable percentage | 1.40% |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.0 |
Target modules:
q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training loss and validation loss:
| Step | Training loss | Validation loss |
|---|---|---|
| 250 | 0.934334 | 0.943133 |
| 500 | 0.793007 | 0.797340 |
| 750 | 0.774033 | 0.727852 |
| 1000 | 0.621535 | 0.687793 |
| 1250 | 0.647093 | 0.659469 |
| 1500 | 0.700044 | 0.641339 |
| 1622 | 0.636263 | 0.637486 |
The validation loss continued improving through the end of training, so the final adapter checkpoint was used.
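The step count and trainable-parameter percentage in the table follow from the other settings. The sketch below redoes that arithmetic; it assumes a single GPU and that the final partial batch of each epoch counts as a step, neither of which is stated explicitly above. The LoRA scaling line uses the standard `lora_alpha / r` convention.

```python
import math

train_rows = 12_971
per_device_batch_size = 8
gradient_accumulation = 2
epochs = 2

# Assumption: one device, so the effective batch is batch size x accumulation.
effective_batch_size = per_device_batch_size * gradient_accumulation
# Assumption: the last partial batch of each epoch still produces an optimizer step.
steps_per_epoch = math.ceil(train_rows / effective_batch_size)
total_steps = steps_per_epoch * epochs

trainable_params, total_params = 3_796_992, 271_895_168
trainable_pct = 100 * trainable_params / total_params

lora_rank, lora_alpha = 16, 32
lora_scaling = lora_alpha / lora_rank  # standard LoRA scaling factor

print(effective_batch_size, steps_per_epoch, total_steps)
print(f"{trainable_pct:.2f}%", lora_scaling)
```

Under these assumptions the result is 811 steps per epoch and 1,622 steps in total, which agrees with the table.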
Evaluation summary
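Evaluation used a held-out RxCUI-grouped test sample of 180 rows. Grouping by RxCUI helps reduce leakage across language variants and task variants for the same medicine concept. JSON parse rate is computed over all 180 rows; RxCUI exact-match rate is computed over the 43 rows for which an expected RxCUI exists (rxcui_possible_rows in the raw numbers below).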
| Model | JSON parse rate | RxCUI exact-match rate |
|---|---|---|
| Base Gemma 3 270M | 7.22% | 2.33% |
| Fine-tuned LoRA adapter | 71.67% | 74.42% |
Raw evaluation numbers:
{
  "base_model": {
    "label": "base_model",
    "split": "test",
    "rows": 180,
    "json_parse_rate": 0.07222222222222222,
    "rxcui_exact_match_rate": 0.023255813953488372,
    "rxcui_possible_rows": 43,
    "elapsed_seconds": 2064.0168437957764,
    "seconds_per_row": 11.466760243309869
  },
  "adapter_model": {
    "label": "adapter_model",
    "split": "test",
    "rows": 180,
    "json_parse_rate": 0.7166666666666667,
    "rxcui_exact_match_rate": 0.7441860465116279,
    "rxcui_possible_rows": 43,
    "elapsed_seconds": 5934.368983030319,
    "seconds_per_row": 32.96871657239066
  }
}
Interpretation:
- The adapter substantially improves structured JSON generation.
- The adapter substantially improves RxCUI candidate matching on the held-out sample.
- RxCUI outputs should still be verified against RxNorm/RxNav before production use.
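The reported rates can be turned back into row counts, which makes the small sample size easier to see. This sketch assumes the JSON parse rate uses all 180 rows as its denominator and the RxCUI exact-match rate uses the 43 `rxcui_possible_rows`; the raw values are consistent with that reading.

```python
rows = 180
rxcui_possible_rows = 43

# Rates copied from the raw evaluation numbers above.
reported = {
    "base_model": {
        "json_parse_rate": 0.07222222222222222,
        "rxcui_exact_match_rate": 0.023255813953488372,
    },
    "adapter_model": {
        "json_parse_rate": 0.7166666666666667,
        "rxcui_exact_match_rate": 0.7441860465116279,
    },
}

counts = {
    label: {
        "json_parsed_rows": round(metrics["json_parse_rate"] * rows),
        "rxcui_matched_rows": round(metrics["rxcui_exact_match_rate"] * rxcui_possible_rows),
    }
    for label, metrics in reported.items()
}
print(counts)
```

In absolute terms the adapter parsed 129 of 180 outputs and matched 32 of 43 RxCUIs, against 13 and 1 for the base model.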
Installation
Basic dependencies:
pip install -U transformers peft accelerate safetensors sentencepiece
Recommended Unsloth path:
pip install -U unsloth
Optional Hugging Face login if needed:
huggingface-cli login
You may need to accept the Gemma terms on the base model page before loading the base model from Hugging Face:
https://huggingface.co/google/gemma-3-270m-it
Quick start: tested Unsloth loading path
This is the recommended loading path for this adapter.
import torch
from peft import PeftModel
from unsloth import FastModel

BASE_MODEL_ID = "unsloth/gemma-3-270m-it"
ADAPTER_ID = "AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA"
MAX_SEQ_LENGTH = 2048
MAX_NEW_TOKENS = 192

model, tokenizer = FastModel.from_pretrained(
    model_name=BASE_MODEL_ID,
    max_seq_length=MAX_SEQ_LENGTH,
    load_in_4bit=True,
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
FastModel.for_inference(model)

prompt = "Is medicine ko RxNorm style me normalize karo: aspirin 81 mg tablet"
messages = [
    {"role": "user", "content": prompt}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=MAX_NEW_TOKENS,
        do_sample=False,
        temperature=None,
        top_p=None,
    )

text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
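Because the decoded text above contains the prompt as well as the completion, downstream code usually needs to pull the JSON object out of it. The helper below is a minimal standard-library sketch; `extract_first_json` is a hypothetical name, and the sample string only imitates what a decoded output might look like.

```python
import json

def extract_first_json(text: str):
    """Return the first parseable JSON object embedded in `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None

# Illustrative decoded output (not a recorded model response).
sample = (
    "user\nIs medicine ko RxNorm style me normalize karo: aspirin 81 mg tablet\n"
    'model\n{"normalized_name": "Aspirin 81 mg tablet", "rxcui": "200145", "tty": "SCD"}'
)
print(extract_first_json(sample))
```

Returning `None` on failure lets the caller count unparseable outputs instead of raising.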
Quick start: Transformers + PEFT
This version follows the standard PEFT adapter-loading pattern.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

BASE_MODEL_ID = "google/gemma-3-270m-it"
ADAPTER_ID = "AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA"

if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    # Gemma 3 may not behave well with fp16 on some GPUs.
    # float32 is safer for compatibility, especially on T4-like cards.
    dtype = torch.float32

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype=dtype,
    device_map="auto" if torch.cuda.is_available() else None,
)
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)

# The model is already placed by device_map above, so no `device` argument is
# passed here. Some Transformers versions raise an error when `device` is given
# for a model that was loaded with a device map.
text_gen_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

prompt = "Banglish medicine mention normalize koro: metformin 500 mg tab"
messages = [
    {"role": "user", "content": prompt}
]
chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

result = text_gen_pipeline(
    chat_prompt,
    max_new_tokens=192,
    do_sample=False,
    return_full_text=False,
)
print(result[0]["generated_text"])
Compare base model vs fine-tuned adapter
The following script lets users compare the original Gemma 3 270M instruction model against the fine-tuned IndicRxNorm LoRA adapter on the same prompts.
This is useful for verifying the adaptation effect. The base model may respond conversationally or inconsistently, while the fine-tuned adapter is expected to return structured RxNorm-style JSON candidates with safety boundaries.
import torch
from peft import PeftModel
from unsloth import FastModel

BASE_MODEL_ID = "unsloth/gemma-3-270m-it"
ADAPTER_ID = "AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA"
MAX_SEQ_LENGTH = 512
MAX_NEW_TOKENS = 160

prompts = [
    "Return compact JSON only. Normalize this medicine mention in RxNorm style: aspirin 81 mg tablet",
    "Return compact JSON only. এই ওষুধটি RxNorm ফরম্যাটে normalize করো: paracetamol 500 mg tablet",
    "Return compact JSON only. Banglish medicine mention normalize koro: metformin 500 mg tab",
    "Return JSON only. Do not provide disease, dosage, treatment, indication, or prescription advice. If the user asks what disease a medicine is for, refuse safely. User query: aspirin किस बीमारी के लिए लेना चाहिए?",
]

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

print("Loading base model once...")
model, tokenizer = FastModel.from_pretrained(
    model_name=BASE_MODEL_ID,
    max_seq_length=MAX_SEQ_LENGTH,
    load_in_4bit=True,
)

print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(model, ADAPTER_ID)
FastModel.for_inference(model)
model.eval()

if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token

def generate_response(prompt: str, use_adapter: bool = True):
    messages = [{"role": "user", "content": prompt}]
    encoded = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    )
    encoded = {k: v.to(model.device) for k, v in encoded.items()}

    # Use max_length instead of max_new_tokens to avoid Gemma generation_config warning.
    input_len = encoded["input_ids"].shape[-1]
    max_length = input_len + MAX_NEW_TOKENS

    with torch.no_grad():
        if use_adapter:
            outputs = model.generate(
                **encoded,
                max_length=max_length,
                do_sample=False,
                pad_token_id=tokenizer.eos_token_id,
                eos_token_id=tokenizer.eos_token_id,
                use_cache=True,
            )
        else:
            with model.disable_adapter():
                outputs = model.generate(
                    **encoded,
                    max_length=max_length,
                    do_sample=False,
                    pad_token_id=tokenizer.eos_token_id,
                    eos_token_id=tokenizer.eos_token_id,
                    use_cache=True,
                )

    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    if prompt in decoded:
        decoded = decoded.split(prompt, 1)[-1].strip()
    return decoded.strip()

for prompt in prompts:
    print("=" * 100)
    print("PROMPT:")
    print(prompt)
    print("\nBASE MODEL OUTPUT:")
    print(generate_response(prompt, use_adapter=False))
    print("\nFINE-TUNED ADAPTER OUTPUT:")
    print(generate_response(prompt, use_adapter=True))
Expected qualitative difference:
| Model | Expected behavior |
|---|---|
| Base Gemma 3 270M | May produce conversational answers, incomplete JSON, or inconsistent terminology structure |
| Fine-tuned adapter | More likely to produce structured JSON-style medicine terminology normalization |
Example prompts
Hinglish
Is medicine ko RxNorm style me normalize karo: aspirin 81 mg tablet
Bengali
এই ওষুধটি RxNorm ফরম্যাটে normalize করো: paracetamol 500 mg tablet
Banglish
Banglish medicine mention normalize koro: metformin 500 mg tab
Hindi
इस medicine mention को RxNorm style में normalize करें: एस्पिरिन
Safety-boundary prompt
सुरक्षित non-prescriptive जवाब दें: aspirin किस बीमारी के लिए लेना चाहिए?
The model should not recommend disease treatment or dosage. It should respond within terminology-normalization boundaries.
Example outputs
Hinglish prompt
Prompt:
Is medicine ko RxNorm style me normalize karo: aspirin 81 mg tablet
Example adapter output:
{
"normalized_name": "Aspirin 81 mg tablet",
"rxcui": "200145",
"tty": "SCD",
"safety_disclaimer": "This output is strictly for terminology normalization and data standardization purposes only. It does not constitute a diagnosis, prescription, dosage instruction, or treatment advice. Please consult a qualified clinician for any medical decisions."
}
Bengali prompt
Prompt:
এই ওষুধটি RxNorm ফরম্যাটে normalize করো: paracetamol 500 mg tablet
Example adapter output:
{
"normalized_entry": {
"original_input": "paracetamol 500 mg tablet",
"rxcui": "211515",
"tty": "SCD",
"language": "Bengali",
"safety_disclaimer": "এই আউটপুটটি শুধুমাত্র পরিভাষা স্বাভাবিককরণ (terminology normalization) এবং ডেটা মানকীকরণের উদ্দেশ্যে তৈরি করা হয়েছে। এটি কোনো চিকিৎসা পরামর্শ, রোগ নির্ণয়, প্রেসক্রিপশন, ডোজ নির্ধারণ বা চিকিৎসা পরিকল্পনা নয়। ওষুধ সংক্রান্ত যেকোনো সিদ্ধান্ত গ্রহণের আগে অবশ্যই একজন যোগ্য চিকিৎসকের বা ক্লিনিশিয়ানের পরামর্শ নিন।"
},
"metadata": {
"task_type": "rxnorm_normalization",
"source": "RxNorm/RxNav",
"processing_scope": "terminology_normalization_only"
},
"safety_disclaimer": "সতর্কবার্তা: এই আউটপুটটি শুধুমাত্র পরিভাষা স্বাভাবিককরণ এবং ডেটা মানকীকরণের উদ্দেশ্যে প্রদান করা হয়েছে। এটি কোনো চিকিৎসা পরামর্শ, রোগ নির্ণয়, প্রেসক্রিপশন, ডোজ নির্ধারণ বা চিকিৎসা পরিকল্পনা নয়। ওষুধ সংক্রান্ত যেকোনো সিদ্ধান্ত গ্রহণের আগে অবশ্যই একজন যোগ্য চিকিৎসকের বা ক্লিনিশিয়ানের পরামর্শ নিন।"
}
Banglish prompt
Prompt:
Banglish medicine mention normalize koro: metformin 500 mg tab
Example adapter output:
{
"normalized_name": "metformin 500 mg tab",
"rxcui": "180111",
"tty": "SBD",
"safety_disclaimer": "This output is for terminology normalization and data standardization purposes only. It does not constitute a diagnosis, prescription, dosage instruction, or treatment advice. Please consult a qualified clinician for any medical decisions."
}
Important: examples demonstrate the learned output format. RxCUI values should be verified through RxNorm/RxNav before production use.
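The example outputs above place `rxcui` either at the top level or nested under `normalized_entry`, so consumers may want a schema-tolerant lookup. The following is a small sketch of one way to do that with a depth-first search; `find_key` is a hypothetical helper and the two dicts are trimmed copies of the examples above.

```python
def find_key(obj, key: str):
    """Depth-first search for the first non-None value stored under `key`."""
    if isinstance(obj, dict):
        if obj.get(key) is not None:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None

flat = {"normalized_name": "Aspirin 81 mg tablet", "rxcui": "200145", "tty": "SCD"}
nested = {
    "normalized_entry": {
        "original_input": "paracetamol 500 mg tablet",
        "rxcui": "211515",
        "tty": "SCD",
    },
    "metadata": {"task_type": "rxnorm_normalization"},
}
print(find_key(flat, "rxcui"), find_key(nested, "rxcui"))
```

A lookup like this only locates the candidate value; it says nothing about whether the RxCUI is correct.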
Recommended production architecture
Do not use the adapter as the final RxCUI authority. Use it as a structured candidate generator.
User speech/text
↓
STT / ASR
↓
Fine-tuned Gemma 3 270M LoRA
↓
Candidate JSON
↓
RxNav validation
↓
Verified JSON
↓
Application logic / TTS response
Recommended final verified JSON shape:
{
  "input": "metformin 500 mg tab",
  "model_candidate": {
    "normalized_name": "metformin 500 mg tab",
    "rxcui": "180111",
    "tty": "SBD"
  },
  "rxnav_verified": false,
  "verified_rxcui": null,
  "verified_name": null,
  "confidence": "needs_verification",
  "safety_scope": "terminology_normalization_only"
}
Optional RxNav validation example
import requests

def rxnav_approximate_term(term: str, max_entries: int = 5):
    url = "https://rxnav.nlm.nih.gov/REST/approximateTerm.json"
    params = {
        "term": term,
        "maxEntries": max_entries,
    }
    response = requests.get(url, params=params, timeout=20)
    response.raise_for_status()
    return response.json()

def rxnav_rxcui_properties(rxcui: str):
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{rxcui}/properties.json"
    response = requests.get(url, timeout=20)
    response.raise_for_status()
    return response.json()

term_result = rxnav_approximate_term("metformin 500 mg tablet")
print(term_result)

rxcui_result = rxnav_rxcui_properties("861007")
print(rxcui_result)
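A properties lookup can then be folded into the verified JSON shape recommended earlier. The sketch below works on an already-fetched payload, so it needs no network access. It assumes the properties response carries a top-level `properties` object with `rxcui` and `name` fields and that an unknown RxCUI yields an empty or missing `properties` value; `build_verified_record` and the `"verified"` confidence label are hypothetical.

```python
def build_verified_record(input_text: str, candidate: dict, properties_payload) -> dict:
    """Combine a model candidate with an RxNav properties payload into one record."""
    # Assumption: unknown RxCUIs come back as {} / None / {"properties": None}.
    props = (properties_payload or {}).get("properties") or {}
    verified = bool(props.get("rxcui")) and props.get("rxcui") == candidate.get("rxcui")
    return {
        "input": input_text,
        "model_candidate": candidate,
        "rxnav_verified": verified,
        "verified_rxcui": props.get("rxcui") if verified else None,
        "verified_name": props.get("name") if verified else None,
        "confidence": "verified" if verified else "needs_verification",
        "safety_scope": "terminology_normalization_only",
    }

# Illustrative payload in the assumed shape (not a recorded API response).
payload = {"properties": {"rxcui": "861007", "name": "metformin hydrochloride 500 MG Oral Tablet", "tty": "SCD"}}
candidate = {"normalized_name": "metformin 500 mg tab", "rxcui": "861007", "tty": "SCD"}
print(build_verified_record("metformin 500 mg tab", candidate, payload))
```

Note that this check only confirms that the candidate RxCUI exists in RxNorm. Confirming that it is the right concept for the input still requires comparing the returned name, for example against approximate-term candidates.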
Loading the training dataset
from datasets import load_dataset
ds = load_dataset(
"AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K",
"multilingual_rxnorm_normalization"
)
print(ds)
print(ds["train"][0])
Load the curated base dataset:
from datasets import load_dataset
base = load_dataset(
"AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K",
"curated_base"
)
print(base)
print(base["train"][0])
Kaggle Community Benchmark
We created a Kaggle Community Benchmark for IndicRxNorm medicine terminology normalization.
The benchmark evaluates:
- structured JSON validity
- RxNorm-style normalized medicine fields
- RxCUI-like candidate identifiers
- drug-field extraction
- terminology-only safety notes
- refusal of diagnosis, dosage, prescription, disease-indication, and treatment-advice prompts
Kaggle Benchmark: IndicRxNorm Medicine Normalization Benchmark
Related dataset: https://huggingface.co/datasets/AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K
This benchmark complements the fine-tuned LoRA adapter by testing whether general hosted models can follow the same Indic medicine-normalization and safety-boundary behavior.
Suggested evaluation procedure
To evaluate this adapter on your own held-out set:
- Hold out examples by rxcui, not just by row.
- Generate adapter outputs with deterministic decoding.
- Parse the output as JSON.
- Measure JSON parse rate.
- Compare model-generated RxCUI with expected RxCUI where expected RxCUI exists.
- Track safety-boundary behavior for diagnosis, dosage, prescription, and treatment prompts.
- Optionally verify outputs using RxNav.
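The first step, holding out by RxCUI rather than by row, can be sketched with a deterministic hash so that every variant of a concept lands in the same split. This assumes each example is a dict whose expected RxCUI has already been extracted into an `rxcui` key, and that rows without one fall back to their own `id`; both the key layout and the function names are hypothetical.

```python
import hashlib

def assign_split(group_key: str, test_percent: int = 10) -> str:
    """Map a group key to 'train' or 'test' deterministically."""
    digest = hashlib.sha256(group_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "test" if bucket < test_percent else "train"

def split_by_rxcui(examples, test_percent: int = 10) -> dict:
    """Group-wise split: all rows sharing an RxCUI go to the same side."""
    splits = {"train": [], "test": []}
    for example in examples:
        # Assumption: rows with no RxCUI (e.g. refusals) are grouped by their own id.
        key = example.get("rxcui") or example["id"]
        splits[assign_split(str(key), test_percent)].append(example)
    return splits
```

Hashing the group key avoids keeping a separate assignment table and gives the same split on every run.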
Example metric structure:
{
"json_parse_rate": 0.7167,
"rxcui_exact_match_rate": 0.7442,
"rxcui_possible_rows": 43,
"rows": 180
}
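A metric structure like the one above could be produced along the following lines. This is a sketch rather than the evaluation code used for this adapter: it assumes each record holds the raw model `output` string and an `expected_rxcui` that is `None` when no RxCUI applies, and it only looks for a top-level `rxcui` key.

```python
import json

def compute_metrics(records) -> dict:
    """Compute JSON parse rate over all rows and RxCUI exact match over rows with an expected RxCUI."""
    rows = parsed = possible = matched = 0
    for record in records:
        rows += 1
        try:
            obj = json.loads(record["output"])
        except (TypeError, json.JSONDecodeError):
            obj = None
        if obj is not None:
            parsed += 1
        expected = record.get("expected_rxcui")
        if expected is not None:
            possible += 1
            if isinstance(obj, dict) and str(obj.get("rxcui")) == str(expected):
                matched += 1
    return {
        "json_parse_rate": parsed / rows if rows else 0.0,
        "rxcui_exact_match_rate": matched / possible if possible else 0.0,
        "rxcui_possible_rows": possible,
        "rows": rows,
    }
```

Keeping the two denominators separate matters: safety-boundary rows have no expected RxCUI and would otherwise drag the exact-match rate down.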
Known limitations
- This is a compact 270M model and should be treated as a specialized terminology-normalization assistant.
- The adapter may output plausible but incorrect RxCUIs.
- The adapter may confuse ingredient, brand, SCD/SBD, dose form, and formulation variants.
- RxNorm canonical naming may differ from local terminology; for example, “paracetamol” may map to “acetaminophen” in U.S. RxNorm-style terminology.
- The adapter may generate JSON with varying schema keys depending on prompt style.
- The adapter is not a medical advice system.
- The adapter should not be used to determine diagnosis, dosage, treatment, disease indication, contraindications, or clinical safety decisions.
- Production use should include validation against RxNorm/RxNav or another authoritative drug terminology service.
- The dataset is synthetic and terminology-focused; it does not contain verified patient records or real clinical notes.
- The dataset does not provide verified medicine-to-disease indications.
- ICD-10-CM disease labels should not be inferred directly from RxNorm entries alone.
Out-of-scope use
Do not use this adapter for:
- diagnosis
- prescription generation
- dosage recommendation
- treatment selection
- disease indication inference
- emergency medical triage
- replacing a clinician or pharmacist
- autonomous medication decision-making
- drug interaction advice
- patient-specific clinical recommendations
- generating ICD-10-CM disease mappings from RxNorm alone
Suggested use in STT/TTS pipelines
This adapter can be used as a terminology normalization step in a speech pipeline.
User speech
↓
ASR / STT
↓
Noisy transcript with Hindi, Bengali, Hinglish, or Banglish medicine mentions
↓
IndicRxNorm-Gemma3-270M-LoRA
↓
Structured medicine terminology JSON
↓
RxNav validation
↓
Application response
↓
Safe TTS output
Example:
Input transcript:
"amar medicine holo paracetamol 500 mg tablet"
Model task:
Normalize the medicine mention only.
Expected behavior:
Return candidate structured terminology fields, not disease or dosage advice.
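The stages of this pipeline can be wired together as plain callables, which keeps the model call, the JSON parsing, and the terminology check independently replaceable. The sketch below uses stub functions in place of the real model and RxNav lookup; every name in it is hypothetical.

```python
import json
from typing import Callable, Optional

def normalize_transcript(
    transcript: str,
    generate: Callable[[str], str],
    parse: Callable[[str], Optional[dict]],
    verify: Callable[[dict], bool],
) -> dict:
    """Run transcript -> model text -> candidate JSON -> verification flag."""
    raw_output = generate(transcript)
    candidate = parse(raw_output)
    # An unparseable output is never treated as verified.
    verified = bool(candidate) and verify(candidate)
    return {
        "input": transcript,
        "model_candidate": candidate,
        "rxnav_verified": verified,
        "safety_scope": "terminology_normalization_only",
    }

def safe_parse(text: str) -> Optional[dict]:
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None

# Stubs standing in for the adapter and for an RxNav lookup.
fake_generate = lambda transcript: '{"normalized_name": "paracetamol 500 mg tablet", "tty": "SCD"}'
never_verified = lambda candidate: False

result = normalize_transcript(
    "amar medicine holo paracetamol 500 mg tablet", fake_generate, safe_parse, never_verified
)
print(result)
```

Only a record with the verification flag set should reach the application response or TTS stage; anything else is a candidate awaiting verification.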
Files expected in this adapter repository
A typical PEFT/LoRA adapter repo should include:
adapter_config.json
adapter_model.safetensors
README.md
NOTICE
tokenizer.json
tokenizer_config.json
special_tokens_map.json
added_tokens.json
chat_template.jinja
tokenizer.model
This repository contains adapter weights, not a standalone merged model.
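A local copy of the repository can be checked against this list with `pathlib`. The sketch below simply reports which of the listed files are absent from a directory; `missing_adapter_files` is a hypothetical helper. PEFT itself needs the adapter config and weights, while the remaining entries are tokenizer and documentation files.

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = [
    "adapter_config.json",
    "adapter_model.safetensors",
    "README.md",
    "NOTICE",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "added_tokens.json",
    "chat_template.jinja",
    "tokenizer.model",
]

def missing_adapter_files(repo_dir: str) -> list[str]:
    """Return the expected file names that are not present in `repo_dir`."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty result means every listed file is present; it does not validate the file contents.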
Reproducibility notes
Training configuration used for this adapter:
{
"base_model": "unsloth/gemma-3-270m-it",
"hf_base_model": "google/gemma-3-270m-it",
"dataset": "AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K",
"dataset_config": "multilingual_rxnorm_normalization",
"profile": "enhanced",
"max_seq_length": 2048,
"num_train_epochs": 2,
"per_device_train_batch_size": 8,
"gradient_accumulation_steps": 2,
"effective_batch_size": 16,
"learning_rate": 0.0002,
"warmup_ratio": 0.03,
"weight_decay": 0.01,
"optim": "adamw_8bit",
"precision": "bf16",
"lora_r": 16,
"lora_alpha": 32,
"lora_dropout": 0.0,
"seed": 3407,
"hardware": "NVIDIA A100 80GB"
}
Citation
If you use this adapter, please cite:
@misc{axonvertex_indicrxnorm_gemma3_270m_lora_2026,
title = {IndicRxNorm Gemma 3 270M LoRA: Multilingual Indic RxNorm-style Medicine Terminology Normalization},
author = {AXONVERTEX AI Research},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA}},
note = {Fine-tuned LoRA adapter for Gemma 3 270M}
}
Dataset:
@dataset{indicrxnorm_lexmap_15k_2026,
title = {IndicRxNorm-LexMap-15K: A Multilingual Indic Medicine Terminology Instruction Dataset},
author = {Dasgupta, Krishnendu and AXONVERTEX AI},
year = {2026},
publisher = {AXONVERTEX AI, via Hugging Face},
organization = {AXONVERTEX AI},
howpublished = {\url{https://huggingface.co/datasets/AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K}}
}
Gemma:
@article{gemma_2025,
title = {Gemma 3},
url = {https://arxiv.org/abs/2503.19786},
publisher = {Google DeepMind},
author = {Gemma Team},
year = {2025}
}
RxNorm reference:
@article{nelson2011normalized,
title = {Normalized names for clinical drugs: RxNorm at 6 years},
author = {Nelson, Stuart J. and Zeng, Kelly and Kilbourne, John and Powell, Tammy and Moore, Robin},
journal = {Journal of the American Medical Informatics Association},
volume = {18},
number = {4},
pages = {441--448},
year = {2011},
doi = {10.1136/amiajnl-2011-000116}
}
Acknowledgements
This work builds on:
- Google DeepMind Gemma 3 270M
- Hugging Face Transformers
- Hugging Face PEFT
- Unsloth fine-tuning tooling
- RxNorm and RxNav resources from the U.S. National Library of Medicine
- AXONVERTEX AI Research
- Adaption Labs: the dataset used to fine-tune the model was built with the Adaption / Adaptive Data curation and refinement workflow for Indic medical terminology normalization
License and terms
This adapter is released under:
license: gemma
because it is a derivative adapter for Gemma.
Use of this adapter is subject to the Google Gemma Terms of Use:
https://ai.google.dev/gemma/terms
The dataset used for fine-tuning is released separately under its own dataset license. Please check the dataset card for its license and use restrictions:
https://huggingface.co/datasets/AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K
NOTICE
Gemma is provided under and subject to the Gemma Terms of Use found at:
https://ai.google.dev/gemma/terms
This repository contains a LoRA adapter derived from Google's instruction-tuned Gemma 3 270M model through parameter-efficient fine-tuning. Use of this repository is subject to the Google Gemma Terms of Use and applicable use restrictions.
Base model:
google/gemma-3-270m-it
Adapter:
AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA
Dataset:
AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K