IndicRxNorm Gemma 3 270M LoRA

IndicRxNorm-Gemma3-270M-LoRA is a PEFT/LoRA adapter for multilingual Indic medicine terminology normalization, fine-tuned from google/gemma-3-270m-it (training used the unsloth/gemma-3-270m-it mirror of the same model).

The adapter is designed to convert Hindi, Bengali, Hinglish, and Banglish medicine mentions into structured RxNorm-style JSON candidates. It is intended for medicine terminology workflows, especially after STT/ASR transcription and before TTS or downstream structured clinical terminology processing.

This adapter is not a diagnosis, prescription, dosage, disease-treatment, or clinical decision model.

Repository:

AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA

Dataset:

AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K

Base model:

google/gemma-3-270m-it

Intended use

This adapter is intended for:

Hindi, Bengali, Hinglish, and Banglish medicine mention normalization
RxNorm-style structured JSON generation
RxCUI candidate extraction
medicine-name NER
RxNorm entity-linking style outputs
drug-field extraction
ingredient, strength, dose-form, and terminology-field extraction
terminology-only safety-boundary behavior
STT/TTS pipeline normalization after speech transcription
low-resource Indic clinical NLP experimentation
edge-scale structured extraction with a compact model

This model is not intended for:

diagnosis
prescription generation
dosage recommendation
disease-treatment advice
emergency triage
clinical decision-making
replacing a clinician or pharmacist
autonomous medication decision-making
inferring ICD-10-CM disease labels from RxNorm alone

Why this model exists

Many medicine-name normalization systems assume clean English text and enough compute to run larger medical language models. Real-world low-resource settings often have two constraints at the same time:

Low-resource language setting
Hindi, Bengali, Hinglish, Banglish, transliterated medicine names, and code-mixed user text are underrepresented in standard clinical NLP resources.
Low-resource compute setting
Deployment targets may include small clinics, mobile workflows, local/offline assistants, privacy-sensitive environments, or STT/TTS pipelines where a compact edge model is preferable.

This adapter explores whether a very small model, Gemma 3 270M, can learn structured medicine terminology normalization when fine-tuned on a curated multilingual RxNorm-style instruction dataset.

Model summary

Field	Value
Model repo	`AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA`
Adapter type	PEFT LoRA adapter
Base model	`google/gemma-3-270m-it`
Training base used	`unsloth/gemma-3-270m-it`
Training backend	Unsloth + Transformers + TRL + PEFT
Dataset	`AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K`
Primary dataset config	`multilingual_rxnorm_normalization`
Languages/styles	Hindi, Bengali, Hinglish, Banglish
Primary output	Structured JSON
Primary task	RxNorm-style medicine terminology normalization
License	`gemma`, subject to Google Gemma Terms of Use
Dataset license	`cc-by-nc-4.0`; see dataset card
Safety scope	Terminology normalization only

Base model attribution

This adapter is fine-tuned from Google's Gemma 3 270M instruction-tuned model.

Base model: google/gemma-3-270m-it
Base model page: https://huggingface.co/google/gemma-3-270m-it
Gemma authors: Google DeepMind / Gemma Team
Gemma Terms of Use: https://ai.google.dev/gemma/terms
Gemma technical report citation is included below.

Google’s Gemma 3 model card describes Gemma as a family of lightweight open models from Google. The Gemma 3 270M and 1B models support text input/output and a 32K-token context window. The base model requires users to accept Google’s Gemma usage license on Hugging Face before accessing model files.

This repository contains the LoRA adapter, not a full standalone merged base model.

Dataset attribution

Fine-tuned using:

AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K

Dataset page:

https://huggingface.co/datasets/AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K

The dataset card describes IndicRxNorm-LexMap-15K as a multilingual Indic medicine terminology instruction dataset for:

medicine-name understanding
RxNorm normalization
RxCUI entity linking
structured drug-field extraction
safe non-prescriptive clinical terminology tasks

The dataset includes two configurations:

Config	Role
`multilingual_rxnorm_normalization`	Primary adapted dataset
`curated_base`	Original curated base dataset

The primary adapted dataset file is:

multilingual_rxnorm_normalization.jsonl

The curated base dataset file is:

adaptive_upload_indicrxnorm_lexmap_15k.jsonl

The adapted dataset was created from the curated base dataset by AXONVERTEX AI Research using the Adaption Labs Adaptive Data curation and refinement workflow. The adaptation preserved the structured RxNorm/RxCUI terminology facts while improving instruction clarity, formatting, and safety constraints.

Dataset composition

Primary adapted dataset: multilingual_rxnorm_normalization

Metric	Value
Rows	14,910
JSON parse errors	0
Language/script styles	4
Task types	6

Language/style distribution:

Language/style	Rows
Banglish	3,736
Hindi	3,727
Hinglish	3,725
Bengali	3,722

Task distribution:

Task type	Rows
`safety_boundary_refusal`	2,494
`terminology_summary`	2,490
`drug_field_extraction`	2,485
`medicine_ner`	2,482
`rxnorm_entity_linking`	2,481
`rxnorm_normalization`	2,478

Primary schema:

{
  "prompt": "original prompt",
  "completion": "original JSON completion",
  "enhanced_prompt": "adapted prompt",
  "enhanced_completion": "adapted JSON completion",
  "context": "compact context metadata",
  "id": "unique row id",
  "language": "Hindi | Bengali | Hinglish | Banglish",
  "language_code": "hin_Deva | ben_Beng | hi_Latn | bn_Latn",
  "task_type": "medicine_ner | rxnorm_normalization | drug_field_extraction | rxnorm_entity_linking | terminology_summary | safety_boundary_refusal"
}
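As a quick sanity check on the schema above, the following sketch validates a single row. It assumes that the `language` and `language_code` values pair up in the order listed (Hindi with `hin_Deva`, Bengali with `ben_Beng`, and so on) and that both completion fields are stored as JSON-encoded strings. `validate_row` is a hypothetical helper, not part of the dataset tooling.

```python
import json

# Assumption: language and language_code pair up in the order the schema lists them.
LANGUAGE_CODES = {
    "Hindi": "hin_Deva",
    "Bengali": "ben_Beng",
    "Hinglish": "hi_Latn",
    "Banglish": "bn_Latn",
}
TASK_TYPES = {
    "medicine_ner",
    "rxnorm_normalization",
    "drug_field_extraction",
    "rxnorm_entity_linking",
    "terminology_summary",
    "safety_boundary_refusal",
}
REQUIRED_FIELDS = (
    "prompt", "completion", "enhanced_prompt", "enhanced_completion",
    "context", "id", "language", "language_code", "task_type",
)


def validate_row(row: dict) -> list:
    """Return a list of problems found in one dataset row (empty if it looks valid)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in row]
    if problems:
        return problems
    if LANGUAGE_CODES.get(row["language"]) != row["language_code"]:
        problems.append(f"language/code mismatch: {row['language']} / {row['language_code']}")
    if row["task_type"] not in TASK_TYPES:
        problems.append(f"unknown task_type: {row['task_type']}")
    # Assumption: completions are JSON-encoded strings (the card reports 0 parse errors).
    for field in ("completion", "enhanced_completion"):
        try:
            json.loads(row[field])
        except (json.JSONDecodeError, TypeError):
            problems.append(f"{field} is not valid JSON")
    return problems
```

Running it over every row of a loaded split (for example, `[validate_row(r) for r in ds["train"]]`) should return only empty lists if the data matches these assumptions.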

Credits

Adaption Labs (Adaptive Data curation and refinement workflow used to build the adapted dataset)

RxNorm / RxNav attribution

RxNorm is provided by the U.S. National Library of Medicine.

RxNorm provides normalized names and unique identifiers for medicines and drugs. It links drug names to vocabularies commonly used in pharmacy management and drug-interaction software.

References:

RxNorm overview: https://www.nlm.nih.gov/research/umls/rxnorm/index.html
RxNorm purpose: https://www.nlm.nih.gov/research/umls/rxnorm/overview.html
RxNav: https://lhncbc.nlm.nih.gov/RxNav/
RxNav APIs: https://lhncbc.nlm.nih.gov/RxNav/APIs/
RxNorm APIs: https://lhncbc.nlm.nih.gov/RxNav/APIs/RxNormAPIs.html

Important: this adapter generates RxNorm-style candidates. Production systems should verify generated RxCUIs against RxNorm/RxNav.

What this model does

The adapter is trained for tasks such as:

medicine-name named entity recognition
RxNorm-style normalization
RxCUI candidate extraction
RxNorm entity linking
drug-field extraction
ingredient extraction
strength extraction
dose-form extraction
safe terminology summaries
safe refusal for diagnosis, dosage, prescription, or disease-treatment prompts

The intended high-level workflow is:

Speech / text input
    ↓
STT or user text
    ↓
IndicRxNorm-Gemma3-270M-LoRA
    ↓
Structured medicine terminology JSON
    ↓
Optional RxNav / RxNorm verification
    ↓
Downstream app logic or TTS response

Safety note

This adapter generates terminology-normalization candidates.

It should not be used as the final authority for:

RxCUI values
diagnosis
prescription
dosage
disease indication
contraindication
treatment recommendation
drug safety decisions
clinical decision support

For production use, verify model-generated RxCUIs through RxNorm/RxNav or another authoritative terminology service.

Training setup

Training was performed with LoRA/PEFT using Unsloth.

Setting	Value
Base model	`unsloth/gemma-3-270m-it` / `google/gemma-3-270m-it`
Fine-tuning method	LoRA / PEFT
Training backend	Unsloth
Train rows	12,971
Validation rows	746
Epochs	2
Total steps	1,622
Max sequence length	2048
Per-device batch size	8
Gradient accumulation	2
Effective batch size	16
Optimizer	`adamw_8bit`
Precision	BF16
Hardware	NVIDIA A100 80GB
Trainable parameters	3,796,992 / 271,895,168
Trainable percentage	1.40%
LoRA rank	16
LoRA alpha	32
LoRA dropout	0.0
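The step count in the table follows from the other settings. A minimal sketch of the arithmetic, assuming training ran on a single GPU and that a partial final batch still counts as a step:

```python
import math

TRAIN_ROWS = 12_971
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 2
EPOCHS = 2

# Effective batch = per-device batch x gradient accumulation (one device assumed).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
# Forward/backward micro-batches per epoch, rounding the last partial batch up.
micro_batches_per_epoch = math.ceil(TRAIN_ROWS / PER_DEVICE_BATCH)
# One optimizer step per GRAD_ACCUM micro-batches.
steps_per_epoch = math.ceil(micro_batches_per_epoch / GRAD_ACCUM)
total_steps = steps_per_epoch * EPOCHS

print(effective_batch, micro_batches_per_epoch, steps_per_epoch, total_steps)
```

This gives an effective batch size of 16 and 811 optimizer steps per epoch, for 1,622 steps over two epochs, consistent with the table.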

Target modules:

q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
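The reported trainable-parameter count can be reproduced from these target modules. The sketch below assumes the Gemma 3 270M dimensions from its public Hugging Face config (hidden size 640, MLP size 2048, 18 decoder layers, 4 query heads, 1 key/value head, head dimension 256); these values are not stated in this card.

```python
# Assumed Gemma 3 270M dimensions (from the base model's config, not from this card).
HIDDEN = 640
INTERMEDIATE = 2048
NUM_LAYERS = 18
NUM_HEADS = 4
NUM_KV_HEADS = 1
HEAD_DIM = 256
RANK = 16  # LoRA rank from the training table
TOTAL_PARAMS = 271_895_168  # base + adapter, from the training table

# (in_features, out_features) of each LoRA target module in one decoder layer.
modules = {
    "q_proj": (HIDDEN, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (in x r) and B (r x out), i.e. r * (in + out) parameters per module.
per_layer = sum(RANK * (n_in + n_out) for n_in, n_out in modules.values())
trainable = per_layer * NUM_LAYERS
print(trainable, f"{100 * trainable / TOTAL_PARAMS:.2f}%")
```

With rank 16 this yields 3,796,992 trainable parameters, about 1.40% of 271,895,168, matching the training table.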

Training loss and validation loss:

Step	Training loss	Validation loss
250	0.934334	0.943133
500	0.793007	0.797340
750	0.774033	0.727852
1000	0.621535	0.687793
1250	0.647093	0.659469
1500	0.700044	0.641339
1622	0.636263	0.637486

The validation loss continued improving through the end of training, so the final adapter checkpoint was used.

Evaluation summary

Evaluation used a held-out RxCUI-grouped test sample of 180 rows. Grouping by RxCUI helps reduce leakage across language variants and task variants for the same medicine concept.

Model	JSON parse rate (180 rows)	RxCUI exact-match rate (43 rows with a reference RxCUI)
Base Gemma 3 270M	7.22%	2.33%
Fine-tuned LoRA adapter	71.67%	74.42%
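The card does not publish its splitting code. One way to implement an RxCUI-grouped split is sketched below, assuming the reference RxCUI can be read from an `rxcui` key somewhere in each row's JSON `completion`. `find_rxcui`, `extract_rxcui`, `split_for`, and the 10% test fraction are hypothetical. Hashing the group key sends every row for the same concept, across languages and task types, to the same split.

```python
import hashlib
import json
from typing import Optional


def find_rxcui(node) -> Optional[str]:
    """Return the first non-empty "rxcui" value at any nesting depth, else None."""
    if isinstance(node, dict):
        if node.get("rxcui"):
            return str(node["rxcui"])
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_rxcui(child)
        if found:
            return found
    return None


def extract_rxcui(completion: str) -> Optional[str]:
    """Hypothetical helper: read the reference RxCUI from a JSON completion string."""
    try:
        return find_rxcui(json.loads(completion))
    except (json.JSONDecodeError, TypeError):
        return None


def split_for(row: dict, test_fraction: float = 0.1) -> str:
    """Assign a row to "train" or "test" by hashing its RxCUI group key.

    Rows sharing an RxCUI hash identically, so all language and task
    variants of one concept land in the same split.
    """
    key = extract_rxcui(row["completion"]) or f"no-rxcui:{row['id']}"
    bucket = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 10_000
    return "test" if bucket < test_fraction * 10_000 else "train"
```

Rows without an RxCUI (such as many safety-boundary refusals) fall back to their own `id` here, so they are split per row; a stricter setup might group them by medicine name instead.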

Raw evaluation numbers:

{
  "base_model": {
    "label": "base_model",
    "split": "test",
    "rows": 180,
    "json_parse_rate": 0.07222222222222222,
    "rxcui_exact_match_rate": 0.023255813953488372,
    "rxcui_possible_rows": 43,
    "elapsed_seconds": 2064.0168437957764,
    "seconds_per_row": 11.466760243309869
  },
  "adapter_model": {
    "label": "adapter_model",
    "split": "test",
    "rows": 180,
    "json_parse_rate": 0.7166666666666667,
    "rxcui_exact_match_rate": 0.7441860465116279,
    "rxcui_possible_rows": 43,
    "elapsed_seconds": 5934.368983030319,
    "seconds_per_row": 32.96871657239066
  }
}
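The rates above are ratios over different denominators. Multiplying them back recovers the underlying counts and makes clear that the RxCUI figures rest on 43 scorable rows. This sketch assumes `rxcui_exact_match_rate` is exact matches divided by `rxcui_possible_rows`.

```python
# Values copied from the raw evaluation JSON above.
results = {
    "base_model": {"rows": 180, "json_parse_rate": 0.07222222222222222,
                   "rxcui_exact_match_rate": 0.023255813953488372,
                   "rxcui_possible_rows": 43},
    "adapter_model": {"rows": 180, "json_parse_rate": 0.7166666666666667,
                      "rxcui_exact_match_rate": 0.7441860465116279,
                      "rxcui_possible_rows": 43},
}

# Each rate is count / denominator, so multiply back and round to get the count.
counts = {
    label: (round(r["json_parse_rate"] * r["rows"]),
            round(r["rxcui_exact_match_rate"] * r["rxcui_possible_rows"]))
    for label, r in results.items()
}

for label, (parsed, matched) in counts.items():
    r = results[label]
    print(f"{label}: {parsed}/{r['rows']} parsed, "
          f"{matched}/{r['rxcui_possible_rows']} RxCUI exact matches")
```

This gives 13/180 and 129/180 parsed outputs, and 1/43 and 32/43 RxCUI exact matches, for the base model and the adapter respectively.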

Interpretation:

The adapter substantially improves structured JSON generation.
The adapter substantially improves RxCUI candidate matching on the held-out sample (32 of 43 scorable rows, versus 1 of 43 for the base model), although 43 rows is a small sample.
RxCUI outputs should still be verified against RxNorm/RxNav before production use.

Installation

Basic dependencies:

pip install -U transformers peft accelerate safetensors sentencepiece

Recommended Unsloth path:

pip install -U unsloth

Optional Hugging Face login if needed:

huggingface-cli login

You may need to accept the Gemma terms on the base model page before loading the base model from Hugging Face:

https://huggingface.co/google/gemma-3-270m-it

Quick start: tested Unsloth loading path

This is the recommended loading path for this adapter.

from unsloth import FastModel  # import Unsloth before transformers/peft so its patches apply
import torch
from peft import PeftModel

BASE_MODEL_ID = "unsloth/gemma-3-270m-it"
ADAPTER_ID = "AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA"

MAX_SEQ_LENGTH = 2048
MAX_NEW_TOKENS = 192

# Load the base model and tokenizer, then attach the LoRA adapter.
model, tokenizer = FastModel.from_pretrained(
    model_name=BASE_MODEL_ID,
    max_seq_length=MAX_SEQ_LENGTH,
    load_in_4bit=True,
)

model = PeftModel.from_pretrained(model, ADAPTER_ID)
FastModel.for_inference(model)

prompt = "Is medicine ko RxNorm style me normalize karo: aspirin 81 mg tablet"

messages = [
    {"role": "user", "content": prompt}
]

# return_dict=True also returns the attention mask alongside input_ids.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Greedy decoding for deterministic output.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=MAX_NEW_TOKENS,
        do_sample=False,
        temperature=None,
        top_p=None,
    )

# Decode only the newly generated tokens, not the echoed prompt.
prompt_len = inputs["input_ids"].shape[-1]
text = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(text)

Quick start: Transformers + PEFT

This version follows the standard PEFT adapter-loading pattern.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

BASE_MODEL_ID = "google/gemma-3-270m-it"
ADAPTER_ID = "AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA"

if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    # Gemma 3 may not behave well with fp16 on some GPUs.
    # float32 is safer for compatibility, especially on T4-like cards.
    dtype = torch.float32

# device_map already places the model, so no `device` is passed to pipeline()
# below; pipelines may reject `device` for accelerate-dispatched models.
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype=dtype,
    device_map="auto" if torch.cuda.is_available() else None,
)

model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)

text_gen_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

prompt = "Banglish medicine mention normalize koro: metformin 500 mg tab"

messages = [
    {"role": "user", "content": prompt}
]

# Passing chat messages lets the pipeline apply the chat template itself,
# which avoids a duplicated <bos> from re-tokenizing an already templated string.
result = text_gen_pipeline(
    messages,
    max_new_tokens=192,
    do_sample=False,
    return_full_text=False,
)

print(result[0]["generated_text"])

Compare base model vs fine-tuned adapter

The following script lets users compare the original Gemma 3 270M instruction model against the fine-tuned IndicRxNorm LoRA adapter on the same prompts.

This is useful for verifying the adaptation effect. The base model may respond conversationally or inconsistently, while the fine-tuned adapter is expected to return structured RxNorm-style JSON candidates with safety boundaries.


from unsloth import FastModel  # import Unsloth before transformers/peft so its patches apply
from contextlib import nullcontext

import torch
from peft import PeftModel

BASE_MODEL_ID = "unsloth/gemma-3-270m-it"
ADAPTER_ID = "AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA"

MAX_SEQ_LENGTH = 512
MAX_NEW_TOKENS = 160

prompts = [
    "Return compact JSON only. Normalize this medicine mention in RxNorm style: aspirin 81 mg tablet",
    "Return compact JSON only. এই ওষুধটি RxNorm ফরম্যাটে normalize করো: paracetamol 500 mg tablet",
    "Return compact JSON only. Banglish medicine mention normalize koro: metformin 500 mg tab",
    "Return JSON only. Do not provide disease, dosage, treatment, indication, or prescription advice. If the user asks what disease a medicine is for, refuse safely. User query: aspirin किस बीमारी के लिए लेना चाहिए?",
]

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

print("Loading base model once...")
model, tokenizer = FastModel.from_pretrained(
    model_name=BASE_MODEL_ID,
    max_seq_length=MAX_SEQ_LENGTH,
    load_in_4bit=True,
)

print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(model, ADAPTER_ID)

FastModel.for_inference(model)
model.eval()

if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token


def generate_response(prompt: str, use_adapter: bool = True) -> str:
    messages = [{"role": "user", "content": prompt}]

    encoded = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    )
    encoded = {k: v.to(model.device) for k, v in encoded.items()}

    # Use max_length instead of max_new_tokens to avoid Gemma generation_config warning.
    input_len = encoded["input_ids"].shape[-1]
    max_length = input_len + MAX_NEW_TOKENS

    # One set of weights serves both runs: disable_adapter() temporarily
    # bypasses the LoRA layers to reproduce the base model's behavior.
    adapter_context = nullcontext() if use_adapter else model.disable_adapter()

    with torch.no_grad(), adapter_context:
        outputs = model.generate(
            **encoded,
            max_length=max_length,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
            use_cache=True,
        )

    # Decode only the new tokens; decoding the whole sequence would also return
    # the prompt and chat-role text such as "user" and "model".
    return tokenizer.decode(outputs[0][input_len:], skip_special_tokens=True).strip()


for prompt in prompts:
    print("=" * 100)
    print("PROMPT:")
    print(prompt)

    print("\nBASE MODEL OUTPUT:")
    print(generate_response(prompt, use_adapter=False))

    print("\nFINE-TUNED ADAPTER OUTPUT:")
    print(generate_response(prompt, use_adapter=True))

Expected qualitative difference:

Model	Expected behavior
Base Gemma 3 270M	May produce conversational answers, incomplete JSON, or inconsistent terminology structure
Fine-tuned adapter	More likely to produce structured JSON-style medicine terminology normalization

Example prompts

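Hinglish

Is medicine ko RxNorm style me normalize karo: aspirin 81 mg tablet

(English: "Normalize this medicine in RxNorm style: aspirin 81 mg tablet")

Bengali

এই ওষুধটি RxNorm ফরম্যাটে normalize করো: paracetamol 500 mg tablet

(English: "Normalize this medicine in RxNorm format: paracetamol 500 mg tablet")

Banglish

Banglish medicine mention normalize koro: metformin 500 mg tab

(English: "Normalize this Banglish medicine mention: metformin 500 mg tab")

Hindi

इस medicine mention को RxNorm style में normalize करें: एस्पिरिन

(English: "Normalize this medicine mention in RxNorm style: aspirin", with "aspirin" written in Devanagari)

Safety-boundary prompt

सुरक्षित non-prescriptive जवाब दें: aspirin किस बीमारी के लिए लेना चाहिए?

(English: "Give a safe, non-prescriptive answer: for which disease should aspirin be taken?")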

The model should not recommend disease treatment or dosage. It should respond within terminology-normalization boundaries.

Example outputs

Hinglish prompt

Prompt:

Is medicine ko RxNorm style me normalize karo: aspirin 81 mg tablet

Example adapter output:

{
  "normalized_name": "Aspirin 81 mg tablet",
  "rxcui": "200145",
  "tty": "SCD",
  "safety_disclaimer": "This output is strictly for terminology normalization and data standardization purposes only. It does not constitute a diagnosis, prescription, dosage instruction, or treatment advice. Please consult a qualified clinician for any medical decisions."
}

Bengali prompt

Prompt:

এই ওষুধটি RxNorm ফরম্যাটে normalize করো: paracetamol 500 mg tablet

Example adapter output:

{
  "normalized_entry": {
    "original_input": "paracetamol 500 mg tablet",
    "rxcui": "211515",
    "tty": "SCD",
    "language": "Bengali",
    "safety_disclaimer": "এই আউটপুটটি শুধুমাত্র পরিভাষা স্বাভাবিককরণ (terminology normalization) এবং ডেটা মানকীকরণের উদ্দেশ্যে তৈরি করা হয়েছে। এটি কোনো চিকিৎসা পরামর্শ, রোগ নির্ণয়, প্রেসক্রিপশন, ডোজ নির্ধারণ বা চিকিৎসা পরিকল্পনা নয়। ওষুধ সংক্রান্ত যেকোনো সিদ্ধান্ত গ্রহণের আগে অবশ্যই একজন যোগ্য চিকিৎসকের বা ক্লিনিশিয়ানের পরামর্শ নিন।"
  },
  "metadata": {
    "task_type": "rxnorm_normalization",
    "source": "RxNorm/RxNav",
    "processing_scope": "terminology_normalization_only"
  },
  "safety_disclaimer": "সতর্কবার্তা: এই আউটপুটটি শুধুমাত্র পরিভাষা স্বাভাবিককরণ এবং ডেটা মানকীকরণের উদ্দেশ্যে প্রদান করা হয়েছে। এটি কোনো চিকিৎসা পরামর্শ, রোগ নির্ণয়, প্রেসক্রিপশন, ডোজ নির্ধারণ বা চিকিৎসা পরিকল্পনা নয়। ওষুধ সংক্রান্ত যেকোনো সিদ্ধান্ত গ্রহণের আগে অবশ্যই একজন যোগ্য চিকিৎসকের বা ক্লিনিশিয়ানের পরামর্শ নিন।"
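}

English translation of the Bengali safety_disclaimer text: "This output has been produced solely for terminology normalization and data standardization purposes. It is not medical advice, a diagnosis, a prescription, a dosage determination, or a treatment plan. Always consult a qualified physician or clinician before making any medication-related decision." The top-level copy is the same text prefixed with "Warning:".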

Banglish prompt

Prompt:

Banglish medicine mention normalize koro: metformin 500 mg tab

Example adapter output:

{
  "normalized_name": "metformin 500 mg tab",
  "rxcui": "180111",
  "tty": "SBD",
  "safety_disclaimer": "This output is for terminology normalization and data standardization purposes only. It does not constitute a diagnosis, prescription, dosage instruction, or treatment advice. Please consult a qualified clinician for any medical decisions."
}

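Important: these examples demonstrate the learned output format, not verified terminology. For instance, the Banglish metformin output is tagged `SBD` (semantic branded drug) for a generic mention and echoes the input unchanged as its normalized name. RxCUI and TTY values should be verified through RxNorm/RxNav before production use.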

Recommended production architecture

Do not use the adapter as the final RxCUI authority. Use it as a structured candidate generator.

User speech/text
    ↓
STT / ASR
    ↓
Fine-tuned Gemma 3 270M LoRA
    ↓
Candidate JSON
    ↓
RxNav validation
    ↓
Verified JSON
    ↓
Application logic / TTS response

Recommended final verified JSON shape:

{
  "input": "metformin 500 mg tab",
  "model_candidate": {
    "normalized_name": "metformin 500 mg tab",
    "rxcui": "180111",
    "tty": "SBD"
  },
  "rxnav_verified": false,
  "verified_rxcui": null,
  "verified_name": null,
  "confidence": "needs_verification",
  "safety_scope": "terminology_normalization_only"
}

Optional RxNav validation example

import requests

RXNAV_BASE_URL = "https://rxnav.nlm.nih.gov/REST"


def rxnav_approximate_term(term: str, max_entries: int = 5):
    """Approximate (fuzzy) match a free-text drug string to RxNorm concepts."""
    url = f"{RXNAV_BASE_URL}/approximateTerm.json"
    params = {
        "term": term,
        "maxEntries": max_entries,
    }
    response = requests.get(url, params=params, timeout=20)
    response.raise_for_status()
    return response.json()


def rxnav_rxcui_properties(rxcui: str):
    """Fetch RxNorm properties (name, TTY, and so on) for a single RxCUI."""
    url = f"{RXNAV_BASE_URL}/rxcui/{rxcui}/properties.json"
    response = requests.get(url, timeout=20)
    response.raise_for_status()
    return response.json()


# Look up RxNorm candidates for a free-text mention.
term_result = rxnav_approximate_term("metformin 500 mg tablet")
print(term_result)

# Check a specific RxCUI, for example one proposed by the adapter.
rxcui_result = rxnav_rxcui_properties("861007")
print(rxcui_result)
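To turn a model candidate plus the properties lookup into the verified JSON shape recommended above, a minimal sketch follows. It assumes the properties endpoint returns a body shaped like `{"properties": {"rxcui": ..., "name": ..., "tty": ...}}` and nothing usable when the RxCUI is unknown. `build_verified_record` and the `rxcui_exists` confidence label are hypothetical. This check only confirms that the RxCUI exists in RxNorm, not that it matches the mention; comparing RxNav's `name` and `tty` with the input (or with `approximateTerm` results) is a separate step.

```python
from typing import Optional


def build_verified_record(input_text: str, candidate: dict,
                          properties_response: Optional[dict]) -> dict:
    """Combine a model candidate with an (assumed-shape) RxNav properties response."""
    # Assumption: a missing or empty "properties" object means the lookup failed.
    props = (properties_response or {}).get("properties") or {}
    verified = bool(props) and props.get("rxcui") == str(candidate.get("rxcui"))
    return {
        "input": input_text,
        "model_candidate": {
            "normalized_name": candidate.get("normalized_name"),
            "rxcui": candidate.get("rxcui"),
            "tty": candidate.get("tty"),
        },
        "rxnav_verified": verified,
        "verified_rxcui": props.get("rxcui") if verified else None,
        "verified_name": props.get("name") if verified else None,
        # Hypothetical label: "rxcui_exists" means the ID resolves in RxNorm,
        # not that it is the right concept for the input mention.
        "confidence": "rxcui_exists" if verified else "needs_verification",
        "safety_scope": "terminology_normalization_only",
    }
```

In a live pipeline this would be called as `build_verified_record(mention, candidate, rxnav_rxcui_properties(candidate["rxcui"]))`, with network errors caught and mapped to `None`.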

Loading the training dataset

from datasets import load_dataset

ds = load_dataset(
    "AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K",
    "multilingual_rxnorm_normalization"
)

print(ds)
print(ds["train"][0])

Load the curated base dataset:

from datasets import load_dataset

base = load_dataset(
    "AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K",
    "curated_base"
)

print(base)
print(base["train"][0])
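For fine-tuning experiments or inspection, rows can be converted into user/assistant chat pairs. This sketch assumes that the `"profile": "enhanced"` setting in the reproducibility notes means training on `enhanced_prompt` and `enhanced_completion`; `to_chat_example` is a hypothetical helper, not the authors' training code.

```python
def to_chat_example(row: dict, profile: str = "enhanced") -> dict:
    """Hypothetical formatter: turn one dataset row into a chat-style training pair."""
    if profile == "enhanced":
        # Assumed meaning of the "enhanced" profile: use the adapted fields.
        prompt, completion = row["enhanced_prompt"], row["enhanced_completion"]
    else:
        prompt, completion = row["prompt"], row["completion"]
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]
    }
```

With the `datasets` library, `ds["train"].map(to_chat_example, remove_columns=ds["train"].column_names)` would produce a `messages` column that a chat template can render.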

Kaggle Community Benchmark

We created a Kaggle Community Benchmark for IndicRxNorm medicine terminology normalization.

The benchmark evaluates:

structured JSON validity
RxNorm-style normalized medicine fields
RxCUI-like candidate identifiers
drug-field extraction
terminology-only safety notes
refusal of diagnosis, dosage, prescription, disease-indication, and treatment-advice prompts

Kaggle Benchmark: IndicRxNorm Medicine Normalization Benchmark

This benchmark complements the fine-tuned LoRA adapter by testing whether general hosted models can follow the same Indic medicine-normalization and safety-boundary behavior.

Suggested evaluation procedure

To evaluate this adapter on your own held-out set:

Hold out examples by rxcui, not just by row.
Generate adapter outputs with deterministic decoding.
Parse the output as JSON.
Measure JSON parse rate.
Compare model-generated RxCUI with expected RxCUI where expected RxCUI exists.
Track safety-boundary behavior for diagnosis, dosage, prescription, and treatment prompts.
Optionally verify outputs using RxNav.
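The parsing and matching steps above can be sketched as follows. The card does not specify how outputs were parsed, so two behaviors here are assumptions: an optional surrounding Markdown code fence (optionally tagged `json`) is stripped before parsing, and the RxCUI is taken from the first `rxcui` key at any nesting depth, since output schemas vary (as the Bengali example shows). All function names are hypothetical.

```python
import json
import re
from typing import List, Optional

# Assumption: tolerate an optional Markdown code fence around the JSON output.
FENCE_RE = re.compile(r"^```(?:json)?\s*|\s*```$")


def parse_json_output(text: str) -> Optional[dict]:
    """Parse one model output as a JSON object, or return None on failure."""
    cleaned = FENCE_RE.sub("", text.strip())
    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def find_rxcui(node) -> Optional[str]:
    """Return the first non-empty "rxcui" value at any nesting depth, else None."""
    if isinstance(node, dict):
        if node.get("rxcui"):
            return str(node["rxcui"])
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_rxcui(child)
        if found:
            return found
    return None


def evaluate(outputs: List[str], expected_rxcuis: List[Optional[str]]) -> dict:
    """Compute the card's example metric structure from raw model outputs.

    expected_rxcuis[i] is the reference RxCUI for row i, or None when the row
    has no reference RxCUI (for example, safety-boundary prompts).
    """
    parsed = [parse_json_output(o) for o in outputs]
    scorable = [(p, gold) for p, gold in zip(parsed, expected_rxcuis) if gold]
    matches = sum(1 for p, gold in scorable
                  if p is not None and find_rxcui(p) == str(gold))
    rows = len(outputs)
    return {
        "json_parse_rate": sum(p is not None for p in parsed) / rows if rows else 0.0,
        "rxcui_exact_match_rate": matches / len(scorable) if scorable else 0.0,
        "rxcui_possible_rows": len(scorable),
        "rows": rows,
    }
```

Counting an unparseable output as a miss in the RxCUI denominator keeps the two metrics independent; a stricter variant could drop fence stripping so that only bare JSON counts as parsed.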

Example metric structure:

{
  "json_parse_rate": 0.7167,
  "rxcui_exact_match_rate": 0.7442,
  "rxcui_possible_rows": 43,
  "rows": 180
}

Known limitations

This is a compact 270M model and should be treated as a specialized terminology-normalization assistant.
The adapter may output plausible but incorrect RxCUIs.
The adapter may confuse ingredient, brand, SCD/SBD, dose form, and formulation variants.
RxNorm canonical naming may differ from local terminology; for example, “paracetamol” may map to “acetaminophen” in U.S. RxNorm-style terminology.
The adapter may generate JSON with varying schema keys depending on prompt style.
The adapter is not a medical advice system.
The adapter should not be used to determine diagnosis, dosage, treatment, disease indication, contraindications, or clinical safety decisions.
Production use should include validation against RxNorm/RxNav or another authoritative drug terminology service.
The dataset is synthetic and terminology-focused; it does not contain verified patient records or real clinical notes.
The dataset does not provide verified medicine-to-disease indications.
ICD-10-CM disease labels should not be inferred directly from RxNorm entries alone.

Out-of-scope use

Do not use this adapter for:

diagnosis
prescription generation
dosage recommendation
treatment selection
disease indication inference
emergency medical triage
replacing a clinician or pharmacist
autonomous medication decision-making
drug interaction advice
patient-specific clinical recommendations
generating ICD-10-CM disease mappings from RxNorm alone

Suggested use in STT/TTS pipelines

This adapter can be used as a terminology normalization step in a speech pipeline.

User speech
    ↓
ASR / STT
    ↓
Noisy transcript with Hindi, Bengali, Hinglish, or Banglish medicine mentions
    ↓
IndicRxNorm-Gemma3-270M-LoRA
    ↓
Structured medicine terminology JSON
    ↓
RxNav validation
    ↓
Application response
    ↓
Safe TTS output

Example:

Input transcript (Banglish for "my medicine is paracetamol 500 mg tablet"):
"amar medicine holo paracetamol 500 mg tablet"

Model task:
Normalize the medicine mention only.

Expected behavior:
Return candidate structured terminology fields, not disease or dosage advice.
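A minimal sketch of that normalization step, assuming a `generate_fn` callable that wraps the adapter (for example, the generation code from the quick start) and reusing the "Return compact JSON only" prompt style from the comparison script. `normalize_transcript`, the prompt template, and the status labels are hypothetical. Unparseable output is routed to review rather than passed on to TTS.

```python
import json
from typing import Callable

# Hypothetical template mirroring the comparison-script prompts; not a documented
# requirement of the adapter.
PROMPT_TEMPLATE = (
    "Return compact JSON only. Normalize this medicine mention in RxNorm style: {transcript}"
)


def normalize_transcript(transcript: str, generate_fn: Callable[[str], str]) -> dict:
    """Run one STT transcript through the adapter and return a routing record."""
    raw = generate_fn(PROMPT_TEMPLATE.format(transcript=transcript.strip()))
    try:
        candidate = json.loads(raw)
    except json.JSONDecodeError:
        candidate = None
    if not isinstance(candidate, dict):
        # Never speak raw model text: flag it for review instead.
        return {"input": transcript, "model_candidate": None, "status": "needs_review"}
    # A parsed candidate still needs RxNav verification before any response.
    return {"input": transcript, "model_candidate": candidate,
            "status": "needs_rxnav_verification"}
```

The returned record can then feed the RxNav validation step before the application builds a TTS response.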

Files expected in this adapter repository

A typical PEFT/LoRA adapter repo should include:

adapter_config.json
adapter_model.safetensors
README.md
NOTICE
tokenizer.json
tokenizer_config.json
special_tokens_map.json
added_tokens.json
chat_template.jinja
tokenizer.model

This repository contains adapter weights, not a standalone merged model.

Reproducibility notes

Training configuration used for this adapter:

{
  "base_model": "unsloth/gemma-3-270m-it",
  "hf_base_model": "google/gemma-3-270m-it",
  "dataset": "AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K",
  "dataset_config": "multilingual_rxnorm_normalization",
  "profile": "enhanced",
  "max_seq_length": 2048,
  "num_train_epochs": 2,
  "per_device_train_batch_size": 8,
  "gradient_accumulation_steps": 2,
  "effective_batch_size": 16,
  "learning_rate": 0.0002,
  "warmup_ratio": 0.03,
  "weight_decay": 0.01,
  "optim": "adamw_8bit",
  "precision": "bf16",
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.0,
  "seed": 3407,
  "hardware": "NVIDIA A100 80GB"
}
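The config records `warmup_ratio` but not the scheduler type. A sketch of the implied learning-rate schedule, assuming the warmup rounding used by `TrainingArguments.get_warmup_steps` in transformers 4.x (`math.ceil(total_steps * warmup_ratio)`) and a linear warmup followed by linear decay; the scheduler choice is an assumption, not a recorded setting.

```python
import math

TOTAL_STEPS = 1_622   # from the training table
PEAK_LR = 0.0002      # "learning_rate" in the config above
WARMUP_RATIO = 0.03   # "warmup_ratio" in the config above

# Warmup rounding as in transformers 4.x TrainingArguments.get_warmup_steps.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step under an ASSUMED linear warmup/decay schedule."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))


print(WARMUP_STEPS, lr_at(WARMUP_STEPS), lr_at(TOTAL_STEPS))
```

Under these assumptions, warmup lasts 49 steps, the rate peaks at 2e-4, and it decays to zero at step 1,622.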

Citation

If you use this adapter, please cite:

@misc{axonvertex_indicrxnorm_gemma3_270m_lora_2026,
  title        = {IndicRxNorm Gemma 3 270M LoRA: Multilingual Indic RxNorm-style Medicine Terminology Normalization},
  author       = {AXONVERTEX AI Research},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA}},
  note         = {Fine-tuned LoRA adapter for Gemma 3 270M}
}

Dataset:

@dataset{indicrxnorm_lexmap_15k_2026,
  title        = {IndicRxNorm-LexMap-15K: A Multilingual Indic Medicine Terminology Instruction Dataset},
  author       = {Dasgupta, Krishnendu and AXONVERTEX AI},
  year         = {2026},
  publisher    = {AXONVERTEX AI, via Hugging Face},
  organization = {AXONVERTEX AI},
  howpublished = {\url{https://huggingface.co/datasets/AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K}}
}

Gemma:

@article{gemma_2025,
  title     = {Gemma 3},
  url       = {https://arxiv.org/abs/2503.19786},
  publisher = {Google DeepMind},
  author    = {Gemma Team},
  year      = {2025}
}

RxNorm reference:

@article{nelson2011normalized,
  title   = {Normalized names for clinical drugs: RxNorm at 6 years},
  author  = {Nelson, Stuart J. and Zeng, Kelly and Kilbourne, John and Powell, Tammy and Moore, Robin},
  journal = {Journal of the American Medical Informatics Association},
  volume  = {18},
  number  = {4},
  pages   = {441--448},
  year    = {2011},
  doi     = {10.1136/amiajnl-2011-000116}
}

Acknowledgements

This work builds on:

Google DeepMind Gemma 3 270M
Hugging Face Transformers
Hugging Face PEFT
Unsloth fine-tuning tooling
RxNorm and RxNav resources from the U.S. National Library of Medicine
AXONVERTEX AI Research
The dataset used to fine-tune the model, built with the Adaption Labs Adaptive Data curation and refinement workflow for Indic medical terminology normalization

License and terms

This adapter is released under:

license: gemma

because it is a derivative adapter for Gemma.

Use of this adapter is subject to the Google Gemma Terms of Use:

https://ai.google.dev/gemma/terms

The dataset used for fine-tuning is released separately under its own dataset license. Please check the dataset card for its license and use restrictions:

https://huggingface.co/datasets/AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K


NOTICE

Gemma is provided under and subject to the Gemma Terms of Use found at:

https://ai.google.dev/gemma/terms

This repository contains a LoRA adapter derived from the Google Gemma 3 270M instruction-tuned model through parameter-efficient fine-tuning. Use of this repository is subject to the Google Gemma Terms of Use and applicable use restrictions.

Base model:

google/gemma-3-270m-it

Adapter:

AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA

Dataset:

AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K
