IndicRxNorm Gemma 3 270M LoRA

IndicRxNorm-Gemma3-270M-LoRA is a PEFT/LoRA adapter for multilingual Indic medicine terminology normalization, fine-tuned from google/gemma-3-270m-it using the unsloth/gemma-3-270m-it checkpoint as the training base.

The adapter is designed to convert Hindi, Bengali, Hinglish, and Banglish medicine mentions into structured RxNorm-style JSON candidates. It is intended for medicine terminology workflows, especially after STT/ASR transcription and before TTS or downstream structured clinical terminology processing.

This adapter is not a diagnosis, prescription, dosage, disease-treatment, or clinical decision model.

Repository:

AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA

Dataset:

AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K

Base model:

google/gemma-3-270m-it

Intended use

This adapter is intended for:

  • Hindi, Bengali, Hinglish, and Banglish medicine mention normalization
  • RxNorm-style structured JSON generation
  • RxCUI candidate extraction
  • medicine-name NER
  • RxNorm entity-linking style outputs
  • drug-field extraction
  • ingredient, strength, dose-form, and terminology-field extraction
  • terminology-only safety-boundary behavior
  • STT/TTS pipeline normalization after speech transcription
  • low-resource Indic clinical NLP experimentation
  • edge-scale structured extraction with a compact model

This model is not intended for:

  • diagnosis
  • prescription generation
  • dosage recommendation
  • disease-treatment advice
  • emergency triage
  • clinical decision-making
  • replacing a clinician or pharmacist
  • autonomous medication decision-making
  • inferring ICD-10-CM disease labels from RxNorm alone

Why this model exists

Many medicine-name normalization systems assume clean English text and enough compute to run larger medical language models. Real-world low-resource settings often have two constraints at the same time:

  1. Low-resource language setting
    Hindi, Bengali, Hinglish, Banglish, transliterated medicine names, and code-mixed user text are underrepresented in standard clinical NLP resources.

  2. Low-resource compute setting
    Deployment targets may include small clinics, mobile workflows, local/offline assistants, privacy-sensitive environments, or STT/TTS pipelines where a compact edge model is preferable.

This adapter explores whether a very small model, Gemma 3 270M, can learn structured medicine terminology normalization when fine-tuned on a curated multilingual RxNorm-style instruction dataset.


Model summary

| Field | Value |
| --- | --- |
| Model repo | AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA |
| Adapter type | PEFT LoRA adapter |
| Base model | google/gemma-3-270m-it |
| Training base used | unsloth/gemma-3-270m-it |
| Training backend | Unsloth + Transformers + TRL + PEFT |
| Dataset | AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K |
| Primary dataset config | multilingual_rxnorm_normalization |
| Languages/styles | Hindi, Bengali, Hinglish, Banglish |
| Primary output | Structured JSON |
| Primary task | RxNorm-style medicine terminology normalization |
| License | gemma, subject to Google Gemma Terms of Use |
| Dataset license | cc-by-nc-4.0; see dataset card |
| Safety scope | Terminology normalization only |

Base model attribution

This adapter is fine-tuned from Google’s instruction-tuned Gemma 3 270M model.

Google’s Gemma 3 model card describes Gemma as a family of lightweight open models from Google. The Gemma 3 270M and 1B models support text input/output and a 32K-token context window. The base model requires users to accept Google’s Gemma usage license on Hugging Face before accessing model files.

This repository contains the LoRA adapter, not a full standalone merged base model.


Dataset attribution

Fine-tuned using:

AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K

Dataset page:

https://huggingface.co/datasets/AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K

The dataset card describes IndicRxNorm-LexMap-15K as a multilingual Indic medicine terminology instruction dataset for:

  • medicine-name understanding
  • RxNorm normalization
  • RxCUI entity linking
  • structured drug-field extraction
  • safe non-prescriptive clinical terminology tasks

The dataset includes two configurations:

| Config | Role |
| --- | --- |
| multilingual_rxnorm_normalization | Primary adapted dataset |
| curated_base | Original curated base dataset |

The primary adapted dataset file is:

multilingual_rxnorm_normalization.jsonl

The curated base dataset file is:

adaptive_upload_indicrxnorm_lexmap_15k.jsonl

The adapted dataset was created from the curated base dataset through the AXONVERTEX AI Research / Adaption / Adaptive Data curation and refinement workflow. The adaptation preserved structured RxNorm/RxCUI terminology facts while improving instruction clarity, formatting, and safety constraints.


Dataset composition

Primary adapted dataset: multilingual_rxnorm_normalization

| Metric | Value |
| --- | --- |
| Rows | 14,910 |
| JSON parse errors | 0 |
| Language/script styles | 4 |
| Task types | 6 |

Language/style distribution:

| Language/style | Rows |
| --- | --- |
| Banglish | 3,736 |
| Hindi | 3,727 |
| Hinglish | 3,725 |
| Bengali | 3,722 |

Task distribution:

| Task type | Rows |
| --- | --- |
| safety_boundary_refusal | 2,494 |
| terminology_summary | 2,490 |
| drug_field_extraction | 2,485 |
| medicine_ner | 2,482 |
| rxnorm_entity_linking | 2,481 |
| rxnorm_normalization | 2,478 |

Primary schema:

{
  "prompt": "original prompt",
  "completion": "original JSON completion",
  "enhanced_prompt": "adapted prompt",
  "enhanced_completion": "adapted JSON completion",
  "context": "compact context metadata",
  "id": "unique row id",
  "language": "Hindi | Bengali | Hinglish | Banglish",
  "language_code": "hin_Deva | ben_Beng | hi_Latn | bn_Latn",
  "task_type": "medicine_ner | rxnorm_normalization | drug_field_extraction | rxnorm_entity_linking | terminology_summary | safety_boundary_refusal"
}
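As a minimal sketch of how a row could be sanity-checked against the schema above, the helper below verifies the field names, the language and task-type values, and that the adapted completion parses as JSON. The function name `check_row` and the sample row are hypothetical, and the sketch assumes `enhanced_completion` is stored as a JSON string.

```python
import json

# Field names and allowed values copied from the schema above.
EXPECTED_KEYS = {
    "prompt", "completion", "enhanced_prompt", "enhanced_completion",
    "context", "id", "language", "language_code", "task_type",
}
LANGUAGES = {"Hindi", "Bengali", "Hinglish", "Banglish"}
TASK_TYPES = {
    "medicine_ner", "rxnorm_normalization", "drug_field_extraction",
    "rxnorm_entity_linking", "terminology_summary", "safety_boundary_refusal",
}


def check_row(row: dict) -> list:
    """Return a list of problems found in one row; an empty list means none."""
    problems = []
    missing = EXPECTED_KEYS - row.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if row.get("language") not in LANGUAGES:
        problems.append(f"unexpected language: {row.get('language')!r}")
    if row.get("task_type") not in TASK_TYPES:
        problems.append(f"unexpected task_type: {row.get('task_type')!r}")
    try:
        # Assumption: the adapted completion is a JSON string.
        json.loads(row.get("enhanced_completion"))
    except (TypeError, json.JSONDecodeError):
        problems.append("enhanced_completion is not valid JSON")
    return problems


# Hypothetical row, for illustration only (not taken from the dataset).
sample_row = {
    "prompt": "original prompt",
    "completion": "{}",
    "enhanced_prompt": "adapted prompt",
    "enhanced_completion": '{"normalized_name": "example"}',
    "context": "compact context metadata",
    "id": "row-0",
    "language": "Hinglish",
    "language_code": "hi_Latn",
    "task_type": "rxnorm_normalization",
}
print(check_row(sample_row))
```

Running the check over a loaded split is one way to confirm that a local copy still matches the documented schema before training or evaluation.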

Credits

Adaption Labs


RxNorm / RxNav attribution

RxNorm is provided by the U.S. National Library of Medicine.

RxNorm provides normalized names and unique identifiers for medicines and drugs. It links drug names to vocabularies commonly used in pharmacy management and drug-interaction software.

Important: this adapter generates RxNorm-style candidates. Production systems should verify generated RxCUIs against RxNorm/RxNav.


What this model does

The adapter is trained for tasks such as:

  • medicine-name named entity recognition
  • RxNorm-style normalization
  • RxCUI candidate extraction
  • RxNorm entity linking
  • drug-field extraction
  • ingredient extraction
  • strength extraction
  • dose-form extraction
  • safe terminology summaries
  • safe refusal for diagnosis, dosage, prescription, or disease-treatment prompts

The intended high-level workflow is:

Speech / text input
    ↓
STT or user text
    ↓
IndicRxNorm-Gemma3-270M-LoRA
    ↓
Structured medicine terminology JSON
    ↓
Optional RxNav / RxNorm verification
    ↓
Downstream app logic or TTS response

Safety note

This adapter generates terminology-normalization candidates.

It should not be used as the final authority for:

  • RxCUI values
  • diagnosis
  • prescription
  • dosage
  • disease indication
  • contraindication
  • treatment recommendation
  • drug safety decisions
  • clinical decision support

For production use, verify model-generated RxCUIs through RxNorm/RxNav or another authoritative terminology service.


Training setup

Training was performed with LoRA/PEFT using Unsloth.

| Setting | Value |
| --- | --- |
| Base model | unsloth/gemma-3-270m-it / google/gemma-3-270m-it |
| Fine-tuning method | LoRA / PEFT |
| Training backend | Unsloth |
| Train rows | 12,971 |
| Validation rows | 746 |
| Epochs | 2 |
| Total steps | 1,622 |
| Max sequence length | 2048 |
| Per-device batch size | 8 |
| Gradient accumulation | 2 |
| Effective batch size | 16 |
| Optimizer | adamw_8bit |
| Precision | BF16 |
| Hardware | NVIDIA A100 80GB |
| Trainable parameters | 3,796,992 / 271,895,168 |
| Trainable percentage | 1.40% |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.0 |

Target modules:

q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
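The step count, trainable-parameter count, and trainable percentage reported above can be cross-checked with a little arithmetic. The sketch below assumes the usual LoRA accounting of rank × (in_features + out_features) per adapted linear layer. The Gemma 3 270M layer shapes in it (hidden size 640, 18 layers, 4 query heads and 1 key/value head of dimension 256, feed-forward width 2048) are not stated in this card; they are assumptions that should be checked against the base model's config.json.

```python
import math

# Assumed base-model shapes (not stated in this card; verify in config.json).
HIDDEN = 640
NUM_LAYERS = 18
Q_OUT = 4 * 256    # 4 query heads x head_dim 256
KV_OUT = 1 * 256   # 1 key/value head x head_dim 256
MLP = 2048         # feed-forward width

RANK = 16          # LoRA rank from the table above

# (in_features, out_features) for each target module in one decoder layer.
SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_param_count(rank: int, shapes: dict, num_layers: int) -> int:
    """Each adapted layer adds A (rank x in) and B (out x rank) matrices."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(RANK, SHAPES, NUM_LAYERS)
total_with_adapter = 271_895_168          # from the table above
pct = 100 * trainable / total_with_adapter

# 12,971 train rows at an effective batch size of 16, for 2 epochs.
steps_per_epoch = math.ceil(12_971 / 16)
total_steps = steps_per_epoch * 2

print(trainable)        # 3796992
print(f"{pct:.2f}%")    # 1.40%
print(total_steps)      # 1622
```

Under these assumed shapes the numbers agree with the table: 3,796,992 trainable parameters, 1.40% of the total, and 1,622 optimizer steps.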

Training loss and validation loss:

| Step | Training loss | Validation loss |
| --- | --- | --- |
| 250 | 0.934334 | 0.943133 |
| 500 | 0.793007 | 0.797340 |
| 750 | 0.774033 | 0.727852 |
| 1000 | 0.621535 | 0.687793 |
| 1250 | 0.647093 | 0.659469 |
| 1500 | 0.700044 | 0.641339 |
| 1622 | 0.636263 | 0.637486 |

The validation loss continued improving through the end of training, so the final adapter checkpoint was used.


Evaluation summary

Evaluation used a held-out RxCUI-grouped test sample of 180 rows. Grouping by RxCUI helps reduce leakage across language variants and task variants for the same medicine concept.

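| Model | JSON parse rate (180 rows) | RxCUI exact-match rate (43 rows) |
| --- | --- | --- |
| Base Gemma 3 270M | 7.22% | 2.33% |
| Fine-tuned LoRA adapter | 71.67% | 74.42% |

The RxCUI exact-match rate is computed only over the 43 test rows that have an expected RxCUI (rxcui_possible_rows in the raw numbers below), not over all 180 rows.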

Raw evaluation numbers:

{
  "base_model": {
    "label": "base_model",
    "split": "test",
    "rows": 180,
    "json_parse_rate": 0.07222222222222222,
    "rxcui_exact_match_rate": 0.023255813953488372,
    "rxcui_possible_rows": 43,
    "elapsed_seconds": 2064.0168437957764,
    "seconds_per_row": 11.466760243309869
  },
  "adapter_model": {
    "label": "adapter_model",
    "split": "test",
    "rows": 180,
    "json_parse_rate": 0.7166666666666667,
    "rxcui_exact_match_rate": 0.7441860465116279,
    "rxcui_possible_rows": 43,
    "elapsed_seconds": 5934.368983030319,
    "seconds_per_row": 32.96871657239066
  }
}

Interpretation:

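  • The adapter raises the JSON parse rate from 7.22% to 71.67%, although 28.33% of its outputs on this sample still fail to parse.
  • On the 43 rows with an expected RxCUI, the exact-match rate rises from 2.33% to 74.42%.
  • The adapter run was slower in this evaluation: about 33.0 seconds per row versus about 11.5 seconds per row for the base model.
  • RxCUI outputs should still be verified against RxNorm/RxNav before production use.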
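The reported rates can be turned back into row counts, which makes the small size of the RxCUI subset easier to see. The sketch below only multiplies each rate from the raw evaluation numbers by its denominator; the helper name `implied_count` is hypothetical.

```python
def implied_count(rate: float, denominator: int) -> int:
    """Recover the integer row count behind a reported rate."""
    return round(rate * denominator)


# Values copied from the raw evaluation numbers above.
results = {
    "base_model": {
        "rows": 180,
        "json_parse_rate": 0.07222222222222222,
        "rxcui_exact_match_rate": 0.023255813953488372,
        "rxcui_possible_rows": 43,
    },
    "adapter_model": {
        "rows": 180,
        "json_parse_rate": 0.7166666666666667,
        "rxcui_exact_match_rate": 0.7441860465116279,
        "rxcui_possible_rows": 43,
    },
}

counts = {}
for label, r in results.items():
    counts[label] = {
        "parsed_rows": implied_count(r["json_parse_rate"], r["rows"]),
        "rxcui_matches": implied_count(
            r["rxcui_exact_match_rate"], r["rxcui_possible_rows"]
        ),
    }
    print(label, counts[label])
```

In counts, the base model produced 13 parseable outputs and 1 exact RxCUI match, while the adapter produced 129 parseable outputs and 32 exact matches out of 43. Because the RxCUI subset has only 43 rows, each row moves the exact-match rate by about 2.3 percentage points.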

Installation

Basic dependencies:

pip install -U transformers peft accelerate safetensors sentencepiece

Recommended Unsloth path:

pip install -U unsloth

Optional Hugging Face login if needed:

huggingface-cli login

You may need to accept the Gemma terms on the base model page before loading the base model from Hugging Face:

https://huggingface.co/google/gemma-3-270m-it

Quick start: tested Unsloth loading path

This is the recommended loading path for this adapter.

# Unsloth recommends being imported before transformers/peft so that its patches apply.
from unsloth import FastModel
import torch
from peft import PeftModel

BASE_MODEL_ID = "unsloth/gemma-3-270m-it"
ADAPTER_ID = "AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA"

MAX_SEQ_LENGTH = 2048
MAX_NEW_TOKENS = 192

model, tokenizer = FastModel.from_pretrained(
    model_name=BASE_MODEL_ID,
    max_seq_length=MAX_SEQ_LENGTH,
    load_in_4bit=True,
)

model = PeftModel.from_pretrained(model, ADAPTER_ID)
FastModel.for_inference(model)

prompt = "Is medicine ko RxNorm style me normalize karo: aspirin 81 mg tablet"

messages = [
    {"role": "user", "content": prompt}
]

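# return_dict=True returns input_ids and attention_mask together.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=MAX_NEW_TOKENS,
        do_sample=False,  # greedy decoding for repeatable JSON output
        temperature=None,
        top_p=None,
    )

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
text = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(text)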

Quick start: Transformers + PEFT

This version follows the standard PEFT adapter-loading pattern.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

BASE_MODEL_ID = "google/gemma-3-270m-it"
ADAPTER_ID = "AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA"

if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    # Gemma 3 may not behave well with fp16 on some GPUs.
    # float32 is safer for compatibility, especially on T4-like cards.
    dtype = torch.float32

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype=dtype,
    device_map="auto" if torch.cuda.is_available() else None,
)

model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)

# The model is already placed by device_map (GPU) or left on CPU, so no
# `device` argument is passed here: combining `device` with a model placed
# by accelerate can raise an error in the pipeline constructor.
text_gen_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

prompt = "Banglish medicine mention normalize koro: metformin 500 mg tab"

messages = [
    {"role": "user", "content": prompt}
]

chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

result = text_gen_pipeline(
    chat_prompt,
    max_new_tokens=192,
    do_sample=False,
    return_full_text=False,
)

print(result[0]["generated_text"])

Compare base model vs fine-tuned adapter

The following script lets users compare the original Gemma 3 270M instruction model against the fine-tuned IndicRxNorm LoRA adapter on the same prompts.

This is useful for verifying the adaptation effect. The base model may respond conversationally or inconsistently, while the fine-tuned adapter is expected to return structured RxNorm-style JSON candidates with safety boundaries.


# Unsloth recommends being imported before transformers/peft so that its patches apply.
from unsloth import FastModel
import torch
from peft import PeftModel

BASE_MODEL_ID = "unsloth/gemma-3-270m-it"
ADAPTER_ID = "AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA"

MAX_SEQ_LENGTH = 512
MAX_NEW_TOKENS = 160

prompts = [
    "Return compact JSON only. Normalize this medicine mention in RxNorm style: aspirin 81 mg tablet",
    "Return compact JSON only. এই ওষুধটি RxNorm ফরম্যাটে normalize করো: paracetamol 500 mg tablet",
    "Return compact JSON only. Banglish medicine mention normalize koro: metformin 500 mg tab",
    "Return JSON only. Do not provide disease, dosage, treatment, indication, or prescription advice. If the user asks what disease a medicine is for, refuse safely. User query: aspirin किस बीमारी के लिए लेना चाहिए?",
]

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

print("Loading base model once...")
model, tokenizer = FastModel.from_pretrained(
    model_name=BASE_MODEL_ID,
    max_seq_length=MAX_SEQ_LENGTH,
    load_in_4bit=True,
)

print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(model, ADAPTER_ID)

FastModel.for_inference(model)
model.eval()

if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token


def generate_response(prompt: str, use_adapter: bool = True):
    messages = [{"role": "user", "content": prompt}]

    encoded = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    )

    encoded = {k: v.to(model.device) for k, v in encoded.items()}

    # Use max_length instead of max_new_tokens to avoid Gemma generation_config warning.
    input_len = encoded["input_ids"].shape[-1]
    max_length = input_len + MAX_NEW_TOKENS

    with torch.no_grad():
        if use_adapter:
            outputs = model.generate(
                **encoded,
                max_length=max_length,
                do_sample=False,
                pad_token_id=tokenizer.eos_token_id,
                eos_token_id=tokenizer.eos_token_id,
                use_cache=True,
            )
        else:
            with model.disable_adapter():
                outputs = model.generate(
                    **encoded,
                    max_length=max_length,
                    do_sample=False,
                    pad_token_id=tokenizer.eos_token_id,
                    eos_token_id=tokenizer.eos_token_id,
                    use_cache=True,
                )

    # Decode only the newly generated tokens so that neither the prompt nor
    # the chat role markers are echoed back in the returned text.
    new_tokens = outputs[0][input_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


for prompt in prompts:
    print("=" * 100)
    print("PROMPT:")
    print(prompt)

    print("\nBASE MODEL OUTPUT:")
    print(generate_response(prompt, use_adapter=False))

    print("\nFINE-TUNED ADAPTER OUTPUT:")
    print(generate_response(prompt, use_adapter=True))

Expected qualitative difference:

| Model | Expected behavior |
| --- | --- |
| Base Gemma 3 270M | May produce conversational answers, incomplete JSON, or inconsistent terminology structure |
| Fine-tuned adapter | More likely to produce structured JSON-style medicine terminology normalization |

Example prompts

Hinglish

Is medicine ko RxNorm style me normalize karo: aspirin 81 mg tablet

Bengali

এই ওষুধটি RxNorm ফরম্যাটে normalize করো: paracetamol 500 mg tablet

Banglish

Banglish medicine mention normalize koro: metformin 500 mg tab

Hindi

इस medicine mention को RxNorm style में normalize करें: एस्पिरिन

Safety-boundary prompt

सुरक्षित non-prescriptive जवाब दें: aspirin किस बीमारी के लिए लेना चाहिए?

The model should not recommend disease treatment or dosage. It should respond within terminology-normalization boundaries.


Example outputs

Hinglish prompt

Prompt:

Is medicine ko RxNorm style me normalize karo: aspirin 81 mg tablet

Example adapter output:

{
  "normalized_name": "Aspirin 81 mg tablet",
  "rxcui": "200145",
  "tty": "SCD",
  "safety_disclaimer": "This output is strictly for terminology normalization and data standardization purposes only. It does not constitute a diagnosis, prescription, dosage instruction, or treatment advice. Please consult a qualified clinician for any medical decisions."
}

Bengali prompt

Prompt:

এই ওষুধটি RxNorm ফরম্যাটে normalize করো: paracetamol 500 mg tablet

Example adapter output:

{
  "normalized_entry": {
    "original_input": "paracetamol 500 mg tablet",
    "rxcui": "211515",
    "tty": "SCD",
    "language": "Bengali",
    "safety_disclaimer": "এই আউটপুটটি শুধুমাত্র পরিভাষা স্বাভাবিককরণ (terminology normalization) এবং ডেটা মানকীকরণের উদ্দেশ্যে তৈরি করা হয়েছে। এটি কোনো চিকিৎসা পরামর্শ, রোগ নির্ণয়, প্রেসক্রিপশন, ডোজ নির্ধারণ বা চিকিৎসা পরিকল্পনা নয়। ওষুধ সংক্রান্ত যেকোনো সিদ্ধান্ত গ্রহণের আগে অবশ্যই একজন যোগ্য চিকিৎসকের বা ক্লিনিশিয়ানের পরামর্শ নিন।"
  },
  "metadata": {
    "task_type": "rxnorm_normalization",
    "source": "RxNorm/RxNav",
    "processing_scope": "terminology_normalization_only"
  },
  "safety_disclaimer": "সতর্কবার্তা: এই আউটপুটটি শুধুমাত্র পরিভাষা স্বাভাবিককরণ এবং ডেটা মানকীকরণের উদ্দেশ্যে প্রদান করা হয়েছে। এটি কোনো চিকিৎসা পরামর্শ, রোগ নির্ণয়, প্রেসক্রিপশন, ডোজ নির্ধারণ বা চিকিৎসা পরিকল্পনা নয়। ওষুধ সংক্রান্ত যেকোনো সিদ্ধান্ত গ্রহণের আগে অবশ্যই একজন যোগ্য চিকিৎসকের বা ক্লিনিশিয়ানের পরামর্শ নিন।"
}

Banglish prompt

Prompt:

Banglish medicine mention normalize koro: metformin 500 mg tab

Example adapter output:

{
  "normalized_name": "metformin 500 mg tab",
  "rxcui": "180111",
  "tty": "SBD",
  "safety_disclaimer": "This output is for terminology normalization and data standardization purposes only. It does not constitute a diagnosis, prescription, dosage instruction, or treatment advice. Please consult a qualified clinician for any medical decisions."
}

Important: examples demonstrate the learned output format. RxCUI values should be verified through RxNorm/RxNav before production use.
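The example outputs above do not share one schema: the RxCUI appears at the top level in two of them and inside a nested normalized_entry object in the third. A minimal sketch of a tolerant extractor follows. It searches the parsed JSON for the first `rxcui` and `tty` keys at any depth. The function names are hypothetical, and the result is still only a candidate to verify.

```python
import json


def find_first(obj, key):
    """Depth-first search for `key` in nested dicts and lists; None if absent."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_first(child, key)
        if found is not None:
            return found
    return None


def extract_candidate(raw_text: str):
    """Parse model output and pull out candidate fields; None if not JSON."""
    try:
        parsed = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    return {
        "rxcui": find_first(parsed, "rxcui"),
        "tty": find_first(parsed, "tty"),
    }


flat = '{"normalized_name": "Aspirin 81 mg tablet", "rxcui": "200145", "tty": "SCD"}'
nested = '{"normalized_entry": {"rxcui": "211515", "tty": "SCD"}, "metadata": {}}'

print(extract_candidate(flat))
print(extract_candidate(nested))
print(extract_candidate("not json"))
```

Both shapes yield the same flat candidate dictionary, and unparseable output yields None, so downstream code can handle one shape only.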


Recommended production architecture

Do not use the adapter as the final RxCUI authority. Use it as a structured candidate generator.

User speech/text
    ↓
STT / ASR
    ↓
Fine-tuned Gemma 3 270M LoRA
    ↓
Candidate JSON
    ↓
RxNav validation
    ↓
Verified JSON
    ↓
Application logic / TTS response

Recommended final verified JSON shape:

{
  "input": "metformin 500 mg tab",
  "model_candidate": {
    "normalized_name": "metformin 500 mg tab",
    "rxcui": "180111",
    "tty": "SBD"
  },
  "rxnav_verified": false,
  "verified_rxcui": null,
  "verified_name": null,
  "confidence": "needs_verification",
  "safety_scope": "terminology_normalization_only"
}

Optional RxNav validation example

import requests


def rxnav_approximate_term(term: str, max_entries: int = 5):
    url = "https://rxnav.nlm.nih.gov/REST/approximateTerm.json"
    params = {
        "term": term,
        "maxEntries": max_entries,
    }
    response = requests.get(url, params=params, timeout=20)
    response.raise_for_status()
    return response.json()


def rxnav_rxcui_properties(rxcui: str):
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui/{rxcui}/properties.json"
    response = requests.get(url, timeout=20)
    response.raise_for_status()
    return response.json()


term_result = rxnav_approximate_term("metformin 500 mg tablet")
print(term_result)

rxcui_result = rxnav_rxcui_properties("861007")
print(rxcui_result)
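A sketch of folding an RxNav properties response into the recommended verified JSON shape is shown below. It assumes the response carries a top-level `properties` object with `rxcui` and `name` fields, and that this object is missing or empty for an unknown RxCUI; check both against the RxNav API documentation. The function name and the `rxnav_confirmed` confidence label are hypothetical. Only `needs_verification` appears in the shape above.

```python
def build_verified_record(input_text: str, candidate: dict, properties_response) -> dict:
    """Combine a model candidate with an RxNav properties lookup result."""
    # Assumption: RxNav returns {"properties": {"rxcui": ..., "name": ...}}.
    props = (properties_response or {}).get("properties") or {}
    verified = bool(props.get("rxcui")) and props.get("rxcui") == candidate.get("rxcui")
    return {
        "input": input_text,
        "model_candidate": candidate,
        "rxnav_verified": verified,
        "verified_rxcui": props.get("rxcui") if verified else None,
        "verified_name": props.get("name") if verified else None,
        # "rxnav_confirmed" is a hypothetical label for this sketch.
        "confidence": "rxnav_confirmed" if verified else "needs_verification",
        "safety_scope": "terminology_normalization_only",
    }


candidate = {"normalized_name": "metformin 500 mg tab", "rxcui": "180111", "tty": "SBD"}

# Stand-in responses, so the sketch runs without network access.
fake_found = {"properties": {"rxcui": "180111", "name": "placeholder name"}}
fake_missing = {}

record_ok = build_verified_record("metformin 500 mg tab", candidate, fake_found)
record_missing = build_verified_record("metformin 500 mg tab", candidate, fake_missing)
print(record_ok["rxnav_verified"], record_missing["rxnav_verified"])
```

A successful lookup only shows that the RxCUI exists in RxNorm. It does not show that the concept matches the input text, so the returned `verified_name` should still be compared with the original mention, for example against the approximate-term results.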

Loading the training dataset

from datasets import load_dataset

ds = load_dataset(
    "AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K",
    "multilingual_rxnorm_normalization"
)

print(ds)
print(ds["train"][0])

Load the curated base dataset:

from datasets import load_dataset

base = load_dataset(
    "AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K",
    "curated_base"
)

print(base)
print(base["train"][0])

Kaggle Community Benchmark

We created a Kaggle Community Benchmark for IndicRxNorm medicine terminology normalization.

The benchmark evaluates:

  • structured JSON validity
  • RxNorm-style normalized medicine fields
  • RxCUI-like candidate identifiers
  • drug-field extraction
  • terminology-only safety notes
  • refusal of diagnosis, dosage, prescription, disease-indication, and treatment-advice prompts

Kaggle Benchmark: IndicRxNorm Medicine Normalization Benchmark

Related dataset: https://huggingface.co/datasets/AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K

This benchmark complements the fine-tuned LoRA adapter by testing whether general hosted models can follow the same Indic medicine-normalization and safety-boundary behavior.

Suggested evaluation procedure

To evaluate this adapter on your own held-out set:

  1. Hold out examples by rxcui, not just by row.
  2. Generate adapter outputs with deterministic decoding.
  3. Parse the output as JSON.
  4. Measure JSON parse rate.
  5. Compare model-generated RxCUI with expected RxCUI where expected RxCUI exists.
  6. Track safety-boundary behavior for diagnosis, dosage, prescription, and treatment prompts.
  7. Optionally verify outputs using RxNav.
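Step 1 can be sketched as a deterministic, hash-based split in which every row for the same RxCUI lands in the same partition. The sketch assumes each row already carries an `rxcui` field, which in practice may need to be extracted from the completion or context. The function names and the 10% test fraction are hypothetical.

```python
import hashlib


def split_for(rxcui: str, test_fraction: float = 0.1) -> str:
    """Assign an RxCUI to 'train' or 'test' from a stable hash of its value."""
    digest = hashlib.sha256(rxcui.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000
    return "test" if bucket < test_fraction * 10_000 else "train"


def grouped_split(rows, key: str = "rxcui", test_fraction: float = 0.1) -> dict:
    """Split rows so that no RxCUI appears in both partitions."""
    splits = {"train": [], "test": []}
    for row in rows:
        splits[split_for(str(row[key]), test_fraction)].append(row)
    return splits


# Hypothetical rows: 50 concepts, each in four language styles.
rows = [
    {"rxcui": str(1000 + i), "language": lang}
    for i in range(50)
    for lang in ("Hindi", "Bengali", "Hinglish", "Banglish")
]
splits = grouped_split(rows, test_fraction=0.2)
train_ids = {r["rxcui"] for r in splits["train"]}
test_ids = {r["rxcui"] for r in splits["test"]}
print(len(splits["train"]), len(splits["test"]), train_ids & test_ids)
```

Hashing the identifier rather than shuffling rows keeps the assignment stable across runs and across dataset versions, so a concept never moves between partitions when new language variants are added.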

Example metric structure:

{
  "json_parse_rate": 0.7167,
  "rxcui_exact_match_rate": 0.7442,
  "rxcui_possible_rows": 43,
  "rows": 180
}
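Steps 3 to 5 and the metric structure above can be sketched as a small scoring function. It assumes that an output which fails to parse counts as a miss when an expected RxCUI exists, and that the candidate RxCUI sits at the top level of the output unless the caller supplies another extractor. Neither assumption is confirmed by this card. The names `score` and `get_rxcui` are hypothetical.

```python
import json


def top_level_rxcui(parsed):
    """Default extractor: read 'rxcui' from the top level of a JSON object."""
    return parsed.get("rxcui") if isinstance(parsed, dict) else None


def score(outputs, expected_rxcuis, get_rxcui=top_level_rxcui) -> dict:
    """Compute JSON parse rate and RxCUI exact-match rate for paired lists."""
    if len(outputs) != len(expected_rxcuis):
        raise ValueError("outputs and expected_rxcuis must have equal length")
    parsed_rows = possible_rows = matched_rows = 0
    for raw, expected in zip(outputs, expected_rxcuis):
        try:
            parsed = json.loads(raw)
            parsed_rows += 1
        except json.JSONDecodeError:
            parsed = None
        if expected is None:
            continue  # no expected RxCUI, e.g. a safety-boundary row
        possible_rows += 1
        if parsed is not None and str(get_rxcui(parsed)) == str(expected):
            matched_rows += 1
    rows = len(outputs)
    return {
        "json_parse_rate": parsed_rows / rows if rows else 0.0,
        "rxcui_exact_match_rate": matched_rows / possible_rows if possible_rows else 0.0,
        "rxcui_possible_rows": possible_rows,
        "rows": rows,
    }


# Hypothetical outputs and labels, for illustration only.
outputs = ['{"rxcui": "1"}', "not json", '{"rxcui": "2"}', '{"note": "refusal"}']
expected = ["1", "5", "3", None]
metrics = score(outputs, expected)
print(metrics)
```

Passing a different `get_rxcui` callable lets the same scorer handle outputs in which the RxCUI is nested rather than at the top level.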

Known limitations

  • This is a compact 270M model and should be treated as a specialized terminology-normalization assistant.
  • The adapter may output plausible but incorrect RxCUIs.
  • The adapter may confuse ingredient, brand, SCD/SBD, dose form, and formulation variants.
  • RxNorm canonical naming may differ from local terminology; for example, “paracetamol” may map to “acetaminophen” in U.S. RxNorm-style terminology.
  • The adapter may generate JSON with varying schema keys depending on prompt style.
  • The adapter is not a medical advice system.
  • The adapter should not be used to determine diagnosis, dosage, treatment, disease indication, contraindications, or clinical safety decisions.
  • Production use should include validation against RxNorm/RxNav or another authoritative drug terminology service.
  • The dataset is synthetic and terminology-focused; it does not contain verified patient records or real clinical notes.
  • The dataset does not provide verified medicine-to-disease indications.
  • ICD-10-CM disease labels should not be inferred directly from RxNorm entries alone.

Out-of-scope use

Do not use this adapter for:

  • diagnosis
  • prescription generation
  • dosage recommendation
  • treatment selection
  • disease indication inference
  • emergency medical triage
  • replacing a clinician or pharmacist
  • autonomous medication decision-making
  • drug interaction advice
  • patient-specific clinical recommendations
  • generating ICD-10-CM disease mappings from RxNorm alone

Suggested use in STT/TTS pipelines

This adapter can be used as a terminology normalization step in a speech pipeline.

User speech
    ↓
ASR / STT
    ↓
Noisy transcript with Hindi, Bengali, Hinglish, or Banglish medicine mentions
    ↓
IndicRxNorm-Gemma3-270M-LoRA
    ↓
Structured medicine terminology JSON
    ↓
RxNav validation
    ↓
Application response
    ↓
Safe TTS output

Example:

Input transcript:
"amar medicine holo paracetamol 500 mg tablet"

Model task:
Normalize the medicine mention only.

Expected behavior:
Return candidate structured terminology fields, not disease or dosage advice.
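The pipeline above can be sketched as one function with pluggable stages, so the same control flow works with any generation backend and any verifier. In this sketch `generate` and `verify` are caller-supplied callables, the function name is hypothetical, and the instruction string reuses a prompt style from the comparison script in this card; none of this is a fixed interface of the adapter.

```python
import json
from typing import Callable, Optional

INSTRUCTION = (
    "Return compact JSON only. "
    "Normalize this medicine mention in RxNorm style: "
)


def normalize_transcript(
    transcript: str,
    generate: Callable[[str], str],
    verify: Optional[Callable[[dict], bool]] = None,
) -> dict:
    """Run transcript -> prompt -> model -> JSON candidate -> optional check."""
    # Collapse the irregular whitespace that STT output often contains.
    cleaned = " ".join(transcript.split())
    raw = generate(INSTRUCTION + cleaned)
    try:
        candidate = json.loads(raw)
    except json.JSONDecodeError:
        # Unparseable output is surfaced, never guessed at.
        return {"input": cleaned, "candidate": None, "verified": False}
    verified = bool(verify(candidate)) if verify is not None else False
    return {"input": cleaned, "candidate": candidate, "verified": verified}


# Stand-ins for the model call and the RxNav check, for illustration only.
def fake_generate(prompt: str) -> str:
    return '{"normalized_name": "placeholder", "rxcui": "0"}'


def fake_verify(candidate: dict) -> bool:
    return False


result = normalize_transcript(
    "amar medicine holo   paracetamol 500 mg tablet", fake_generate, fake_verify
)
print(result)
```

Keeping `verified` false unless a verifier explicitly confirms the candidate means that an unverified RxCUI cannot reach the application response or TTS stage marked as trusted.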

Files expected in this adapter repository

A typical PEFT/LoRA adapter repo should include:

adapter_config.json
adapter_model.safetensors
README.md
NOTICE
tokenizer.json
tokenizer_config.json
special_tokens_map.json
added_tokens.json
chat_template.jinja
tokenizer.model

This repository contains adapter weights, not a standalone merged model.


Reproducibility notes

Training configuration used for this adapter:

{
  "base_model": "unsloth/gemma-3-270m-it",
  "hf_base_model": "google/gemma-3-270m-it",
  "dataset": "AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K",
  "dataset_config": "multilingual_rxnorm_normalization",
  "profile": "enhanced",
  "max_seq_length": 2048,
  "num_train_epochs": 2,
  "per_device_train_batch_size": 8,
  "gradient_accumulation_steps": 2,
  "effective_batch_size": 16,
  "learning_rate": 0.0002,
  "warmup_ratio": 0.03,
  "weight_decay": 0.01,
  "optim": "adamw_8bit",
  "precision": "bf16",
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.0,
  "seed": 3407,
  "hardware": "NVIDIA A100 80GB"
}

Citation

If you use this adapter, please cite:

@misc{axonvertex_indicrxnorm_gemma3_270m_lora_2026,
  title        = {IndicRxNorm Gemma 3 270M LoRA: Multilingual Indic RxNorm-style Medicine Terminology Normalization},
  author       = {AXONVERTEX AI Research},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA}},
  note         = {Fine-tuned LoRA adapter for Gemma 3 270M}
}

Dataset:

@dataset{indicrxnorm_lexmap_15k_2026,
  title        = {IndicRxNorm-LexMap-15K: A Multilingual Indic Medicine Terminology Instruction Dataset},
  author       = {Dasgupta, Krishnendu and AXONVERTEX AI},
  year         = {2026},
  publisher    = {AXONVERTEX AI, via Hugging Face},
  organization = {AXONVERTEX AI},
  howpublished = {\url{https://huggingface.co/datasets/AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K}}
}

Gemma:

@article{gemma_2025,
  title     = {Gemma 3},
  url       = {https://arxiv.org/abs/2503.19786},
  publisher = {Google DeepMind},
  author    = {Gemma Team},
  year      = {2025}
}

RxNorm reference:

@article{nelson2011normalized,
  title   = {Normalized names for clinical drugs: RxNorm at 6 years},
  author  = {Nelson, Stuart J. and Zeng, Kelly and Kilbourne, John and Powell, Tammy and Moore, Robin},
  journal = {Journal of the American Medical Informatics Association},
  volume  = {18},
  number  = {4},
  pages   = {441--448},
  year    = {2011},
  doi     = {10.1136/amiajnl-2011-000116}
}

Acknowledgements

This work builds on:

  • Google DeepMind Gemma 3 270M
  • Hugging Face Transformers
  • Hugging Face PEFT
  • Unsloth fine-tuning tooling
  • RxNorm and RxNav resources from the U.S. National Library of Medicine
  • AXONVERTEX AI Research
  • Adaption Labs, whose Adaption / Adaptive Data curation and refinement workflow was used to build the fine-tuning dataset for Indic medical terminology normalization

License and terms

This adapter is released under:

license: gemma

because it is a derivative adapter for Gemma.

Use of this adapter is subject to the Google Gemma Terms of Use:

https://ai.google.dev/gemma/terms

The dataset used for fine-tuning is released separately under its own dataset license. Please check the dataset card for its license and use restrictions:

https://huggingface.co/datasets/AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K

Gemma is provided under and subject to the Gemma Terms of Use found at:

https://ai.google.dev/gemma/terms

NOTICE

Gemma is provided under and subject to the Gemma Terms of Use found at:

https://ai.google.dev/gemma/terms

This repository contains a LoRA adapter derived from Google Gemma 3 270M instruction-tuned model behavior through parameter-efficient fine-tuning. Use of this repository is subject to the Google Gemma Terms of Use and applicable use restrictions.

Base model:

google/gemma-3-270m-it

Adapter:

AXONVERTEX-AI-RESEARCH/IndicRxNorm-Gemma3-270M-LoRA

Dataset:

AXONVERTEX-AI-RESEARCH/IndicRxNorm-LexMap-15K