Gemma-200M-hindi is a 200M-parameter model trained from scratch on Hindi text from the fineweb-edu-hindi dataset. It uses the Gemma 2 architecture. The model was trained on a TPU v4-128 slice provided by the TPU Research Cloud. It is a base model and has not undergone supervised fine-tuning (SFT).
The tokenizer, Gemma-hindi-tokenizer, was also trained from scratch.
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "KathirKs/Gemma-200M-hindi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
input_text = '''कुछ क्षेत्रों में यह अनुमान लगाया गया है कि लगभग 30 में से एक व्यक्ति कुष्ठ रोग से संक्रमित था'''
# Move the inputs to the device that device_map="auto" placed the model on
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
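# Sampling configuration (near-pure sampling: only a loose top-k cutoff)
outputs = model.generate(
    input_ids=input_ids.input_ids,
    attention_mask=input_ids.attention_mask,
    max_new_tokens=100,
    do_sample=True,    # Sample from the distribution instead of greedy decoding
    temperature=1.5,   # Above 1.0: flattens the distribution for more varied output
    top_k=1000,        # Keep only the 1000 most likely tokens (top_k=0 disables the filter)
    top_p=1.0,         # 1.0 disables nucleus (top-p) sampling
)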
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
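To make the effect of the sampling parameters above more concrete, here is a minimal pure-Python sketch of how temperature, top-k, and top-p reshape a next-token distribution. The function `sampling_distribution` and the four-token `toy_logits` are hypothetical names and values chosen for illustration. The sketch follows the usual order (temperature, then top-k, then top-p), but it is not the transformers implementation and does not load the model.

```python
import math


def sampling_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k and top-p.

    Illustrative helper (hypothetical, not part of transformers).
    """
    # Temperature: divide logits before the softmax; >1 flattens, <1 sharpens.
    scaled = [x / temperature for x in logits]

    # Rank token ids from most to least likely.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)

    # Top-k: keep only the k highest-scoring tokens (0 means keep all).
    if top_k > 0:
        order = order[:top_k]

    # Softmax over the surviving tokens (subtract the max for stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative probability reaches p.
    if top_p < 1.0:
        cumulative, cutoff = 0.0, len(order)
        for n, p in enumerate(probs, start=1):
            cumulative += p
            if cumulative >= top_p:
                cutoff = n
                break
        order, probs = order[:cutoff], probs[:cutoff]
        total = sum(probs)
        probs = [p / total for p in probs]

    return dict(zip(order, probs))


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical 4-token vocabulary
plain = sampling_distribution(toy_logits)
hot = sampling_distribution(toy_logits, temperature=1.5)
top2 = sampling_distribution(toy_logits, top_k=2)
wide = sampling_distribution(toy_logits, top_k=1000)
nucleus = sampling_distribution(toy_logits, top_p=0.8)

print("most likely token, T=1.0:", round(plain[0], 3))
print("most likely token, T=1.5:", round(hot[0], 3))
print("tokens kept with top_k=2:", sorted(top2))
print("tokens kept with top_p=0.8:", sorted(nucleus))
```

On this toy vocabulary, `top_k=1000` leaves the distribution unchanged because there are fewer than 1000 tokens. On a vocabulary larger than 1000 tokens the same setting would trim the tail, which is why `top_k=0` is the value that switches the filter off in `generate`.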
The model was trained on TPUs using the Levanter codebase.
The model may produce inappropriate content. Please report any such content you find.
For any queries, write to Kathir