Libraries

How to use anshullpal/ByteZira-Vaani-350M-pretrain-base-model with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="anshullpal/ByteZira-Vaani-350M-pretrain-base-model", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("anshullpal/ByteZira-Vaani-350M-pretrain-base-model", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use anshullpal/ByteZira-Vaani-350M-pretrain-base-model with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "anshullpal/ByteZira-Vaani-350M-pretrain-base-model"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anshullpal/ByteZira-Vaani-350M-pretrain-base-model",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/anshullpal/ByteZira-Vaani-350M-pretrain-base-model

SGLang

How to use anshullpal/ByteZira-Vaani-350M-pretrain-base-model with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "anshullpal/ByteZira-Vaani-350M-pretrain-base-model" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anshullpal/ByteZira-Vaani-350M-pretrain-base-model",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "anshullpal/ByteZira-Vaani-350M-pretrain-base-model" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anshullpal/ByteZira-Vaani-350M-pretrain-base-model",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

🇮🇳 ByteZira Vaani-350M

ByteZira Vaani-350M is a custom decoder-only Transformer language model trained completely from scratch using PyTorch and integrated into the Hugging Face ecosystem with a fully custom Transformers wrapper.

The model was developed by Anshul Pal under ByteZira Technologies and trained on approximately 3.3 billion tokens using a modern GPT-style architecture featuring:

RoPE positional embeddings
RMSNorm normalization
SwiGLU feed-forward layers
SDPA Attention
Flash Attention compatibility
KV-cache support
Weight tying
Gradient checkpointing

This is a pretrained foundation model and is not instruction-tuned yet.
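
As an illustration of the RoPE component listed above, the following is a minimal pure-Python sketch of rotary position embeddings. It is not the model's own code: the interleaved pairing of dimensions, the `base=10000.0` constant, and the helper names `rope_rotate` and `dot` are assumptions made for this example. It shows the property that makes RoPE attractive: the attention score between a rotated query and key depends only on their relative offset.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by position-dependent angles.

    Assumption: interleaved pairing and base 10000, as in the common RoPE formulation.
    """
    if len(vec) % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each pair gets its own frequency; later pairs rotate more slowly.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy 4-dimensional query and key (hypothetical values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Same relative offset (3) at two different absolute positions.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 105), rope_rotate(k, 102))
print(abs(score_a - score_b) < 1e-9)
```

A real implementation would apply this per attention head on tensors and precompute the cosine and sine tables rather than looping over pairs.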

🚀 Model Highlights

~350 Million Parameters
Trained on 3.3B Tokens
Custom GPT-style Architecture
Built Fully in PyTorch
Hugging Face Compatible
Flash Attention Ready
Modern LLM Components
Trained From Scratch

🏗️ Model Details

Property	Value
Model Name	ByteZira Vaani-350M
Parameters	~350 Million
Architecture	Custom Decoder-only Transformer
Training Tokens	3.3 Billion
Framework	PyTorch
HF Compatibility	Custom Transformers Wrapper
Developer	Anshul Pal
Organization	ByteZira Technologies

🧠 Architecture

Component	Details
Transformer Layers	24
Attention Heads	16
Embedding Size	1024
Context Length	768 Tokens
Vocabulary Size	50,257
Positional Encoding	RoPE
Normalization	RMSNorm
Feed Forward Network	SwiGLU
Attention	SDPA / Flash Attention Compatible
Weight Tying	Yes
Precision	FP16
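
The figures in this table can be sanity-checked against the stated ~350M parameter count. The sketch below is a rough estimate under assumptions that the table does not state: attention and feed-forward projections have no bias terms, the SwiGLU hidden size is `int(8 * d_model / 3)`, and each layer has two RMSNorm weight vectors plus one final norm. The function name `estimate_params` is hypothetical.

```python
def estimate_params(n_layers=24, d_model=1024, vocab_size=50257,
                    ffn_hidden=None, tied_embeddings=True):
    """Rough parameter count for a decoder-only Transformer.

    Assumptions (not confirmed by the model card): no biases,
    SwiGLU hidden size of int(8 * d_model / 3), RMSNorm with one
    weight vector per norm, and no learned position embeddings (RoPE).
    """
    if ffn_hidden is None:
        ffn_hidden = int(8 * d_model / 3)

    attention = 4 * d_model * d_model        # Q, K, V and output projections
    feed_forward = 3 * d_model * ffn_hidden  # gate, up and down projections
    norms = 2 * d_model                      # pre-attention and pre-FFN RMSNorm
    per_layer = attention + feed_forward + norms

    embeddings = vocab_size * d_model
    # With weight tying the output head reuses the embedding matrix.
    lm_head = 0 if tied_embeddings else vocab_size * d_model
    final_norm = d_model

    return n_layers * per_layer + embeddings + lm_head + final_norm


tied = estimate_params()
untied = estimate_params(tied_embeddings=False)
print(f"tied:   {tied / 1e6:.1f}M parameters")
print(f"untied: {untied / 1e6:.1f}M parameters")
```

Under these assumptions the tied estimate lands close to the stated size, and the comparison shows how much weight tying saves with a 50,257-token vocabulary.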

📚 Training Data

The model was trained using a weighted mixture of large-scale web and educational datasets.

Dataset	Weight
HuggingFaceFW/fineweb (sample-10BT)	40%
HuggingFaceFW/fineweb-edu (sample-10BT)	30%
Wikimedia Wikipedia	30%
TinyStories + Book Corpus	5–10%
LexoraNLP/anshullpal	100%
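
A weighted mixture is commonly realized by choosing the source of each training document in proportion to its weight. The sketch below illustrates that idea only and is not the training pipeline used for this model. It uses the first three rows of the table, whose weights sum to 100%; how the remaining rows enter the mixture is not specified here. The function name `sample_sources` and the fixed seed are assumptions for the example.

```python
import random
from collections import Counter

# First three rows of the table above (weights in percent).
MIXTURE = {
    "HuggingFaceFW/fineweb (sample-10BT)": 40,
    "HuggingFaceFW/fineweb-edu (sample-10BT)": 30,
    "Wikimedia Wikipedia": 30,
}


def sample_sources(mixture, n, seed=0):
    """Draw n source names with probability proportional to their weights."""
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


draws = sample_sources(MIXTURE, 10_000)
counts = Counter(draws)
for name in MIXTURE:
    print(f"{name}: {counts[name] / len(draws):.3f}")
```

With enough draws, the observed shares approach the configured weights.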

⚙️ Training Configuration

Setting	Value
Optimizer	AdamW
Learning Rate	3e-4
Minimum LR	3e-5
Warmup Steps	51,200
LR Scheduler	Cosine Decay
Gradient Accumulation	128
Mixed Precision	FP16
Gradient Clipping	1.0
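
The learning-rate settings above describe a warmup followed by cosine decay. The following sketch shows one common way to compute such a schedule. It assumes a linear warmup, and the total number of training steps (`TOTAL_STEPS`) is a hypothetical value because the table does not list it. Whether the warmup is counted in micro-batches or optimizer updates is also not stated, so the step unit here is an assumption.

```python
import math

TOTAL_STEPS = 500_000  # hypothetical; the actual training length is not given


def lr_at(step, total_steps=TOTAL_STEPS, max_lr=3e-4, min_lr=3e-5,
          warmup_steps=51_200):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Ramp up linearly; reaches max_lr on the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


for step in (0, 25_600, 51_200, 275_600, 500_000):
    print(f"step {step:>7}: lr = {lr_at(step):.2e}")
```

In PyTorch the same curve could be passed to `torch.optim.lr_scheduler.LambdaLR` as a multiplier on the base learning rate.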

✨ Features

Custom Transformer Architecture
RoPE Positional Embeddings
RMSNorm
SwiGLU
SDPA Attention
Flash Attention Compatible
Hugging Face generate() Support
KV Cache Support
Gradient Checkpointing
Weight Tying

📊 Benchmark Results

Evaluated using the EleutherAI LM Evaluation Harness.

Task	Metric	Score
ARC Easy	Accuracy	0.3312
HellaSwag	Accuracy	0.2650
PIQA	Accuracy	0.5631

Notes

Results are from the pretrained base checkpoint.
This model is not instruction-tuned yet.
Future versions trained on more tokens and with instruction tuning are planned.
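
To put the scores in context, it helps to compare them with random guessing. The sketch below assumes four answer options for ARC Easy and HellaSwag and two for PIQA. ARC Easy contains a small number of questions with a different option count, so its baseline is approximate. The names `NUM_CHOICES` and `margin_over_chance` are introduced only for this example.

```python
# Scores from the benchmark table above.
RESULTS = {"ARC Easy": 0.3312, "HellaSwag": 0.2650, "PIQA": 0.5631}

# Assumed number of answer options per task (approximate for ARC Easy).
NUM_CHOICES = {"ARC Easy": 4, "HellaSwag": 4, "PIQA": 2}


def margin_over_chance(results, num_choices):
    """Return accuracy minus the random-guess baseline for each task."""
    return {
        task: score - 1.0 / num_choices[task]
        for task, score in results.items()
    }


margins = margin_over_chance(RESULTS, NUM_CHOICES)
for task, margin in margins.items():
    baseline = 1.0 / NUM_CHOICES[task]
    print(f"{task}: {RESULTS[task]:.4f} vs. chance {baseline:.2f} ({margin:+.4f})")
```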

📦 Installation

pip install transformers torch accelerate

🔥 Usage

Load Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anshullpal/ByteZira-Vaani-350M-pretrain-base-model"

# trust_remote_code=True lets Transformers run the custom model code
# shipped with the repository, which this architecture requires.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

✍️ Text Generation

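import torch

prompt = "India is a land of"

# Tokenize the prompt into PyTorch tensors
inputs = tokenizer(prompt, return_tensors="pt")

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=80,
        do_sample=True,           # sample instead of decoding greedily
        temperature=0.45,         # values below 1.0 sharpen the distribution
        top_p=0.82,               # nucleus sampling cutoff
        top_k=40,                 # consider at most the 40 most likely tokens
        repetition_penalty=1.35,  # discourage tokens that already appeared
        no_repeat_ngram_size=4,   # never repeat any 4-gram
        use_cache=False,          # the KV cache is disabled in this example
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,  # reuse EOS as the padding token
    )

# Decode the first (and only) sequence back to text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))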
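
The sampling parameters used above interact in a way that is easier to see on a toy distribution. The following pure-Python sketch is a simplified illustration, not the Transformers implementation: it applies temperature scaling, then top-k, then top-p, and renormalizes what remains. The function name `filter_distribution` and the example logits are hypothetical, and the repetition controls are not modeled.

```python
import math


def filter_distribution(logits, temperature=0.45, top_k=40, top_p=0.82):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    # Temperature: dividing logits by a value below 1.0 sharpens the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract peak for stability

    # Top-k: keep only the k highest-scoring tokens, then renormalize.
    ranked = sorted(range(len(exps)), key=lambda i: exps[i], reverse=True)[:top_k]
    total = sum(exps[i] for i in ranked)
    probs = {i: exps[i] / total for i in ranked}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, 0.0, -1.0]  # hypothetical next-token scores
sharp = filter_distribution(toy_logits)                  # settings used above
flat = filter_distribution(toy_logits, temperature=1.0)  # same cutoffs, no sharpening
print(len(sharp), len(flat))
```

Lowering the temperature concentrates probability on the top token, so fewer candidates survive the top-p cutoff than at a temperature of 1.0.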

🌐 Hugging Face Space

Try the live demo here:

👉 https://huggingface.co/spaces/anshullpal/Vaani-350M-Pretrain-Model

🔮 Future Plans

Instruction Tuned Version
Larger Context Length
1B+ Parameter Models
Better Tokenizer
Multilingual Training
Quantized Variants
Chat Optimized Models

⚠️ Limitations

Not instruction-tuned
May hallucinate (generate plausible but incorrect text)
Limited reasoning capability compared to larger LLMs
Primarily optimized for English text generation

📜 License

Apache-2.0 License

👨‍💻 Developer

Developed by Anshul Pal
Organization: ByteZira Technologies

⭐ Acknowledgements

Special thanks to:

Hugging Face
PyTorch
EleutherAI
FineWeb Dataset Contributors
Open-source AI Community

Safetensors: model size 0.4B params, tensor type F32
