persadian-Nano-V4
MoE architecture optimized for T4-class GPUs
Model Details
- Architecture: Mixture of Experts (8 experts) + Adaptive Hyper-Connections + Compressed Sparse Attention
- Parameters: ~160M
- Context Length: 8,192 tokens
- Target Hardware: T4 / consumer-class GPUs
- Inference Focus: Lightweight active-path computation for research environments
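
As a rough sanity check on the T4 target, the weights of a ~160M-parameter model take only a few hundred MiB. The sketch below is plain arithmetic; the exact parameter count is not stated, so `160_000_000` is an assumption, and `weight_memory_mib` is an illustrative helper, not part of the model's code. Activations, the KV cache, and any optimizer state come on top of this figure.

```python
# Bytes needed per parameter for common storage dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mib(num_params: int, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


# "~160M" from the model details; the exact count is an assumption.
params = 160_000_000
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_mib(params, dtype):.0f} MiB")
```

Even in float32 the weights stay well under 1 GiB, which leaves most of a T4's 16 GB for activations and the cache at long context lengths.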
Three Innovations
- Adaptive Hyper-Connections - Routing weights computed from the input (rather than fixed, Sinkhorn-normalized weights)
- Progressive Expert Activation - Starts with one active expert and grows to two during inference
- Online Compressed KV Cache - Compression adapts to the sequence length
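
To make Progressive Expert Activation concrete, here is a minimal sketch of one possible reading: top-k routing over the 8 experts where k is 1 for early decoding steps and 2 afterwards. The schedule (`switch_step`), the function names, and the renormalization of the selected weights are all assumptions for illustration; the model's actual routing code may differ.

```python
import math

NUM_EXPERTS = 8  # from the model details


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def active_expert_count(step, switch_step=16):
    """Hypothetical schedule: 1 expert before `switch_step`, 2 from then on."""
    return 1 if step < switch_step else 2


def route(router_logits, step, switch_step=16):
    """Return [(expert_index, weight), ...] for the experts used at this step."""
    probs = softmax(router_logits)
    k = active_expert_count(step, switch_step)
    # Pick the k most probable experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the selected weights sum to 1.
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]  # one logit per expert
print(route(logits, step=0))   # a single expert carries the full weight
print(route(logits, step=32))  # two experts share the weight
```

Under this reading, early tokens pay for one expert's forward pass and later tokens for two, which is one way to keep active-path compute low.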
| Feature | Persadian-Nano-V4 |
|---|---|
| Hyper-Connections | ✅ Adaptive input-dependent routing |
| Expert Activation | ✅ Progressive expert scaling during inference |
| KV Cache | ✅ Online adaptive KV compression |
| Attention Design | ✅ Compressed Sparse Hybrid Attention |
| MoE Routing | ✅ Dynamic progressive routing |
| Context Optimization | ✅ Colab-optimized memory efficiency |
| Hardware Requirement | ✅ Optimized for single-GPU research environments |
| Parameter Count | ✅ ~160M parameters |
| Active Compute | ✅ Lightweight active-path compute |
| Deployment Target | ✅ Prosumer laptops + edge GPUs |
| Training Accessibility | ✅ Independent researchers & startups |
| Training Cost | ✅ Near-zero using T4 GPU |
| Research Direction | ✅ Experimental open nano-architecture |
| Inference Efficiency | ✅ Optimized for constrained hardware |
| Innovation Focus | ✅ Efficiency-first with adaptive systems |
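
The "Online adaptive KV compression" row can be illustrated with a small sketch. It assumes one plausible scheme: once the cache exceeds a budget, the most recent entries stay exact and older entries are mean-pooled with a ratio that grows with sequence length. The names `compress_cache`, `budget`, and `recent` are hypothetical, and plain Python lists stand in for key or value tensors; this is not the model's actual implementation.

```python
def mean_pool(vectors, ratio):
    """Average consecutive groups of `ratio` vectors into one vector each."""
    pooled = []
    for start in range(0, len(vectors), ratio):
        group = vectors[start:start + ratio]
        dim = len(group[0])
        pooled.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return pooled


def compression_ratio(seq_len, slots):
    """Hypothetical rule: smallest integer ratio that fits seq_len into slots."""
    return max(1, -(-seq_len // slots))  # ceiling division


def compress_cache(cache, budget, recent=4):
    """Keep the last `recent` entries exact; pool older ones to fit `budget`."""
    if budget <= recent:
        raise ValueError("budget must be larger than the recent window")
    if len(cache) <= budget:
        return cache  # short sequences are left uncompressed
    split = len(cache) - recent
    old, new = cache[:split], cache[split:]
    ratio = compression_ratio(len(old), budget - recent)
    return mean_pool(old, ratio) + new


# 20 cached two-dimensional entries squeezed into a budget of 8.
cache = [[float(i), float(i)] for i in range(20)]
compressed = compress_cache(cache, budget=8, recent=4)
print(len(compressed))
```

Because the ratio is recomputed from the current length, the cache stays within the budget as the sequence grows, at the cost of coarser information about distant tokens.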
Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer.
# trust_remote_code=True is required because the model ships custom architecture code.
model = AutoModelForCausalLM.from_pretrained(
    "persadian/persadian-nano-v4",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("persadian/persadian-nano-v4")

# Move the model to the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Generate text; do_sample=True is needed for temperature to take effect
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
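
In `transformers`, sampling parameters such as `temperature` only affect `generate` when `do_sample=True`; otherwise decoding is greedy. The small helper below is a hypothetical convenience (not part of the model or the library) that switches sampling on whenever a sampling parameter is supplied, so the two settings cannot drift apart.

```python
def generation_kwargs(max_new_tokens=50, temperature=None, top_p=None):
    """Build keyword arguments for `model.generate` (hypothetical helper)."""
    kwargs = {"max_new_tokens": max_new_tokens}
    sampling = {"temperature": temperature, "top_p": top_p}
    # Keep only the sampling parameters the caller actually set.
    chosen = {name: value for name, value in sampling.items() if value is not None}
    if chosen:
        # Sampling parameters are only honored when do_sample is True.
        kwargs["do_sample"] = True
        kwargs.update(chosen)
    return kwargs


print(generation_kwargs())                 # greedy decoding
print(generation_kwargs(temperature=0.7))  # sampling enabled
```

With a loaded model, this would be used as `model.generate(**inputs, **generation_kwargs(temperature=0.7))`.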
Citation
```bibtex
@misc{persadian2026nano,
  author = {Persadh, Darshani},
  title = {persadian-Nano-V4},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/persadian/persadian-Nano-V4}
}
```