persadian-Nano-V4

MoE architecture optimized for T4-class GPUs

Model Details

Architecture: Mixture of Experts (8 experts) + Adaptive Hyper-Connections + Compressed Sparse Attention
Parameters: ~160M
Context Length: 8,192 tokens
Target Hardware: T4 / consumer-class GPUs
Inference Focus: Lightweight active-path computation for research environments
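
As a rough sanity check on the hardware target, the weight footprint can be estimated from the parameter count alone. The sketch below treats "~160M" as exactly 160,000,000 and only counts weights; activations and the KV cache are extra and are not estimated here. The helper name `weight_memory_mib` is illustrative and not part of the model's code.

```python
# Back-of-envelope weight memory for a ~160M-parameter model.
# The parameter count comes from the model card; the rest is plain arithmetic.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

N_PARAMS = 160_000_000  # "~160M" from the model card, treated as exact here


def weight_memory_mib(n_params: int, dtype: str) -> float:
    """Return the memory needed to hold the weights, in MiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**20


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_mib(N_PARAMS, dtype):8.1f} MiB")
```

Note that in a Mixture of Experts model all expert weights normally stay resident in memory even when only one or two experts are active per step, so the full parameter count is the relevant number. Even in float32 the weights stay well under the 16 GB of a T4.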

Three Innovations

Adaptive Hyper-Connections - Input-dependent routing weights (not fixed Sinkhorn)
Progressive Expert Activation - Starts with 1 expert, grows to 2 during inference
Online Compressed KV Cache - Adaptive compression based on sequence length
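
To make Progressive Expert Activation concrete, here is a minimal sketch of top-k routing over 8 experts where k grows from 1 to 2. The model card does not say what triggers the growth, so the step-based schedule (`active_expert_count`, `switch_step`) is a hypothetical stand-in, and the router scores are made up. This is not the model's actual routing code.

```python
import math

NUM_EXPERTS = 8  # from the model card


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def active_expert_count(step, switch_step=4):
    """Hypothetical schedule: 1 expert before switch_step, 2 from then on."""
    return 1 if step < switch_step else 2


def route(router_logits, k):
    """Pick the top-k experts and renormalise their gate weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Made-up router scores for one token, one score per expert.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
assert len(logits) == NUM_EXPERTS

for step in (0, 4):
    k = active_expert_count(step)
    print(f"step={step} k={k} routing={route(logits, k)}")
```

With one active expert the token goes entirely to the highest-scoring expert; with two, the output would be a weighted sum of both experts using the renormalised gate weights.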

Feature	Persadian-Nano-V4
Hyper-Connections	✅ Adaptive input-dependent routing
Expert Activation	✅ Progressive expert scaling during inference
KV Cache	✅ Online adaptive KV compression
Attention Design	✅ Compressed Sparse Hybrid Attention
MoE Routing	✅ Dynamic progressive routing
Context Optimization	✅ Colab-optimized memory efficiency
Hardware Requirement	✅ Optimized for single-GPU research environments
Parameter Count	✅ ~160M parameters
Active Compute	✅ Lightweight active-path compute
Deployment Target	✅ Prosumer laptops + edge GPUs
Training Accessibility	✅ Independent researchers & startups
Training Cost	✅ Near-zero using T4 GPU
Research Direction	✅ Experimental open nano-architecture
Inference Efficiency	✅ Optimized for constrained hardware
Innovation Focus	✅ Efficiency-first with adaptive systems
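
The "Online adaptive KV compression" row describes compression that depends on sequence length. One way such a scheme could work is sketched below: once the cache exceeds a budget, older entries are mean-pooled while a recent window is kept intact. The pooling rule and the `budget` and `recent` parameters are assumptions for illustration, not the model's actual method, and the sketch compresses a cache in one shot rather than incrementally.

```python
import math


def mean_pool(vectors, ratio):
    """Average consecutive groups of `ratio` vectors into one vector each."""
    pooled = []
    for start in range(0, len(vectors), ratio):
        group = vectors[start:start + ratio]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled


def compress_cache(cache, budget, recent):
    """Shrink `cache` to at most `budget` entries (hypothetical scheme).

    The newest `recent` entries are kept as-is; older entries are
    mean-pooled with the smallest ratio that fits the budget.
    """
    if recent >= budget:
        raise ValueError("budget must be larger than the recent window")
    if len(cache) <= budget:
        return list(cache)  # short sequences are left uncompressed
    split = len(cache) - recent
    old, new = cache[:split], cache[split:]
    ratio = math.ceil(len(old) / (budget - recent))
    return mean_pool(old, ratio) + new


# Toy cache: 10 positions, each a 2-dimensional vector.
cache = [[float(i), float(i)] for i in range(10)]
compressed = compress_cache(cache, budget=6, recent=2)
print(len(compressed), compressed)
```

Because the pooling ratio is derived from the current length, longer sequences are compressed more aggressively while short ones pass through unchanged.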

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer.
# trust_remote_code=True is required because the architecture is defined by
# custom code in the model repository.
model = AutoModelForCausalLM.from_pretrained(
    "persadian/persadian-nano-v4",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("persadian/persadian-nano-v4")

# Move the model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Generate text
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
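
When prompts get long, it helps to keep prompt tokens plus generated tokens within the 8,192-token context length. The helper below is a small sketch of that bookkeeping; `clamp_new_tokens` is a hypothetical name, and whether the model's custom code enforces this limit on its own is not stated in the model card.

```python
CONTEXT_LENGTH = 8192  # from the model card


def clamp_new_tokens(prompt_tokens, requested_new_tokens,
                     context_length=CONTEXT_LENGTH):
    """Return how many new tokens fit after a prompt of `prompt_tokens` tokens."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt already fills the context window")
    return min(requested_new_tokens, context_length - prompt_tokens)


print(clamp_new_tokens(5, 50))     # short prompt: the full request fits
print(clamp_new_tokens(8180, 50))  # long prompt: the request is cut down
```

With the usage example above, the prompt length would be `inputs["input_ids"].shape[1]`, and the result can be passed to `generate` as `max_new_tokens`.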

Citation

@misc{persadian2026nano,
  author = {Persadh, Darshani},
  title = {persadian-Nano-V4},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/persadian/persadian-Nano-V4}
}
