persadian-Nano-V4

MoE architecture optimized for T4-class GPUs

Model Details

  • Architecture: Mixture of Experts (8 experts) + Adaptive Hyper-Connections + Compressed Sparse Attention
  • Parameters: ~160M
  • Context Length: 8,192 tokens
  • Target Hardware: T4 / consumer-class GPUs
  • Inference Focus: Lightweight active-path computation for research environments
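
As a rough sanity check on the T4 target, the weight footprint implied by the ~160M parameter count can be estimated with simple arithmetic. The sketch below is illustrative only: the helper name `weight_memory_mib` and the dtype table are assumptions made for this example, not part of the model's code, and the result covers the weights alone (no activations, KV cache, or framework overhead).

```python
# Bytes used per parameter for common storage dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_mib(num_params: int, dtype: str) -> float:
    """Approximate memory needed for the weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


# "~160M" comes from the model details; the exact count may differ.
NUM_PARAMS = 160_000_000

for dtype in ("float32", "float16", "int8"):
    print(f"{dtype:>8}: {weight_memory_mib(NUM_PARAMS, dtype):8.1f} MiB")
```

At ~160M parameters this works out to roughly 610 MiB in float32 and 305 MiB in float16, a small fraction of the 16 GB available on a T4.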

Three Innovations

  1. Adaptive Hyper-Connections - Input-dependent routing weights (not fixed Sinkhorn)
  2. Progressive Expert Activation - Starts with 1 expert, grows to 2 during inference
  3. Online Compressed KV Cache - Adaptive compression based on sequence length
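
To make the second item above more concrete, here is a minimal pure-Python sketch of progressive expert activation: top-1 routing at first, then top-2. The card does not say what triggers the switch from 1 to 2 experts, so the step-based schedule, the `warmup_steps` parameter, and the function names `active_expert_count` and `route` are all hypothetical and should not be read as the model's actual routing code.

```python
import math

NUM_EXPERTS = 8  # from the model details


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def active_expert_count(step: int, warmup_steps: int = 16) -> int:
    """Hypothetical schedule: 1 expert for the first `warmup_steps` steps, then 2."""
    return 1 if step < warmup_steps else 2


def route(router_logits, step: int, warmup_steps: int = 16):
    """Return [(expert_index, weight), ...] for the experts active at this step."""
    if len(router_logits) != NUM_EXPERTS:
        raise ValueError(f"expected {NUM_EXPERTS} router logits")
    probs = softmax(router_logits)
    k = active_expert_count(step, warmup_steps)
    # Keep the k highest-probability experts and renormalize their weights.
    top = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
print(route(logits, step=0))   # one active expert
print(route(logits, step=20))  # two active experts, weights sum to 1
```

In this sketch only the selected experts would be evaluated for a token, which is one way to keep the active compute path small early in generation.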

Feature                   Persadian-Nano-V4
Hyper-Connections         ✅ Adaptive input-dependent routing
Expert Activation         ✅ Progressive expert scaling during inference
KV Cache                  ✅ Online adaptive KV compression
Attention Design          ✅ Compressed Sparse Hybrid Attention
MoE Routing               ✅ Dynamic progressive routing
Context Optimization      ✅ Colab-optimized memory efficiency
Hardware Requirement      ✅ Optimized for single-GPU research environments
Parameter Count           ✅ ~160M parameters
Active Compute            ✅ Lightweight active-path compute
Deployment Target         ✅ Prosumer laptops + edge GPUs
Training Accessibility    ✅ Independent researchers & startups
Training Cost             ✅ Near-zero using T4 GPU
Research Direction        ✅ Experimental open nano-architecture
Inference Efficiency      ✅ Optimized for constrained hardware
Innovation Focus          ✅ Efficiency-first with adaptive systems
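
The KV cache feature is described only as adaptive compression based on sequence length. One possible reading is sketched below in pure Python: older cache entries are mean-pooled in groups whose size grows with the cached length, while a recent window is kept intact. The length thresholds, the `recent_window` parameter, the use of mean pooling, and the function names are all assumptions chosen for illustration; the model's actual compression scheme may differ, and a real online cache would apply this incrementally rather than in one pass.

```python
def compression_ratio(seq_len: int) -> int:
    """Hypothetical schedule: longer sequences get a more aggressive ratio."""
    if seq_len <= 1024:
        return 1  # short context: keep every entry
    if seq_len <= 4096:
        return 2
    return 4      # up to the 8,192-token context length


def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def compress_cache(cache, recent_window: int = 256):
    """Pool older entries in groups of `ratio`; keep the newest entries untouched."""
    ratio = compression_ratio(len(cache))
    split = len(cache) - recent_window
    if ratio == 1 or split <= 0:
        return list(cache)
    old, recent = cache[:split], cache[split:]
    pooled = [mean_pool(old[i:i + ratio]) for i in range(0, len(old), ratio)]
    return pooled + recent


# Toy cache: 2,048 entries, each a 2-dimensional vector.
cache = [[float(i), 1.0] for i in range(2048)]
compressed = compress_cache(cache)
print(len(cache), "->", len(compressed))
```

With these illustrative settings, a 2,048-entry cache shrinks to 1,152 entries (896 pooled plus 256 recent), and caches of 1,024 entries or fewer are left unchanged.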

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer.
# trust_remote_code=True is required because the model uses custom modeling
# code; it runs Python code from the model repository, so review it first.
model = AutoModelForCausalLM.from_pretrained(
    "persadian/persadian-nano-v4",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("persadian/persadian-nano-v4")

# Move the model to the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Generate text
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# do_sample=True is needed for temperature to take effect;
# without it, generate() decodes greedily and ignores temperature.
outputs = model.generate(
    **inputs, max_new_tokens=50, do_sample=True, temperature=0.7
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Citation

@misc{persadian2026nano,
  author = {Persadh, Darshani},
  title = {persadian-Nano-V4},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/persadian/persadian-Nano-V4}
}