Instructions to use my-ai-stack/Stack-X-Ultimate with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use my-ai-stack/Stack-X-Ultimate with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="my-ai-stack/Stack-X-Ultimate")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-X-Ultimate")
model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-X-Ultimate")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use my-ai-stack/Stack-X-Ultimate with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "my-ai-stack/Stack-X-Ultimate"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-X-Ultimate",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
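The same request can be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumption: the vLLM server from the commands above is running locally.
BASE_URL = "http://localhost:8000/v1"
MODEL = "my-ai-stack/Stack-X-Ultimate"


def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build the JSON body for POST /v1/chat/completions."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str) -> str:
    """Send one user message and return the assistant's reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # OpenAI-compatible servers return the text under choices[0].message.content.
    return reply["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the model's answer.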

Use Docker

docker model run hf.co/my-ai-stack/Stack-X-Ultimate

SGLang

How to use my-ai-stack/Stack-X-Ultimate with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "my-ai-stack/Stack-X-Ultimate" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-X-Ultimate",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "my-ai-stack/Stack-X-Ultimate" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-X-Ultimate",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use my-ai-stack/Stack-X-Ultimate with Docker Model Runner:
```
docker model run hf.co/my-ai-stack/Stack-X-Ultimate
```

	---
	language:
	- en
	license: apache-2.0
	library_name: transformers
	pipeline_tag: text-generation
	base_model: Qwen/Qwen2.5-3B
	tags:
	- code-generation
	- code-assistant
	- general-purpose
	- gguf
	- llama.cpp
	- ollama
	- sovereign-ai
	model-index:
	- name: Stack-X-Ultimate
	  results:
	  - task:
	      type: text-generation
	    metrics:
	    - type: pass@k
	      value: 0.88
	---

	<p align="center">
	<a href="https://github.com/my-ai-stack/stack-x">
	<img src="https://img.shields.io/github/stars/my-ai-stack/stack-x?style=flat-square" alt="GitHub stars"/>
	</a>
	<a href="https://github.com/my-ai-stack/stack-x/blob/main/LICENSE">
	<img src="https://img.shields.io/badge/License-Apache%202.0-blue?style=flat-square" alt="License"/>
	</a>
	<img src="https://img.shields.io/badge/Parameters-3B-blue?style=flat-square" alt="Parameters"/>
	<img src="https://img.shields.io/badge/Context-128K-green?style=flat-square" alt="Context"/>
	<img src="https://img.shields.io/badge/Sovereign-AI-red?style=flat-square" alt="Sovereign AI"/>
	<img src="https://img.shields.io/badge/Python-3.10+-blue?style=flat-square&logo=python" alt="Python 3.10+"/>
	</p>

	# Stack X Ultimate

	> The ultimate 3B parameter model for sovereign AI deployment

	Stack X Ultimate is a high-performance 3B-parameter language model designed for sovereign AI deployment. It is optimized for edge computing, on-premise infrastructure, and air-gapped environments, and its compact footprint suits both consumer hardware and enterprise deployment.

	---

	## Hardware Requirements

	| Quantization | GPU Required | VRAM | Total Model Size |
	|-------------|--------------|------|------------------|
	| FP16 (unquantized, half precision) | RTX 3060+ | ~6 GB | ~6 GB |
	| Q8_0 | RTX 3060 | ~3 GB | ~3 GB |
	| Q4_K_M | Any modern GPU | ~1.8 GB | ~1.8 GB |
	| Q3_K_M | Integrated GPU | ~1.2 GB | ~1.2 GB |
	| Q2_K | CPU + 8GB RAM | ~900 MB | ~900 MB |
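
The sizes in this table follow roughly from parameter count times bits per weight. A minimal sketch of that arithmetic, assuming a nominal 3e9 parameters and counting weights only (the KV cache and activations need additional memory at run time); the function name is illustrative:

```python
def estimate_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 3e9  # assumption: nominal "3B" parameter count from the model card

# FP16 stores each weight in 16 bits.
fp16_gb = estimate_size_gb(NUM_PARAMS, 16)
# Q8_0 stores 32 int8 weights plus one FP16 scale per block: 8.5 bits per weight.
q8_gb = estimate_size_gb(NUM_PARAMS, 8.5)

print(f"FP16: ~{fp16_gb:.1f} GB, Q8_0: ~{q8_gb:.1f} GB")
```

The K-quant formats (Q4_K_M, Q3_K_M, Q2_K) mix block types, so their effective bits per weight vary by model; pass a measured value rather than the nominal bit width.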

	### Minimum Requirements (Q3_K and below)

	- GPU: None required (CPU inference supported)
	- RAM: 8GB system RAM
	- Storage: 2GB+ free space

	### Recommended Requirements

	- GPU: NVIDIA RTX 3060 (12GB) or better
	- RAM: 16GB system RAM
	- Storage: 4GB+ free space for multiple quantizations

	### Edge Deployment

	| Platform | Quantization | Requirements |
	|----------|--------------|---------------|
	| NVIDIA Jetson Orin | Q4_K_M | 8GB RAM, 15W TDP |
	| Raspberry Pi 5 + GPU | Q2_K | 8GB RAM, external GPU |
	| Apple Silicon (M1/M2/M3) | Q4_K_M | 16GB unified memory |
	| Intel Arc GPU | Q4_K_M | Intel Arc A770 |

	---

	## File Sizes

	| Quantization | File Size | Download |
	|-------------|-----------|----------|
	| FP16 | ~6.0 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
	| Q8_0 | ~3.0 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
	| Q4_K_M | ~1.8 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
	| Q3_K_M | ~1.2 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |
	| Q2_K | ~900 MB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) |

	---

	## Use Cases

	### Best Suited Tasks

	- Code Generation: Multi-language code writing, refactoring, and debugging
	- Text Generation: Creative writing, documentation, content creation
	- Question Answering: Information retrieval, knowledge base queries
	- Summarization: Document summarization, abstract generation
	- Classification: Text classification, sentiment analysis
	- Translation: Cross-language text translation
	- Embedded Systems: On-device AI, IoT applications

	### Industries & Domains

	| Industry | Use Case |
	|----------|----------|
	| Healthcare | HIPAA-compliant AI assistants, clinical documentation |
	| Finance | SOC2-compliant automation, risk assessment |
	| Legal | Contract analysis, case law research |
	| Government | Classified environment AI, secure documentation |
	| Manufacturing | Edge AI for quality control, predictive maintenance |
	| Retail | On-premise customer service, inventory optimization |
	| Education | Offline learning assistants, classroom AI |

	---

	## Quick Start

	### Python (Transformers)

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	# Load model and tokenizer
	model_name = "my-ai-stack/Stack-X-Ultimate"

	tokenizer = AutoTokenizer.from_pretrained(
	    model_name,
	    trust_remote_code=True
	)

	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype=torch.float16,
	    device_map="auto",
	    trust_remote_code=True
	)

	# Generate response
	prompt = "Explain the concept of sovereignty in AI systems and why it matters for enterprise deployment."

	messages = [
	    {"role": "system", "content": "You are Stack X Ultimate, a helpful and knowledgeable AI assistant."},
	    {"role": "user", "content": prompt}
	]

	# Render the chat messages into the model's prompt format
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True
	)

	inputs = tokenizer([text], return_tensors="pt").to(model.device)

	with torch.no_grad():
	    outputs = model.generate(
	        **inputs,
	        max_new_tokens=512,
	        temperature=0.7,
	        top_p=0.95,
	        do_sample=True,
	    )

	# Decode only the newly generated tokens, skipping the prompt
	response = tokenizer.decode(
	    outputs[0][inputs.input_ids.shape[1]:],
	    skip_special_tokens=True
	)

	print(response)
	```

	### llama.cpp

	```bash
	# Download the GGUF model file
	# Visit: https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main

	# Run with llama.cpp on GPU (-ngl 99 offloads all layers to the GPU).
	# Current builds name the binary llama-cli; older builds call it ./main.
	./llama-cli -m stack-x-ultimate-q4_k_m.gguf \
	    -n 512 \
	    -t 8 \
	    -c 131072 \
	    -ngl 99 \
	    --temp 0.7 \
	    --top-p 0.95 \
	    -p "Write a Python function to implement the quicksort algorithm."

	# Run on CPU only (-ngl 0 keeps every layer on the CPU)
	./llama-cli -m stack-x-ultimate-q4_k_m.gguf \
	    -n 512 \
	    -t 8 \
	    -c 131072 \
	    -ngl 0 \
	    -p "Explain the differences between sovereign AI and cloud-based AI solutions."

	# Compare quantization levels with identical settings
	./llama-cli -m stack-x-ultimate-q2_k.gguf -n 256 --temp 0.5
	./llama-cli -m stack-x-ultimate-q4_k_m.gguf -n 256 --temp 0.5
	./llama-cli -m stack-x-ultimate-q8_0.gguf -n 256 --temp 0.5
	```

	### Ollama

	```bash
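	# Pull the model
	ollama pull stack-x-ultimate

	# Run a one-shot prompt
	ollama run stack-x-ultimate "Write a Python function to implement binary search."

	# `ollama run` has no sampling flags; set sampling options per request
	# through the local REST API (default port 11434).

	# Creative temperature
	curl http://localhost:11434/api/generate -d '{
	  "model": "stack-x-ultimate",
	  "prompt": "Write a short story about an AI that becomes self-aware in an air-gapped facility.",
	  "stream": false,
	  "options": {"temperature": 0.9, "top_p": 0.95}
	}'

	# Low temperature for factual responses
	curl http://localhost:11434/api/generate -d '{
	  "model": "stack-x-ultimate",
	  "prompt": "Explain quantum computing and its applications in cryptography.",
	  "stream": false,
	  "options": {"temperature": 0.2, "top_p": 0.9}
	}'

	# Longer context for document processing
	curl http://localhost:11434/api/generate -d '{
	  "model": "stack-x-ultimate",
	  "prompt": "Summarize the following research paper: [PASTE TEXT]",
	  "stream": false,
	  "options": {"num_ctx": 65536, "temperature": 0.5}
	}'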
	```
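
Sampling settings such as temperature, top-p, and context size can also be sent per request from Python through Ollama's local REST API, in the `options` field of `/api/generate`. The sketch below assumes Ollama is running on its default port and that the model is available under the name `stack-x-ultimate`; the preset names and helper functions are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical presets mirroring the settings used in the examples above.
PRESETS = {
    "creative": {"temperature": 0.9, "top_p": 0.95},
    "factual": {"temperature": 0.2, "top_p": 0.9},
    "long_context": {"temperature": 0.5, "num_ctx": 65536},
}


def build_payload(prompt: str, preset: str, model: str = "stack-x-ultimate") -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": dict(PRESETS[preset]),
    }


def generate(prompt: str, preset: str = "factual") -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_payload(prompt, preset)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["response"]
```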

	---

	## Model Architecture

	| Attribute | Value |
	|-----------|-------|
	| Base Model | Qwen/Qwen2.5-3B |
	| Parameters | 3B |
	| Fine-tuning | Full fine-tuning + LoRA |
	| Context Length | 131,072 tokens (128K) |
	| Vocabulary Size | 151,936 tokens |
	| Hidden Size | 2,048 |
	| Attention Heads | 16 |
	| Num Key Value Heads | 2 |
	| Transformer Layers | 36 |
	| Activation Function | SiLU |
	| RoPE Scaling | NTK (factor: 4.0) |
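
Context length and the number of key/value heads together determine how much memory the KV cache needs on top of the weights. A rough sketch of that estimate follows; the values passed in (36 layers, 2 KV heads, head dimension 128, FP16 cache) are assumptions taken from the published Qwen2.5-3B base configuration, so substitute the numbers from this model's own `config.json` if they differ.

```python
def kv_cache_bytes(
    num_tokens: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # FP16 cache entries
) -> int:
    """Bytes needed to cache keys and values for num_tokens positions."""
    # Factor 2: one key vector and one value vector per layer, KV head, and token.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


# Assumed base-model configuration (not confirmed for this fine-tune).
LAYERS, KV_HEADS, HEAD_DIM = 36, 2, 128

per_token = kv_cache_bytes(1, LAYERS, KV_HEADS, HEAD_DIM)
full_context = kv_cache_bytes(131_072, LAYERS, KV_HEADS, HEAD_DIM)

print(f"{per_token} bytes per token")
print(f"{full_context / 2**30:.1f} GiB at 131,072 tokens")
```

Under these assumptions a full 128K-token context needs several gigabytes for the cache alone, so smaller `-c` / `num_ctx` values are worth considering on memory-constrained devices.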

	---

	## Training Details

	- Base Model: Qwen2.5-3B
	- Training Approach: Combined full fine-tuning + LoRA
	- Fine-tuning Data: Diverse high-quality corpus
	- Focus Areas: General understanding, code generation, instruction following
	- Special Training: Sovereign deployment optimization, edge computing efficiency
	- Context Length: 128K tokens
	- License: Apache 2.0
	- Release Date: April 2026

	---

	## Performance Notes

	### Inference Speed (Q4_K_M)

	| Device | Tokens/sec | Latency (512 tokens) |
	|--------|------------|---------------------|
	| RTX 4090 | ~55 | ~9.3s |
	| RTX 3090 | ~42 | ~12.2s |
	| RTX 3060 | ~25 | ~20.5s |
	| Apple M2 Pro | ~35 | ~14.6s |
	| CPU (i9-13900K) | ~10 | ~51.2s |
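
The latency column is the number of generated tokens divided by throughput. A small sketch that reproduces it from the tokens/sec figures in the table, so other output lengths can be estimated the same way (prompt processing time is not included):

```python
def latency_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_sec


# Throughput figures copied from the table above (Q4_K_M).
THROUGHPUT = {
    "RTX 4090": 55,
    "RTX 3090": 42,
    "RTX 3060": 25,
    "Apple M2 Pro": 35,
    "CPU (i9-13900K)": 10,
}

for device, tps in THROUGHPUT.items():
    print(f"{device}: ~{latency_seconds(512, tps):.1f}s for 512 tokens")
```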

	### Deployment Scenarios

	#### Single User (Interactive)

	```python
	config = {
	    "max_new_tokens": 512,
	    "temperature": 0.7,
	    "top_p": 0.95,
	    "batch_size": 1,
	}
	```

	#### Multi-User (Server)

	```python
	config = {
	    "max_new_tokens": 256,
	    "temperature": 0.5,
	    "top_p": 0.9,
	    "batch_size": 4,
	    "use_kv_cache": True,
	}
	```

	#### Offline/Edge

	```python
	config = {
	    "max_new_tokens": 128,
	    "temperature": 0.3,
	    "top_p": 0.85,
	    "quantization": "q4_k_m",
	}
	```
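
These dictionaries mix sampling settings with deployment settings, so they cannot be passed to Transformers' `generate()` unchanged. One possible way to bridge them is sketched below; the helper `to_generate_kwargs` and the mapping of `use_kv_cache` to `use_cache` are assumptions about how the configs are meant to be used, not something the model ships with.

```python
# Keys that generate() accepts under the same name.
GENERATION_KEYS = {"max_new_tokens", "temperature", "top_p"}
# Assumed rename: the config says "use_kv_cache"; generate() calls it "use_cache".
RENAMED_KEYS = {"use_kv_cache": "use_cache"}


def to_generate_kwargs(config: dict) -> dict:
    """Extract the keyword arguments that model.generate() understands."""
    kwargs = {key: value for key, value in config.items() if key in GENERATION_KEYS}
    for old_name, new_name in RENAMED_KEYS.items():
        if old_name in config:
            kwargs[new_name] = config[old_name]
    # temperature and top_p only take effect when sampling is enabled.
    kwargs["do_sample"] = config.get("temperature", 0) > 0
    return kwargs


server_config = {
    "max_new_tokens": 256,
    "temperature": 0.5,
    "top_p": 0.9,
    "batch_size": 4,
    "use_kv_cache": True,
}

print(to_generate_kwargs(server_config))
```

The result can be passed as `model.generate(**inputs, **to_generate_kwargs(config))`. `batch_size` is dropped because the batch is set by how many prompts are tokenized together, and `quantization` is dropped because it is chosen when the model file is loaded.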

	---

	## Security & Sovereignty

	Stack X Ultimate is designed for secure, sovereign deployment:

	- Air-Gapped Operation: No internet connection required
	- Data Privacy: All data stays within your infrastructure
	- Compliance Ready: Suitable for SOC2, HIPAA, and GDPR-regulated deployments, since data never leaves your infrastructure
	- Audit Trail: Full inference logging capabilities
	- On-Premise Only: No cloud dependencies
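
For air-gapped operation with Transformers, the Hugging Face libraries can be told not to attempt any network access. A minimal sketch, assuming the model files were already copied to a local directory (the `model_dir` argument is a placeholder path):

```python
import os


def enable_offline_mode() -> None:
    """Tell Hugging Face libraries not to make network requests."""
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


def load_local(model_dir: str):
    """Load tokenizer and model strictly from a local directory."""
    # Set the variables before transformers is imported so they are picked up.
    enable_offline_mode()
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # local_files_only=True makes loading fail instead of downloading missing files.
    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(model_dir, local_files_only=True)
    return tokenizer, model
```

Failing fast on a missing file is usually preferable in an isolated environment to a silent download attempt that hangs until it times out.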

	### Enterprise Security Features

	| Feature | Description |
	|---------|-------------|
	| VPC Deployment | Deploy within your private network |
	| TLS/SSL | Encrypted communication |
	| Authentication | OAuth2, LDAP, SSO support |
	| Rate Limiting | Prevent abuse and overuse |
	| Audit Logging | Complete inference history |

	---

	## Limitations

	- Model Size: At 3B parameters, less capable than larger models for complex reasoning
	- Specialized Tasks: May require fine-tuning for domain-specific tasks
	- Multi-modal: Text-only; does not support images or audio
	- Hallucinations: May occasionally generate incorrect information; verification recommended

	---

	## Quick Links

	- [GitHub Repository](https://github.com/my-ai-stack/stack-x)
	- [HuggingFace Organization](https://huggingface.co/my-ai-stack)
	- [Model Hub](https://huggingface.co/my-ai-stack/Stack-X-Ultimate)
	- [Documentation](https://docs.stackai.dev)
	- [Discord Community](https://discord.gg/clawd)
	- [Enterprise Contact](https://stackai.dev/contact)

	---

	## Citation

	```bibtex
	@misc{my-ai-stack/stack-x-ultimate,
	  author = {Walid Sobhi},
	  title = {Stack X Ultimate: 3B Parameter Model for Sovereign AI Deployment},
	  year = {2026},
	  publisher = {Hugging Face},
	  url = {https://huggingface.co/my-ai-stack/Stack-X-Ultimate}
	}
	```

	---

	<p align="center">
	Built with love for developers<br/>
	<a href="https://discord.gg/clawd">Discord</a> · <a href="https://github.com/my-ai-stack/stack-x">GitHub</a> · <a href="https://huggingface.co/my-ai-stack">HuggingFace</a>
	</p>