| language: | |
| - en | |
| license: apache-2.0 | |
| library_name: transformers | |
| pipeline_tag: text-generation | |
| base_model: Qwen/Qwen2.5-3B | |
| tags: | |
| - code-generation | |
| - code-assistant | |
| - general-purpose | |
| - gguf | |
| - llama.cpp | |
| - ollama | |
| - sovereign-ai | |
| model-index: | |
| - name: Stack-X-Ultimate | |
|   results: | |
|   - task: | |
|       type: text-generation | |
|     metrics: | |
|     - type: pass@k | |
|       value: 0.88 | |
| <p align="center"> | |
| <a href="https://github.com/my-ai-stack/stack-x"> | |
| <img src="https://img.shields.io/github/stars/my-ai-stack/stack-x?style=flat-square" alt="GitHub stars"/> | |
| </a> | |
| <a href="https://github.com/my-ai-stack/stack-x/blob/main/LICENSE"> | |
| <img src="https://img.shields.io/badge/License-Apache%202.0-blue?style=flat-square" alt="License"/> | |
| </a> | |
| <img src="https://img.shields.io/badge/Parameters-3B-blue?style=flat-square" alt="Parameters"/> | |
| <img src="https://img.shields.io/badge/Context-128K-green?style=flat-square" alt="Context"/> | |
| <img src="https://img.shields.io/badge/Sovereign-AI-red?style=flat-square" alt="Sovereign AI"/> | |
| <img src="https://img.shields.io/badge/Python-3.10+-blue?style=flat-square&logo=python" alt="Python 3.10+"/> | |
| </p> | |
| # Stack X Ultimate | |
| > The ultimate 3B parameter model for sovereign AI deployment | |
| Stack X Ultimate is a 3B-parameter language model fine-tuned from Qwen2.5-3B for sovereign AI deployment. It targets edge computing, on-premise infrastructure, and air-gapped environments, and its quantized builds (roughly 900 MB to 3 GB) are small enough to run on consumer hardware as well as enterprise servers. | |
| --- | |
| ## Hardware Requirements | |
| | Quantization | GPU Required | VRAM | Total Model Size | | |
| |-------------|--------------|------|------------------| | |
| | FP16 (half precision, unquantized) | RTX 3060+ | ~6 GB | ~6 GB | |
| | Q8_0 | RTX 3060 | ~3 GB | ~3 GB | | |
| | Q4_K_M | Any modern GPU | ~1.8 GB | ~1.8 GB | | |
| | Q3_K_M | Integrated GPU | ~1.2 GB | ~1.2 GB | | |
| | Q2_K | CPU + 8GB RAM | ~900 MB | ~900 MB | | |
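As a rough illustration of how the table above might be used, the sketch below picks the largest quantization whose listed size fits a given memory budget. The function name `pick_quantization` and the `headroom_gb` reserve are hypothetical and not part of any Stack X tooling. The sizes are the approximate figures from the table and cover weights only, so the KV cache and runtime buffers need memory on top of them.

```python
# Approximate weight sizes in GB, copied from the hardware table above.
QUANT_SIZES_GB = {
    "FP16": 6.0,
    "Q8_0": 3.0,
    "Q4_K_M": 1.8,
    "Q3_K_M": 1.2,
    "Q2_K": 0.9,
}


def pick_quantization(available_gb, headroom_gb=1.0):
    """Return the largest listed quantization that fits, or None if nothing fits.

    headroom_gb is a hypothetical reserve for the KV cache and runtime buffers.
    """
    budget = available_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    # The largest file that fits is the least aggressively quantized option.
    return max(fitting, key=fitting.get)


print(pick_quantization(12.0))  # budget of a 12 GB card such as the RTX 3060
print(pick_quantization(4.0))
print(pick_quantization(1.5))
```

The default 1 GB reserve is only a placeholder; long contexts need considerably more room for the KV cache.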
| ### Minimum Requirements (Q3_K_M and below) | |
| - **GPU**: None required (CPU inference supported) | |
| - **RAM**: 8GB system RAM | |
| - **Storage**: 2GB+ free space | |
| ### Recommended Requirements | |
| - **GPU**: NVIDIA RTX 3060 (12GB) or better | |
| - **RAM**: 16GB system RAM | |
| - **Storage**: 4GB+ free space for multiple quantizations | |
| ### Edge Deployment | |
| | Platform | Quantization | Requirements | | |
| |----------|--------------|---------------| | |
| | NVIDIA Jetson Orin | Q4_K_M | 8GB RAM, 15W TDP | | |
| | Raspberry Pi 5 + GPU | Q2_K | 8GB RAM, external GPU | | |
| | Apple Silicon (M1/M2/M3) | Q4_K_M | 16GB unified memory | | |
| | Intel Arc GPU | Q4_K_M | Intel Arc A770 | | |
| --- | |
| ## File Sizes | |
| | Quantization | File Size | Download | | |
| |-------------|-----------|----------| | |
| | FP16 | ~6.0 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) | | |
| | Q8_0 | ~3.0 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) | | |
| | Q4_K_M | ~1.8 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) | | |
| | Q3_K_M | ~1.2 GB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) | | |
| | Q2_K | ~900 MB | [Download](https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main) | | |
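The file sizes above follow approximately from parameter count times bits per weight. The sketch below shows that arithmetic for the two formats with a fixed layout: FP16 stores 16 bits per weight, and GGUF `Q8_0` stores blocks of 32 int8 weights plus one 16-bit scale, which is 8.5 bits per weight. The K-quants (`Q4_K_M`, `Q3_K_M`, `Q2_K`) mix block types across tensors, so their effective bits per weight vary by model and are left out. The parameter count of 3e9 is an assumption taken from the nominal "3B" figure, and `estimate_size_gb` is a hypothetical helper name.

```python
# Bits per weight for formats whose storage layout is fixed.
# Q8_0: 32 int8 weights + one 16-bit scale per block = 34 bytes per 32 weights.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5}

N_PARAMS = 3e9  # nominal "3B"; the exact count is an assumption


def estimate_size_gb(n_params, bits_per_weight):
    """Estimate weight storage in GB (10**9 bytes), ignoring file metadata."""
    return n_params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{estimate_size_gb(N_PARAMS, bpw):.1f} GB")
```

The `Q8_0` estimate comes out slightly above the rounded ~3.0 GB in the table, which is expected for an approximation based on a nominal parameter count.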
| --- | |
| ## Use Cases | |
| ### Best Suited Tasks | |
| - **Code Generation**: Multi-language code writing, refactoring, and debugging | |
| - **Text Generation**: Creative writing, documentation, content creation | |
| - **Question Answering**: Information retrieval, knowledge base queries | |
| - **Summarization**: Document summarization, abstract generation | |
| - **Classification**: Text classification, sentiment analysis | |
| - **Translation**: Cross-language text translation | |
| - **Embedded Systems**: On-device AI, IoT applications | |
| ### Industries & Domains | |
| | Industry | Use Case | | |
| |----------|----------| | |
| | Healthcare | HIPAA-compliant AI assistants, clinical documentation | | |
| | Finance | SOC2-compliant automation, risk assessment | | |
| | Legal | Contract analysis, case law research | | |
| | Government | Classified environment AI, secure documentation | | |
| | Manufacturing | Edge AI for quality control, predictive maintenance | | |
| | Retail | On-premise customer service, inventory optimization | | |
| | Education | Offline learning assistants, classroom AI | | |
| --- | |
| ## Quick Start | |
| ### Python (Transformers) | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| import torch | |
| # Load model and tokenizer | |
| model_name = "my-ai-stack/Stack-X-Ultimate" | |
| tokenizer = AutoTokenizer.from_pretrained( | |
|     model_name, | |
|     trust_remote_code=True | |
| ) | |
| model = AutoModelForCausalLM.from_pretrained( | |
|     model_name, | |
|     torch_dtype=torch.float16, | |
|     device_map="auto", | |
|     trust_remote_code=True | |
| ) | |
| # Generate response | |
| prompt = "Explain the concept of sovereignty in AI systems and why it matters for enterprise deployment." | |
| messages = [ | |
|     {"role": "system", "content": "You are Stack X Ultimate, a helpful and knowledgeable AI assistant."}, | |
|     {"role": "user", "content": prompt} | |
| ] | |
| text = tokenizer.apply_chat_template( | |
|     messages, | |
|     tokenize=False, | |
|     add_generation_prompt=True | |
| ) | |
| inputs = tokenizer([text], return_tensors="pt").to(model.device) | |
| with torch.no_grad(): | |
|     outputs = model.generate( | |
|         **inputs, | |
|         max_new_tokens=512, | |
|         temperature=0.7, | |
|         top_p=0.95, | |
|         do_sample=True, | |
|     ) | |
| # Decode only the newly generated tokens, skipping the prompt | |
| response = tokenizer.decode( | |
|     outputs[0][inputs.input_ids.shape[1]:], | |
|     skip_special_tokens=True | |
| ) | |
| print(response) | |
| ``` | |
| ### llama.cpp | |
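| ```bash | |
| # Download the GGUF model file | |
| # Visit: https://huggingface.co/my-ai-stack/Stack-X-Ultimate/tree/main | |
| # Note: current llama.cpp builds name the binary llama-cli; older builds call it ./main | |
| # Run with llama.cpp on GPU (-ngl sets how many layers are offloaded to the GPU) | |
| # -c 131072 reserves KV cache for the full 128K window; lower it if memory is tight | |
| ./llama-cli -m stack-x-ultimate-q4_k_m.gguf \ | |
|   -ngl 99 \ | |
|   -n 512 \ | |
|   -t 8 \ | |
|   -c 131072 \ | |
|   --temp 0.7 \ | |
|   --top-p 0.95 \ | |
|   -p "Write a Python function to implement quicksort algorithm." | |
| # Run on CPU only (no layers offloaded) | |
| ./llama-cli -m stack-x-ultimate-q4_k_m.gguf \ | |
|   -ngl 0 \ | |
|   -n 512 \ | |
|   -t 8 \ | |
|   -c 131072 \ | |
|   -p "Explain the differences between sovereign AI and cloud-based AI solutions." | |
| # Compare quantizations on the same prompt | |
| ./llama-cli -m stack-x-ultimate-q2_k.gguf -n 256 --temp 0.5 -p "Write a Python function to implement quicksort algorithm." | |
| ./llama-cli -m stack-x-ultimate-q4_k_m.gguf -n 256 --temp 0.5 -p "Write a Python function to implement quicksort algorithm." | |
| ./llama-cli -m stack-x-ultimate-q8_0.gguf -n 256 --temp 0.5 -p "Write a Python function to implement quicksort algorithm." | |
| ``` | |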
| ### Ollama | |
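| ```bash | |
| # Pull the model | |
| ollama pull stack-x-ultimate | |
| # Run a single prompt | |
| ollama run stack-x-ultimate "Write a Python function to implement binary search." | |
| # `ollama run` has no sampling flags; set parameters in a Modelfile and build a variant | |
| # (the variant names below are illustrative) | |
| # Creative variant with a high temperature | |
| printf 'FROM stack-x-ultimate\nPARAMETER temperature 0.9\nPARAMETER top_p 0.95\n' > Modelfile.creative | |
| ollama create stack-x-creative -f Modelfile.creative | |
| ollama run stack-x-creative "Write a short story about an AI that becomes self-aware in an air-gapped facility." | |
| # Factual variant with a low temperature | |
| printf 'FROM stack-x-ultimate\nPARAMETER temperature 0.2\nPARAMETER top_p 0.9\n' > Modelfile.factual | |
| ollama create stack-x-factual -f Modelfile.factual | |
| ollama run stack-x-factual "Explain quantum computing and its applications in cryptography." | |
| # Long-context variant for document processing | |
| printf 'FROM stack-x-ultimate\nPARAMETER num_ctx 65536\nPARAMETER temperature 0.5\n' > Modelfile.longctx | |
| ollama create stack-x-longctx -f Modelfile.longctx | |
| ollama run stack-x-longctx "Summarize the following research paper: [PASTE TEXT]" | |
| ``` | |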
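Sampling parameters such as `temperature`, `top_p`, and `num_ctx` can also be set per request through the `options` field of Ollama's REST API. The following is a minimal sketch using only the Python standard library. It assumes an Ollama server on the default local port and that the model is available under the name `stack-x-ultimate`; the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt, model="stack-x-ultimate", **options):
    """Build a non-streaming /api/generate request carrying sampling options."""
    payload = {
        "model": model,  # assumed model name, as used in the commands above
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": options,  # e.g. temperature, top_p, num_ctx
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **options):
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, **options)) as resp:
        return json.loads(resp.read())["response"]
```

With a server running, a factual low-temperature query would look like `generate("Explain quantum computing and its applications in cryptography.", temperature=0.2, top_p=0.9)`.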
| --- | |
| ## Model Architecture | |
| | Attribute | Value | | |
| |-----------|-------| | |
| | Base Model | Qwen/Qwen2.5-3B | | |
| | Parameters | 3B | | |
| | Fine-tuning | Full fine-tuning + LoRA | | |
| | Context Length | 131,072 tokens (128K) | | |
| | Vocabulary Size | 151,936 tokens | | |
| | Hidden Size | 1,536 | | |
| | Attention Heads | 12 | | |
| | Num Key Value Heads | 2 | | |
| | Transformer Layers | 28 | | |
| | Activation Function | SiLU | | |
| | RoPE Scaling | NTK (factor: 4.0) | | |
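The attention figures in the table determine how much memory the KV cache needs at a given context length, which matters when running near the 128K limit. The sketch below works through that arithmetic using the table's values. It assumes an FP16 cache (2 bytes per value) and a head dimension equal to hidden size divided by attention heads; the function name `kv_cache_bytes` is hypothetical. Since the result depends directly on these numbers, it is worth confirming them against the model's `config.json` before sizing hardware.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for context_len tokens.

    Each layer stores two tensors (K and V), each holding
    n_kv_heads * head_dim values per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


# Values taken from the architecture table above.
HIDDEN_SIZE = 1536
N_HEADS = 12
N_KV_HEADS = 2
N_LAYERS = 28
HEAD_DIM = HIDDEN_SIZE // N_HEADS  # assumed head dimension

for ctx in (4096, 32768, 131072):
    gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx) / 2**30
    print(f"{ctx:>7} tokens: {gib:.2f} GiB")
```

Under these assumptions the cache for a full 131,072-token window is about 3.5 GiB, more than the Q4_K_M weights themselves, so memory-constrained deployments may prefer a shorter context.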
| --- | |
| ## Training Details | |
| - **Base Model**: Qwen2.5-3B | |
| - **Training Approach**: Combined full fine-tuning + LoRA | |
| - **Fine-tuning Data**: Diverse high-quality corpus | |
| - **Focus Areas**: General understanding, code generation, instruction following | |
| - **Special Training**: Sovereign deployment optimization, edge computing efficiency | |
| - **Context Length**: 128K tokens | |
| - **License**: Apache 2.0 | |
| - **Release Date**: April 2026 | |
| --- | |
| ## Performance Notes | |
| ### Inference Speed (Q4_K_M) | |
| | Device | Tokens/sec | Latency (512 tokens) | | |
| |--------|------------|---------------------| | |
| | RTX 4090 | ~55 | ~9.3s | | |
| | RTX 3090 | ~42 | ~12.2s | | |
| | RTX 3060 | ~25 | ~20.5s | | |
| | Apple M2 Pro | ~35 | ~14.6s | | |
| | CPU (i9-13900K) | ~10 | ~51.2s | | |
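The latency column is the token count divided by throughput. The short sketch below reproduces it from the tokens/sec figures in the table; the helper name `generation_latency_s` is hypothetical, and the calculation ignores prompt processing time, so end-to-end latency with long prompts will be higher.

```python
# Tokens/sec figures copied from the Q4_K_M table above.
THROUGHPUT_TPS = {
    "RTX 4090": 55,
    "RTX 3090": 42,
    "RTX 3060": 25,
    "Apple M2 Pro": 35,
    "CPU (i9-13900K)": 10,
}


def generation_latency_s(n_tokens, tokens_per_sec):
    """Time in seconds to generate n_tokens at a steady decode rate."""
    return n_tokens / tokens_per_sec


for device, tps in THROUGHPUT_TPS.items():
    print(f"{device}: {generation_latency_s(512, tps):.1f} s for 512 tokens")
```

The same function can be used to budget shorter responses, for example the 128-token limit suggested for edge deployments.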
| ### Deployment Scenarios | |
| #### Single User (Interactive) | |
| ```python | |
| config = { | |
|     "max_new_tokens": 512, | |
|     "temperature": 0.7, | |
|     "top_p": 0.95, | |
|     "batch_size": 1, | |
| } | |
| ``` | |
| #### Multi-User (Server) | |
| ```python | |
| config = { | |
|     "max_new_tokens": 256, | |
|     "temperature": 0.5, | |
|     "top_p": 0.9, | |
|     "batch_size": 4, | |
|     "use_kv_cache": True, | |
| } | |
| ``` | |
| #### Offline/Edge | |
| ```python | |
| config = { | |
|     "max_new_tokens": 128, | |
|     "temperature": 0.3, | |
|     "top_p": 0.85, | |
|     "quantization": "q4_k_m", | |
| } | |
| ``` | |
| --- | |
| ## Security & Sovereignty | |
| Stack X Ultimate is designed for secure, sovereign deployment: | |
| - **Air-Gapped Operation**: No internet connection required | |
| - **Data Privacy**: All data stays within your infrastructure | |
| - **Compliance Ready**: SOC2, HIPAA, GDPR compatible | |
| - **Audit Trail**: Full inference logging capabilities | |
| - **On-Premise Only**: No cloud dependencies | |
| ### Enterprise Security Features | |
| | Feature | Description | | |
| |---------|-------------| | |
| | VPC Deployment | Deploy within your private network | | |
| | TLS/SSL | Encrypted communication | | |
| | Authentication | OAuth2, LDAP, SSO support | | |
| | Rate Limiting | Prevent abuse and overuse | | |
| | Audit Logging | Complete inference history | | |
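The card lists audit logging as a feature but does not describe how it is implemented. As one possible approach, the sketch below wraps any text-generation callable so that each call emits a JSON record. Everything here is hypothetical: the `audited` wrapper, the record fields, and the choice to store SHA-256 hashes rather than raw text are illustrative assumptions, not the Stack X implementation.

```python
import hashlib
import json
import time


def audited(generate_fn, write_line):
    """Wrap generate_fn so each call passes one JSON audit record to write_line.

    generate_fn: any callable taking (prompt, **kwargs) and returning text.
    write_line: any callable accepting a string, e.g. a file's write method.
    """
    def wrapper(prompt, **kwargs):
        start = time.time()
        output = generate_fn(prompt, **kwargs)
        record = {
            "timestamp": start,
            "latency_s": round(time.time() - start, 3),
            # Hashes identify the exchange without storing sensitive text.
            "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
            "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
            "params": kwargs,
        }
        write_line(json.dumps(record) + "\n")
        return output
    return wrapper
```

In use, `write_line` could be the `write` method of an append-only log file, and `generate_fn` whichever inference call the deployment already uses. Whether to log hashes or full text depends on the retention rules that apply to the deployment.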
| --- | |
| ## Limitations | |
| - **Model Size**: At 3B parameters, less capable than larger models for complex reasoning | |
| - **Specialized Tasks**: May require fine-tuning for domain-specific tasks | |
| - **Multi-modal**: Text-only; does not support images or audio | |
| - **Hallucinations**: May occasionally generate incorrect information; verification recommended | |
| --- | |
| ## Quick Links | |
| - [GitHub Repository](https://github.com/my-ai-stack/stack-x) | |
| - [HuggingFace Organization](https://huggingface.co/my-ai-stack) | |
| - [Model Hub](https://huggingface.co/my-ai-stack/Stack-X-Ultimate) | |
| - [Documentation](https://docs.stackai.dev) | |
| - [Discord Community](https://discord.gg/clawd) | |
| - [Enterprise Contact](https://stackai.dev/contact) | |
| --- | |
| ## Citation | |
| ```bibtex | |
| @misc{my-ai-stack/stack-x-ultimate, | |
| author = {Walid Sobhi}, | |
| title = {Stack X Ultimate: 3B Parameter Model for Sovereign AI Deployment}, | |
| year = {2026}, | |
| publisher = {HuggingFace}, | |
| url = {https://huggingface.co/my-ai-stack/Stack-X-Ultimate} | |
| } | |
| ``` | |
| --- | |
| <p align="center"> | |
| Built with love for developers<br/> | |
| <a href="https://discord.gg/clawd">Discord</a> · <a href="https://github.com/my-ai-stack/stack-x">GitHub</a> · <a href="https://huggingface.co/my-ai-stack">HuggingFace</a> | |
| </p> |