Sombrero-Opus-14B-Elite13
Sombrero-Opus-14B-Elite13 builds on the Qwen 2.5 14B architecture and aims to raise reasoning performance for a model of this size. This iteration focuses on general-purpose comprehension, structured intelligence, and interactive versatility. Fine-tuned with an advanced reasoning chain and carefully curated datasets, Elite13 offers improved contextual understanding, logical coherence, and multi-step problem-solving.
Key improvements include:
- Expanded Domain Fluency: Delivers refined general knowledge across disciplines for more accurate and coherent answers.
- Advanced Instruction Parsing: Enhanced capacity to interpret and execute complex instructions while preserving structure and clarity.
- Robust Prompt Flexibility: Excels in adapting to diverse interaction styles, from casual inquiries to formal requests.
- Extended Context Window: Handles up to 128K tokens of input and generates up to 8K tokens in a single output, which suits detailed reasoning and expansive replies.
- Global Linguistic Range: Offers proficiency in 29+ languages, including English, Chinese, French, Spanish, Japanese, Arabic, and more.
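The context figures in the list above suggest a simple budget check before generation. The following is a minimal sketch, not part of the model's tooling: `clamp_max_new_tokens` is a hypothetical helper, "128K" and "8K" are read as multiples of 1024, and the window is assumed to be shared by the prompt and the generated tokens.

```python
# Assumed limits: "128K" and "8K" from the list above, read as multiples of 1024.
CONTEXT_WINDOW = 128 * 1024  # 131072 tokens, assumed shared by prompt and output
MAX_OUTPUT = 8 * 1024        # 8192 tokens per generation


def clamp_max_new_tokens(prompt_tokens: int, requested: int) -> int:
    """Return the largest usable max_new_tokens for a prompt of the given length.

    Raises ValueError if the arguments are invalid or the prompt fills the window.
    """
    if prompt_tokens < 0 or requested < 1:
        raise ValueError("prompt_tokens must be >= 0 and requested >= 1")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining < 1:
        raise ValueError(f"prompt of {prompt_tokens} tokens leaves no room for output")
    # The request is limited by both the per-generation cap and the space left.
    return min(requested, MAX_OUTPUT, remaining)
```

With a tokenized prompt, `prompt_tokens` would be the length of the input ID sequence, and the returned value can be passed as `max_new_tokens` to `generate`.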
Quickstart with Transformers
Use the following snippet to load and test the model using transformers and apply_chat_template:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Sombrero-Opus-14B-Elite13"

# Load the weights in their stored dtype and place them on the available devices.
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What are the key principles of general-purpose AI?"
messages = [
    {"role": "system", "content": "You are a helpful assistant capable of answering a wide range of questions."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template, ending with the assistant prompt.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
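The snippet above covers a single turn. To continue a conversation, the usual pattern with chat templates is to append the model's reply as an `assistant` message, add the next `user` message, and render the list with `apply_chat_template` again. A minimal sketch of that bookkeeping follows; `extend_conversation` is a hypothetical helper, and the strings in the usage lines are placeholders rather than real model output.

```python
from typing import Dict, List

Message = Dict[str, str]


def extend_conversation(
    messages: List[Message], assistant_reply: str, next_user_prompt: str
) -> List[Message]:
    """Return a new message list with the reply and the next user turn appended.

    The input list is left unchanged so earlier turns can be reused safely.
    """
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_prompt},
    ]


# Placeholder conversation; the assistant text stands in for a decoded response.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What are the key principles of general-purpose AI?"},
]
history = extend_conversation(history, "(decoded response goes here)", "Summarize that in one sentence.")
```

The resulting `history` can be passed to `tokenizer.apply_chat_template` in place of `messages` for the next generation call.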
Intended Use
- Cognitive Reasoning & General Q&A: Designed to support high-level thinking and accurate responses across general domains.
- Education & Research Support: Suitable for generating study guides, academic summaries, and informative explanations.
- Conversational Intelligence: Powers AI assistants and chatbots with memory-aware, context-sensitive dialogues.
- Cross-Language Communication: Useful in multilingual environments for translation, communication, and content creation.
- Data-Aware Structuring: Capable of converting unstructured data into meaningful formats like JSON or tabular summaries.
- Lengthy Content Generation: Suitable for drafting articles, technical documents, or creative prose with sustained coherence.
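For the JSON structuring use case above, a reply often arrives with prose around the JSON, so downstream code usually needs a tolerant parser. The sketch below is one possible approach using only the standard library; `extract_json_object` is a hypothetical helper, and the sample reply is made-up text, not output from this model.

```python
import json


def extract_json_object(reply: str) -> dict:
    """Return the first JSON object found in a model reply.

    Tries each '{' in turn, since replies may contain prose or stray braces
    before the actual JSON payload.
    """
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index.
            value, _ = decoder.raw_decode(reply, start)
            if isinstance(value, dict):
                return value
        except json.JSONDecodeError:
            pass  # not valid JSON at this brace; try the next one
        start = reply.find("{", start + 1)
    raise ValueError("no JSON object found in reply")


# Made-up reply text for illustration only.
sample_reply = 'Here is the record: {"name": "Ada", "age": 36} Let me know if you need more.'
record = extract_json_object(sample_reply)
```

Asking for JSON explicitly in the system or user message tends to make this step more reliable, but validating the parsed result is still advisable.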
Limitations
- Resource-Intensive Execution: Requires robust computational infrastructure (e.g., ≥48 GB VRAM) to run efficiently.
- Residual Biases: Though tuned for neutrality, occasional bias may surface from inherited training data.
- Creative Variability: Creative outputs such as fiction or poetry may vary in quality and style coherence.
- Lack of Real-Time Knowledge: The model operates with a static knowledge base and lacks access to current world events.
- Drift in Extended Outputs: Long responses may introduce cumulative inaccuracies or lose focus over time.
- Prompt Dependence: Output quality is sensitive to the clarity and specificity of the initial prompt.
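To put the VRAM figure in the list above in context, the memory taken by the weights alone can be estimated from the parameter count and the bytes per parameter. This is a back-of-the-envelope sketch: the count of roughly 14.7 billion parameters is an assumption taken from the figure published for the Qwen2.5-14B base model, and `weight_memory_gib` is a hypothetical helper.

```python
# Assumption: about 14.7e9 parameters, as published for the Qwen2.5-14B base model.
PARAMS = 14.7e9

# Storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(params: float, dtype: str) -> float:
    """Memory for the weights alone, in GiB (excludes KV cache and activations)."""
    return params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(PARAMS, dtype):6.1f} GiB")
```

Under these assumptions the bfloat16 weights come to about 27 GiB. The KV cache, activations, and framework overhead come on top of that and grow with context length and batch size, which is one reason a larger budget is recommended.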
Model tree for prithivMLmods/Sombrero-Opus-14B-Elite13
Base model: Qwen/Qwen2.5-14B