Instructions for using prithivMLmods/Sombrero-Opus-14B-Elite13 with libraries and local apps.

Libraries

How to use prithivMLmods/Sombrero-Opus-14B-Elite13 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Sombrero-Opus-14B-Elite13")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Sombrero-Opus-14B-Elite13")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Sombrero-Opus-14B-Elite13")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use prithivMLmods/Sombrero-Opus-14B-Elite13 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/Sombrero-Opus-14B-Elite13"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Sombrero-Opus-14B-Elite13",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
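
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; `build_chat_request` and `chat` are hypothetical helper names introduced here for illustration, and the default base URL assumes the server started above is listening on localhost port 8000 (pass a different `base_url` for a server on another port, such as the SGLang examples on port 30000).

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Sombrero-Opus-14B-Elite13"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes a server is already running and that it answers with an
    OpenAI-style body containing choices[0].message.content.
    """
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` should return the reply as a plain string. Keeping request construction separate from sending makes the payload easy to inspect without a live server.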

Use Docker

docker model run hf.co/prithivMLmods/Sombrero-Opus-14B-Elite13

SGLang

How to use prithivMLmods/Sombrero-Opus-14B-Elite13 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Sombrero-Opus-14B-Elite13" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Sombrero-Opus-14B-Elite13",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Sombrero-Opus-14B-Elite13" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Sombrero-Opus-14B-Elite13",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use prithivMLmods/Sombrero-Opus-14B-Elite13 with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/Sombrero-Opus-14B-Elite13
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Sombrero-Opus-14B-Elite13

Sombrero-Opus-14B-Elite13 builds on the Qwen 2.5 14B architecture and aims to improve reasoning performance in mid- to large-scale models. This iteration focuses on general-purpose comprehension, structured intelligence, and interactive versatility. Fine-tuned with an advanced reasoning chain and carefully curated datasets, Elite13 offers improved contextual understanding, logical coherence, and multi-step problem-solving.

Key improvements include:

Expanded Domain Fluency: Delivers refined general knowledge across disciplines for more accurate and coherent answers.
Advanced Instruction Parsing: Enhanced capacity to interpret and execute complex instructions while preserving structure and clarity.
Robust Prompt Flexibility: Adapts to diverse interaction styles, from casual inquiries to formal requests.
Extended Context Window: Handles up to 128K tokens of input and generates up to 8K tokens in a single output, which suits detailed reasoning and long replies.
Global Linguistic Range: Offers proficiency in 29+ languages, including English, Chinese, French, Spanish, Japanese, Arabic, and more.
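
The context figures above can be turned into a simple token budget check before calling `generate`. The sketch below is illustrative only: `plan_generation` is a hypothetical helper, "128K" and "8K" are read as 131,072 and 8,192 tokens, and the prompt and the generated tokens are assumed to share the same context window.

```python
CONTEXT_LIMIT = 128 * 1024  # assumption: "128K" read as 131,072 tokens
MAX_OUTPUT = 8 * 1024       # assumption: "8K" read as 8,192 tokens


def plan_generation(prompt_tokens, requested_new_tokens):
    """Return how many new tokens can be requested within the stated limits.

    Assumes prompt and output tokens count against one shared window.
    """
    if prompt_tokens >= CONTEXT_LIMIT:
        raise ValueError("prompt alone fills or exceeds the context window")
    remaining = CONTEXT_LIMIT - prompt_tokens
    return min(requested_new_tokens, MAX_OUTPUT, remaining)
```

The prompt length can be measured with the model's tokenizer, for example `len(tokenizer(text).input_ids)`, and the returned value passed as `max_new_tokens`.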

Quickstart with Transformers

Use the following snippet to load and test the model using transformers and apply_chat_template:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Sombrero-Opus-14B-Elite13"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What are the key principles of general-purpose AI?"
messages = [
    {"role": "system", "content": "You are a helpful assistant capable of answering a wide range of questions."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Intended Use

Cognitive Reasoning & General Q&A: Designed to support high-level thinking and accurate responses across general domains.
Education & Research Support: Suitable for generating study guides, academic summaries, and informative explanations.
Conversational Intelligence: Powers AI assistants and chatbots with memory-aware, context-sensitive dialogues.
Cross-Language Communication: Useful in multilingual environments for translation, communication, and content creation.
Data-Aware Structuring: Capable of converting unstructured data into meaningful formats like JSON or tabular summaries.
Lengthy Content Generation: Suitable for drafting articles, technical documents, or creative prose with sustained coherence.
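
For the JSON structuring use case, a reply often wraps the JSON object in surrounding prose, so it helps to extract and validate the object before using it. The following is a minimal sketch with a hypothetical helper, `extract_json`; it relies only on the standard `json` module and assumes the reply contains at most one object of interest.

```python
import json


def extract_json(reply):
    """Return the first valid JSON object found in `reply`, or None."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any text that follows it.
            obj, _ = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = reply.find("{", start + 1)
    return None


# Made-up reply text, for illustration only.
sample = 'Here is the record: {"name": "Ada", "age": 36} Let me know if it helps.'
print(extract_json(sample))
```

Returning `None` rather than raising lets the caller decide whether to retry the prompt or fall back to another format.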

Limitations

Resource-Intensive Execution: Requires robust computational infrastructure (e.g., ≥48 GB VRAM) to run efficiently.
Residual Biases: Though tuned for neutrality, occasional bias may surface from inherited training data.
Creative Variability: Creative outputs such as fiction or poetry may vary in quality and style coherence.
Lack of Real-Time Knowledge: The model operates with a static knowledge base and lacks access to current world events.
Drift in Extended Outputs: Long responses may introduce cumulative inaccuracies or lose focus over time.
Prompt Dependence: Output quality is sensitive to the clarity and specificity of the initial prompt.
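
To put the VRAM figure in context, the memory taken by the weights alone can be estimated from the parameter count and the numeric precision. This is a rough sketch under stated assumptions: the parameter count is taken as 14 billion from the "14B" in the model name (the exact count is not given here), and activations and the KV cache, which add to the total, are not included.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 14e9  # assumption: taken from "14B" in the model name

# Bytes per parameter for common precisions.
PRECISIONS = [("float32", 4), ("bfloat16/float16", 2), ("int8", 1), ("4-bit", 0.5)]

for label, nbytes in PRECISIONS:
    print(f"{label:>17}: {weight_memory_gb(PARAMS, nbytes):.0f} GB")
```

At 16-bit precision the weights alone come to roughly 28 GB under these assumptions, so long prompts and larger batches are what push practical requirements toward the figure quoted above.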