Heads-up: triple `language_model` prefix in safetensors keys (BF16 only — GGUF unaffected)
Hi kai-os, big fan of the Carnice/Hermes SFT — testing it for an NVFP4 variant on Blackwell.
While loading the BF16 weights (kai-os/Carnice-V2-27b) via HF transformers, every linear layer comes back as MISSING in the load report and an equal number of keys appear as UNEXPECTED. The model then runs with random weights and outputs gibberish. The GGUF variant is unaffected because convert_hf_to_gguf.py normalizes prefixes during conversion.
Root cause
The safetensors keys carry a triple language_model prefix:
# kai-os/Carnice-V2-27b shipped:
model.language_model.language_model.language_model.embed_tokens.weight
model.language_model.language_model.language_model.layers.0.input_layernorm.weight
model.language_model.visual.blocks.0.attn.proj.weight # also one extra here
# What Qwen3_5ForConditionalGeneration expects:
model.language_model.embed_tokens.weight
model.language_model.layers.0.input_layernorm.weight
model.visual.blocks.0.attn.proj.weight
Likely an artifact of an Unsloth wrapper getting serialized into the key path more than once during the merge.
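For anyone who wants to check a key list for the same symptom, here is a minimal sketch. The helper names `has_repeated_prefix` and `misplaced_visual` are made up for illustration; the patterns are taken from the key examples above and are only assumed to cover every affected key.

```python
import re

# Two or more back-to-back "language_model." segments anywhere in the key.
_REPEATED = re.compile(r"(?:language_model\.){2,}")


def has_repeated_prefix(key: str) -> bool:
    """True if the key carries a duplicated language_model segment."""
    return _REPEATED.search(key) is not None


def misplaced_visual(key: str) -> bool:
    """True if the vision tower is nested under language_model."""
    return key.startswith("model.language_model.visual.")


if __name__ == "__main__":
    shipped = [
        "model.language_model.language_model.language_model.embed_tokens.weight",
        "model.language_model.visual.blocks.0.attn.proj.weight",
        "model.language_model.embed_tokens.weight",
    ]
    for k in shipped:
        print(has_repeated_prefix(k), misplaced_visual(k), k)
```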
Reproduction
from transformers import Qwen3_5ForConditionalGeneration
import torch
m = Qwen3_5ForConditionalGeneration.from_pretrained(
    "kai-os/Carnice-V2-27b", dtype=torch.bfloat16, trust_remote_code=True)
# Any prompt → gibberish, because every linear is random-init.
model.safetensors.index.json:
- kai-os/Carnice-V2-27b: 1184 keys, all with the extra prefixes
- Qwen/Qwen3.6-27B (your declared base): 1199 keys with standard prefixes (the 15 extra are MTP, dropped during your merge; that's a separate matter)
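Counts like these can be read from the index file without downloading any weights. A rough sketch, assuming the index has been saved locally and uses the usual `weight_map` layout; `load_weight_map` and `tally_prefixes` are illustrative names, not part of any library.

```python
import json
from collections import Counter


def load_weight_map(index_path: str) -> dict:
    """Read the tensor-name -> shard-file mapping from a safetensors index."""
    with open(index_path, encoding="utf-8") as f:
        return json.load(f)["weight_map"]


def tally_prefixes(keys, depth: int = 3) -> Counter:
    """Count keys by their first `depth` dot-separated segments."""
    return Counter(".".join(k.split(".")[:depth]) for k in keys)


if __name__ == "__main__":
    # The path is an assumption: point it at a local copy of the index.
    weight_map = load_weight_map("model.safetensors.index.json")
    print(len(weight_map), "keys")
    for prefix, count in tally_prefixes(weight_map).most_common():
        print(f"{count:6d}  {prefix}")
```

On the affected upload, a prefix such as `model.language_model.language_model` showing up in the tally is the giveaway.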
Fix
A single-pass safetensors rewrite recovers the model:
def fix_key(k: str) -> str:
    if k.startswith("model.language_model.language_model.language_model."):
        return "model.language_model." + k[len("model.language_model.language_model.language_model."):]
    if k.startswith("model.language_model.visual."):
        return "model.visual." + k[len("model.language_model.visual."):]
    return k
After this the model loads cleanly, IFEval results match your benchmark numbers, and Hermes-style tool calling works.
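For reference, the rewrite could be applied to a sharded checkpoint roughly as below. This is a sketch, not the exact script I ran: `fix_weight_map` and `rewrite_checkpoint` are illustrative names, the directory layout is assumed, and `fix_key` is restated so the block is self-contained. It relies on `load_file` and `save_file` from `safetensors.torch`, and loads one full shard into memory at a time.

```python
import json
from pathlib import Path

TRIPLE = "model.language_model.language_model.language_model."
VISUAL = "model.language_model.visual."


def fix_key(k: str) -> str:
    """Same mapping as fix_key above."""
    if k.startswith(TRIPLE):
        return "model.language_model." + k[len(TRIPLE):]
    if k.startswith(VISUAL):
        return "model.visual." + k[len(VISUAL):]
    return k


def fix_weight_map(weight_map: dict) -> dict:
    """Rename every key in an index weight_map, refusing on collisions."""
    fixed = {fix_key(k): shard for k, shard in weight_map.items()}
    if len(fixed) != len(weight_map):
        raise ValueError("two keys collapsed to the same name after renaming")
    return fixed


def rewrite_checkpoint(src_dir: str, dst_dir: str) -> None:
    """Write renamed copies of every shard plus an updated index."""
    # Imported here so the pure-Python helpers work without safetensors.
    from safetensors.torch import load_file, save_file

    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    index_name = "model.safetensors.index.json"
    index = json.loads((src / index_name).read_text(encoding="utf-8"))

    for shard in sorted(set(index["weight_map"].values())):
        tensors = load_file(str(src / shard))
        renamed = {fix_key(k): v for k, v in tensors.items()}
        # Transformers looks for the "format" entry in the shard metadata.
        save_file(renamed, str(dst / shard), metadata={"format": "pt"})

    index["weight_map"] = fix_weight_map(index["weight_map"])
    (dst / index_name).write_text(json.dumps(index, indent=2), encoding="utf-8")
```

The config, tokenizer, and processor files are not touched here and would need to be copied into the output directory separately.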
Why mention now
I've built an NVFP4 + MTP-grafted variant on top of the fixed weights for the RTX PRO 6000 / DGX Spark (GB10) crowd who want the Hermes agent at ~20 GB VRAM. Wanted to flag this here so anyone else loading the BF16 directly knows what's happening, and to credit you properly in the README of the downstream variant.
Thanks for the SFT — the assistant-token-only loss + the GLM-5.1 trace blend in the data mix really show through.
— Tonoken3 / Lna-Lab
Thanks for catching this. The core BF16 Transformers load bug should be resolved now.