
# Kimi-K2.5-PRISM

An unrestricted, unchained PRISM version of Moonshot AI's Kimi-K2.5, with over-refusal and propaganda mechanisms removed using our PRISM pipeline (Projected Refusal Isolation via Subspace Modification).

## ☕ Model Purchase

The Kimi-K2.5-PRISM tensors are available for purchase here: https://ko-fi.com/s/64a50000a4

## ☕ Support Our Work

If you enjoy our work and find it useful, please sponsor and support us!

| Option | Description |
|---|---|
| PRISM VIP Membership | Day-0 access to all PRISM models |
| One-Time Support | Purchase this model |

## Model Highlights

- **PRISM Ablation:** state-of-the-art technique that removes over-refusal behaviors while preserving model capabilities (a sketch of the underlying idea follows this list)
- **1T MoE Architecture:** 1 trillion total parameters with 32 billion activated per token, routed across 384 experts
- **Native Multimodal:** pre-trained on vision-language tokens for seamless image, video, and text understanding
- **256K Context Window:** extended context for complex agentic tasks and large codebases
- **Dual Modes:** supports both Thinking (deep reasoning) and Instant (fast response) modes
- **Agent Swarm:** self-directed, coordinated multi-agent execution for complex tasks
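
The PRISM pipeline itself is unpublished. Purely for intuition, here is a minimal, hypothetical sketch of the directional-ablation family of techniques it builds on, in which a "refusal direction" is estimated from paired activations and projected out of the weights. All names and shapes below are illustrative, not the actual PRISM implementation:

```python
import torch

def refusal_direction(h_refuse: torch.Tensor, h_comply: torch.Tensor) -> torch.Tensor:
    """Difference-of-means direction between residual-stream activations
    captured on refused vs. complied prompts, each (num_prompts, hidden_dim).
    Illustrative only."""
    d = h_refuse.mean(dim=0) - h_comply.mean(dim=0)
    return d / d.norm()

def ablate_direction(W: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Remove the component along d from every output of W, where W is a
    (hidden_dim, in_dim) weight that writes into the residual stream.
    Equivalent to left-multiplying W by the projector (I - d d^T)."""
    return W - torch.outer(d, d @ W)
```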

## Model Architecture

| Specification | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers | 61 |
| Attention Hidden Dimension | 7168 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
| Vision Encoder | MoonViT (400M) |
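
As a toy illustration of the routing numbers above (8 of 384 experts selected per token), here is a minimal top-k softmax router sketch; the actual K2.5 router, its shared expert, and load-balancing losses are not reproduced:

```python
import torch

hidden_dim, num_experts, top_k = 7168, 384, 8

# Toy gate: scores every expert for every token.
gate = torch.nn.Linear(hidden_dim, num_experts, bias=False)

def route(tokens: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """tokens: (seq_len, hidden_dim) -> (expert_ids, weights), each (seq_len, 8)."""
    logits = gate(tokens)                             # (seq_len, 384)
    weights, expert_ids = logits.topk(top_k, dim=-1)  # keep the 8 best experts
    weights = torch.softmax(weights, dim=-1)          # renormalize over those 8
    return expert_ids, weights

expert_ids, weights = route(torch.randn(4, hidden_dim))
print(expert_ids.shape, weights.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```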

## Benchmarks

| Benchmark | Kimi K2.5 (Thinking) | GPT-5.2 | Claude 4.5 Opus | Gemini 3 Pro |
|---|---|---|---|---|
| AIME 2025 | 96.1 | 100 | 92.8 | 95.0 |
| GPQA-Diamond | 87.6 | 92.4 | 87.0 | 91.9 |
| HLE-Full | 30.1 | 34.5 | 30.8 | 37.5 |
| HLE-Full (w/ tools) | 50.2 | 45.5 | 43.2 | 45.8 |
| SWE-Bench Verified | 76.8 | 80.0 | 80.9 | 76.2 |
| Terminal Bench 2.0 | 50.8 | 54.0 | 59.3 | 54.2 |
| BrowseComp | 60.6 | 65.8 | 37.0 | 37.8 |
| MMMU-Pro | 78.5 | 79.5 | 74.0 | 81.0 |
| VideoMMMU | 86.6 | 85.9 | 84.4 | 87.6 |

## Usage

### Transformers

Install dependencies:

```bash
pip install git+https://github.com/huggingface/transformers.git
```

Basic chat completion:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "Ex0bit/Kimi-K2.5-PRISM"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are Kimi, an AI assistant."},
    {"role": "user", "content": "Hello!"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
)

# Decode only the newly generated tokens, skipping the prompt.
output_text = tokenizer.decode(
    generated_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(output_text)
```

### Chat with Image

```python
import base64
import requests

# Fetch an image and inline it as a base64 data URL.
url = "https://example.com/image.png"
image_base64 = base64.b64encode(requests.get(url).content).decode()

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
            },
        ],
    }
]

# Use the same generation code as above.
```
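
Note that the tokenizer alone cannot load pixels; for local multimodal inference the messages are usually run through the model's processor instead. A minimal sketch, assuming the repository ships an `AutoProcessor` whose chat template accepts OpenAI-style `image_url` entries (recent transformers releases normalize this format; if yours does not, rewrite each entry as `{"type": "image", "url": ...}`):

```python
from transformers import AutoProcessor

# Assumption: the repo provides a multimodal processor with a chat template.
processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=4096)
print(processor.decode(
    generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```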

### vLLM

Install the vLLM nightly build:

```bash
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git
```

Serve the model:

```bash
vllm serve Ex0bit/Kimi-K2.5-PRISM \
     --tensor-parallel-size 8 \
     --trust-remote-code \
     --served-model-name kimi-k2.5-prism
```

### SGLang

```bash
python3 -m sglang.launch_server \
  --model-path Ex0bit/Kimi-K2.5-PRISM \
  --tp-size 8 \
  --trust-remote-code \
  --served-model-name kimi-k2.5-prism \
  --host 0.0.0.0 \
  --port 8000
```
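
Both servers expose an OpenAI-compatible endpoint. A minimal client call (assuming the server is reachable at `localhost:8000`, which matches the SGLang invocation above and vLLM's default port):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="kimi-k2.5-prism",
    messages=[
        {"role": "system", "content": "You are Kimi, an AI assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    temperature=1.0,
    top_p=0.95,
)
print(response.choices[0].message.content)
```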

## Recommended Parameters

| Mode | Temperature | Top-P | Max New Tokens |
|---|---|---|---|
| Thinking | 1.0 | 0.95 | 96000 |
| Instant | 0.6 | 0.95 | 4096 |

## Switching Modes

For Instant mode (faster, no reasoning), pass:

```python
# Official API
extra_body={"thinking": {"type": "disabled"}}

# vLLM/SGLang
extra_body={"chat_template_kwargs": {"thinking": False}}
```

## Hardware Requirements

Due to its 1T-parameter size, this model requires significant hardware:

- **Minimum:** 8x A100 80GB or equivalent
- **Recommended:** 8x H100 80GB for optimal performance
- **INT4 Quantization:** available for a reduced memory footprint (see the estimate below)
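
For a back-of-envelope sense of these requirements, here is the weights-only memory arithmetic at different precisions (KV cache and activations come on top):

```python
# Rough, illustrative weight-memory arithmetic for a 1T-parameter model.
total_params = 1e12

print(f"BF16 weights: ~{total_params * 2 / 1e12:.1f} TB")    # ~2.0 TB (2 bytes/param)
print(f"INT4 weights: ~{total_params * 0.5 / 1e12:.2f} TB")  # ~0.50 TB (0.5 bytes/param)
print(f"8x 80 GB GPU pool: {8 * 80} GB")                     # 640 GB

# Full-precision weights (~2 TB) exceed a single 8x80GB node (640 GB),
# so single-node deployments generally rely on quantization
# (e.g. the INT4 build) or weight offloading.
```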

## License

This model is released under the PRISM Research License.

## Acknowledgments

Based on Kimi-K2.5 by Moonshot AI. See the technical blog for more details on the base model.
