HyperNova 60B 2602
Powered by CompactifAI
Optimized for Efficient Inference · Reduced Memory Footprint · Native Tool Calling Support
Table of Contents
- Highlights
- Model Overview
- Key Characteristics
- Quick Start
- What's New in HyperNova 60B 2602
- Tool Calling
- Training & Fine-Tuning
- Architecture
- Evaluation & Benchmarks
- Languages
- Intended Use
- Safety & Limitations
- Model Information
- Citation
Model Overview
HyperNova 60B 2602 is a model developed by Multiverse Computing based on OpenAI’s gpt-oss-120b. The original gpt-oss-120b is an open-weight model (117B parameters, 5.1B active in MoE) designed for powerful reasoning, agentic tasks, and versatile developer use. This version is compressed with CompactifAI, Multiverse Computing’s proprietary technology, reducing parameter count and memory requirements while aiming to preserve strong reasoning.
The model is instruction-tuned and supports native tool calling (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2602 is intended for the same broad use cases as gpt-oss-120b—reasoning, code generation, RAG, and tool-augmented applications—with lower memory footprint and deployment flexibility.
Key Characteristics
| Characteristic | Description |
|---|---|
| Base model | OpenAI gpt-oss-120b (117B params, MoE; open-weight, Apache 2.0) |
| 🛠️ Tool calling | Native support; OpenAI-style function / tool calling schemas; agentic use (e.g. function calling, structured outputs) |
| 🧠 Parameters | 60B total parameters after CompactifAI compression (reduced vs. base 117B) |
| 📐 Architecture | Decoder-only Transformer (from gpt-oss lineage) |
| 🗜️ Compression | CompactifAI (proprietary compression technology) |
| Primary language | English |
| Other languages | Not formally evaluated |
Quick Start
This model can be loaded with the Transformers API. Use trust_remote_code=True (required for the gpt-oss architecture). Recommended approach: AutoModelForCausalLM with apply_chat_template:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MultiverseComputingCAI/HyperNova-60B-2602"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What is a Hypernova?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
)
inputs = inputs.to(model.device)
attention_mask = torch.ones_like(inputs, dtype=torch.long, device=inputs.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    attention_mask=attention_mask,
)
# Decode only the newly generated tokens, skipping the prompt.
reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(reply)
```
Alternatively you can use the pipeline API with trust_remote_code=True; the pipeline returns the full conversation structure, so extract the assistant message from outputs[0]["generated_text"] as needed.
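If you go the pipeline route, the assistant turn has to be pulled out of the returned conversation structure. A minimal sketch of that extraction step (`extract_assistant_reply` is an illustrative helper, not part of the Transformers API), using the list-of-messages shape the chat pipeline returns:

```python
def extract_assistant_reply(generated_text):
    """Return the content of the last assistant turn from a chat-pipeline
    output: a list of {"role": ..., "content": ...} message dicts."""
    for message in reversed(generated_text):
        if message.get("role") == "assistant":
            return message["content"]
    return None

# Example with the message-list structure a chat pipeline returns:
conversation = [
    {"role": "user", "content": "What is a Hypernova?"},
    {"role": "assistant", "content": "A hypernova is an exceptionally energetic supernova."},
]
print(extract_assistant_reply(conversation))
```

In practice you would call this on `outputs[0]["generated_text"]` from the pipeline result.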
What’s New in HyperNova 60B 2602
HyperNova 60B 2602 is derived from gpt-oss-120b, retaining the base model’s strengths while reducing memory and improving deployment flexibility.
Summary
- Model developed based on gpt-oss-120b: Same Apache 2.0 license and design goals (reasoning, agentic tasks, tool use); smaller footprint via CompactifAI.
- Tool use: Retains support for function calling, structured outputs, and agent-style workflows (OpenAI-style schemas).
- Reasoning: Compatible with configurable reasoning effort (e.g. low / medium / high in system prompt) where the format is preserved; full chain-of-thought available for debugging and analysis.
- Evaluated on tool-focused benchmarks (e.g. BFCL v4, Tau2-bench) and general benchmarks alongside other CompactifAI and gpt-oss variants.
Tool Calling
HyperNova 60B 2602 supports native tool use and is well-suited for:
- Function calling with defined schemas
- Structured outputs
- Agentic operations (e.g. browser tasks, code execution where supported)
The model can detect when to invoke tools, emit structured JSON tool calls, and consume tool outputs to continue generation. Tool-calling behavior follows OpenAI-style schemas; compatibility refers to format and structure—exact parity with the base or other models is not guaranteed.
Example Tool Call
```json
{
  "name": "get_weather",
  "arguments": {
    "city": "Paris",
    "date": "2026-02-10"
  }
}
```
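Once the model emits a tool call in this shape, the host application parses it and dispatches to the matching function, then feeds the result back as a tool message so generation can continue. A minimal dispatch sketch (the `get_weather` implementation and `TOOL_REGISTRY` are illustrative, not part of the model or any library API):

```python
import json

# Illustrative tool implementation; a real agent would call a weather API here.
def get_weather(city, date):
    return {"city": city, "date": date, "forecast": "sunny"}

# Map tool names (as declared in the schemas) to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(raw_call: str):
    """Parse a JSON tool call emitted by the model and invoke the matching tool."""
    call = json.loads(raw_call)
    tool = TOOL_REGISTRY[call["name"]]
    return tool(**call["arguments"])

result = dispatch_tool_call(
    '{"name": "get_weather", "arguments": {"city": "Paris", "date": "2026-02-10"}}'
)
print(result)
```

The returned value would then be appended to the conversation as a tool-role message for the model to consume.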
Training & Fine-Tuning
Base Model: gpt-oss-120b
The base model gpt-oss-120b was trained on OpenAI’s harmony response format and is intended for use with that format for correct behavior. It supports configurable reasoning levels (low / medium / high) and native tool use. See the original model card and arXiv:2508.10925 for details.
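Per the gpt-oss model card, the reasoning level is selected in the system prompt. A minimal sketch of building such a conversation (`build_messages` is an illustrative helper, not part of the model’s API):

```python
VALID_EFFORTS = {"low", "medium", "high"}

def build_messages(user_prompt, reasoning_effort="medium"):
    """Build a chat history that selects the gpt-oss reasoning level
    via the system prompt, as described in the gpt-oss model card."""
    if reasoning_effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(VALID_EFFORTS)}")
    return [
        {"role": "system", "content": f"Reasoning: {reasoning_effort}"},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("Prove that sqrt(2) is irrational.", reasoning_effort="high")
```

Pass the resulting `messages` to `tokenizer.apply_chat_template` exactly as in the Quick Start example.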
CompactifAI Compression & Optional Fine-Tuning
- Compression: CompactifAI was applied to produce a smaller, efficient model (60B parameters) while aiming to preserve reasoning and tool-use capabilities.
- Optional fine-tuning: This variant may include additional fine-tuning for tool calling and structured outputs; exact training details are model-specific.
Architecture
Model Specifications
| Specification | Value |
|---|---|
| Base model | openai/gpt-oss-120b (117B params, 5.1B active MoE) |
| Total parameters | 60B, 4.8B active MoE |
Evaluation & Benchmarks
Evaluation Methodology
Benchmark scores were obtained with the following setups. Methodology varies by benchmark family.
MMLU-Pro, AIME25, GPQA-Diamond, LiveCodeBench
- Evaluation framework: Lighteval
- Inference library: vLLM 0.14.0
- Reasoning effort: medium
- Decoding: temperature = 0.6, max_tokens = 131072, top_p = 1.0, top_k = 0
- Batch size: 64
IFBench, AA-LCR, SciCode
- Evaluation framework: Nemo-skills
- Inference library: vLLM 0.14.0
- Reasoning effort: medium
- Decoding: temperature = 1.0, max_tokens = 131072, top_p = 1.0, top_k = 0
- Batch size: 64
BFCL v4 (17 splits)
- Evaluation framework: EvalScope 1.4.1
- Inference library: vLLM 0.14.0
- Reasoning effort: high
- Decoding: temperature = 0.6, max_tokens = 16384, parallel_tool_calls = true, tool-call parser openai
Tau2-bench (Telecom)
- Evaluation framework: EvalScope 1.4.1
- Inference library: vLLM 0.14.0
- Reasoning effort: high (agent extra_body.reasoning_effort)
- Decoding (agent): temperature = 1.0, top_p = 1.0, min_tokens = 1
- Decoding (judge / user simulator): temperature = 0.7, timeout = 600
- Reproducibility: subset telecom (default); max steps 100; repeats 3; tool-call parser openai (agent), hermes (judge)
Terminal-Bench Hard (Artificial Analysis subset)
- Evaluation framework: laude-institute/harbor == 0.1.43
- Inference library: vLLM == 0.15.0
- Reasoning effort: high
- Decoding: temperature = 1.0, top_p = 1.0, max-model-len = 131072
- Reproducibility: subset from AA (https://artificialanalysis.ai/methodology/intelligence-benchmarking#terminal-bench-hard)
- Agent: terminus-2; max episodes 100; repeats 3
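All of the setups above serve the model through vLLM’s OpenAI-compatible endpoint. A hedged launch sketch (flag names follow vLLM’s standard CLI and should be verified against your installed version; the tool-call parser name mirrors the one cited above):

```shell
# Serve HyperNova 60B 2602 via vLLM's OpenAI-compatible server.
# Flags assume vLLM's standard CLI; verify against your installed version.
vllm serve MultiverseComputingCAI/HyperNova-60B-2602 \
  --tensor-parallel-size 4 \
  --max-model-len 131072 \
  --enable-auto-tool-choice \
  --tool-call-parser openai
```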
Quantitative Results
Scores are accuracy or benchmark-specific metrics; all reported numbers were obtained with the evaluation methodology described above.
| Benchmark | gpt-oss-20b | gpt-oss-120b | HyperNova 60B 2602 |
|---|---|---|---|
| MMLU-Pro | 74 | 78 | 74 |
| BFCL v4 | 61 | 64 | 62 |
| Tau2-bench (Telecom) | 59 | 68 | 61 |
| AIME25 | 72 | 80 | 76 |
| GPQA-Diamond | 63 | 69 | 69 |
| IFBench | 55 | 63 | 60 |
| SciCode | 34 | 38 | 32 |
| LiveCodeBench | 64 | 66 | 64 |
| Terminal-Bench Hard | 9 | 22 | 16 |
| AA-LCR | 37 | 50 | 36 |
| AA-Omniscience Index | -40 | -36 | -41 |
| AA-Omniscience Accuracy | 16 | 21 | 15 |
Quantitative Results (Inference Performance)
Representative throughput and memory figures for HyperNova 60B 2602, compared against gpt-oss-20b and gpt-oss-120b on the same hardware.
Performance evaluation conditions
The numbers in the table below were obtained under the following setup:
- Inference library: vLLM 0.14.0
- Hardware: 4× NVIDIA H200 Tensor Core GPU
- Conditions: batch size=512, context length=512, decode length=256
- Notes: dtype=default
| Metric | gpt-oss-20b | gpt-oss-120b | HyperNova 60B 2602 | Hardware |
|---|---|---|---|---|
| Tokens / second (decode) | 250 | 228 | 240 | 4× NVIDIA H200 Tensor Core GPU |
| Time to first token (ms) | 26 | 26 | 25 | 4× NVIDIA H200 Tensor Core GPU |
| Peak GPU memory (GB) | 13 | 61 | 32 | 4× NVIDIA H200 Tensor Core GPU |
Languages
- Primary language: English
- Other languages: Not formally evaluated
The model was trained primarily on English-language data. Performance on other languages may vary and has not been systematically measured.
Intended Use
Recommended Use Cases
Aligned with gpt-oss-120b use cases, with the benefit of a smaller footprint:
- Reasoning and analysis (with configurable reasoning effort where supported)
- Tool-augmented and agentic applications (function calling, web browsing, code execution, structured outputs)
- Code generation and reasoning
- Chatbots and virtual assistants
- Retrieval-augmented generation (RAG)
- Deployments where gpt-oss-120b is desirable but memory or latency is constrained
Out-of-Scope Uses
- Harmful, illegal, or deceptive content generation
- Impersonation of real individuals without consent
- High-risk decision-making without human oversight
- Surveillance or tracking of individuals
- Any use that violates applicable laws or regulations
Safety & Limitations
Known Limitations
- English-centric training data (inherited from base model).
- Format: For best results, use the same harmony response format as gpt-oss-120b where applicable; behavior may differ otherwise.
- Tool calling depends on correct schema and tool design; exact parity with gpt-oss-120b or other models is not guaranteed.
- Compression may affect some behaviors; evaluate for your use case.
Recommendations
- Validate tool outputs before execution
- Use human oversight for critical applications
- Perform task-specific evaluation prior to deployment
Model Information
| Field | Value |
|---|---|
| Model name | HyperNova 60B 2602 |
| Based on | openai/gpt-oss-120b |
| Version | 2602 |
| Release date | 26/02/2026 |
| Developed by | Multiverse Computing |
| License | Apache 2.0 |
| Contact | business@multiversecomputing.com |
Citation
If you use this model, please cite the base model and this variant:
@misc{openai2025gptoss120b,
title = {gpt-oss-120b \& gpt-oss-20b Model Card},
author = {OpenAI},
year = {2025},
eprint = {2508.10925},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2508.10925}
}
@misc{hypernova60b2602,
title = {HyperNova 60B 2602: Model developed based on gpt-oss-120b},
author = {Multiverse Computing},
year = {2026},
url = {https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2602},
note = {Model developed based on openai/gpt-oss-120b using CompactifAI technology}
}