HyperNova 60B 2602
Powered by CompactifAI
Optimized for Efficient Inference · Reduced Memory Footprint · Native Tool Calling Support
Table of Contents
- Highlights
- Model Overview
- Key Characteristics
- Quick Start
- What's New in HyperNova 60B 2602
- Tool Calling
- Training & Fine-Tuning
- Architecture
- Evaluation & Benchmarks
- Languages
- Intended Use
- Safety & Limitations
- Model Information
- Citation
Model Overview
HyperNova 60B 2602 is a model developed by Multiverse Computing based on OpenAI’s gpt-oss-120b. The original gpt-oss-120b is an open-weight model (117B parameters, 5.1B active in MoE) designed for powerful reasoning, agentic tasks, and versatile developer use. This version is compressed with CompactifAI, Multiverse Computing’s proprietary technology, reducing parameter count and memory requirements while aiming to preserve strong reasoning.
The model is instruction-tuned and supports native tool calling (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2602 is intended for the same broad use cases as gpt-oss-120b—reasoning, code generation, RAG, and tool-augmented applications—with lower memory footprint and deployment flexibility.
Key Characteristics
| Characteristic | Description |
|---|---|
| Base model | OpenAI gpt-oss-120b (117B params, MoE; open-weight, Apache 2.0) |
| 🛠️ Tool calling | Native support; OpenAI-style function / tool calling schemas; agentic use (e.g. function calling, structured outputs) |
| 🧠 Parameters | 60B total parameters after CompactifAI compression (reduced vs. base 117B) |
| 📐 Architecture | Decoder-only Transformer (from gpt-oss lineage) |
| 🗜️ Compression | CompactifAI (proprietary compression technology) |
| Primary language | English |
| Other languages | Not formally evaluated |
Quick Start
This model can be loaded with the Transformers API. Use trust_remote_code=True (required for the gpt-oss architecture). Recommended approach: AutoModelForCausalLM with apply_chat_template:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MultiverseComputingCAI/HyperNova-60B-2602"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What is a Hypernova?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
)
inputs = inputs.to(model.device)
attention_mask = torch.ones_like(inputs, dtype=torch.long, device=inputs.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    attention_mask=attention_mask,
)
# Decode only the newly generated tokens, skipping the prompt.
reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(reply)
```
Alternatively you can use the pipeline API with trust_remote_code=True; the pipeline returns the full conversation structure, so extract the assistant message from outputs[0]["generated_text"] as needed.
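If you go the pipeline route, the assistant turn has to be pulled out of the returned conversation structure. A minimal sketch of that extraction step (`extract_assistant_reply` is an illustrative helper, not part of the Transformers API), using the list-of-messages shape the chat pipeline returns:

```python
def extract_assistant_reply(generated_text):
    """Return the content of the last assistant turn from a chat-pipeline
    output: a list of {"role": ..., "content": ...} message dicts."""
    for message in reversed(generated_text):
        if message.get("role") == "assistant":
            return message["content"]
    return None

# Example with the message-list structure a chat pipeline returns:
conversation = [
    {"role": "user", "content": "What is a Hypernova?"},
    {"role": "assistant", "content": "A hypernova is an exceptionally energetic supernova."},
]
print(extract_assistant_reply(conversation))
```

In practice you would call this on `outputs[0]["generated_text"]` from the pipeline result.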
What’s New in HyperNova 60B 2602
HyperNova 60B 2602 is derived from gpt-oss-120b, retaining the base model’s strengths while reducing memory and improving deployment flexibility.
Summary
- Model developed based on gpt-oss-120b: Same Apache 2.0 license and design goals (reasoning, agentic tasks, tool use); smaller footprint via CompactifAI.
- Tool use: Retains support for function calling, structured outputs, and agent-style workflows (OpenAI-style schemas).
- Reasoning: Compatible with configurable reasoning effort (e.g. low / medium / high in system prompt) where the format is preserved; full chain-of-thought available for debugging and analysis.
- Evaluated on tool-focused benchmarks (e.g. BFCL v4, Tau2-bench) and general benchmarks alongside other CompactifAI and gpt-oss variants.
Tool Calling
HyperNova 60B 2602 supports native tool use and is well-suited for:
- Function calling with defined schemas
- Structured outputs
- Agentic operations (e.g. browser tasks, code execution where supported)
The model can detect when to invoke tools, emit structured JSON tool calls, and consume tool outputs to continue generation. Tool-calling behavior follows OpenAI-style schemas; compatibility refers to format and structure—exact parity with the base or other models is not guaranteed.
Example Tool Call
```json
{
  "name": "get_weather",
  "arguments": {
    "city": "Paris",
    "date": "2026-02-10"
  }
}
```
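Once the model emits a tool call in this shape, the host application parses it and dispatches to the matching function, then feeds the result back as a tool message so generation can continue. A minimal dispatch sketch (the `get_weather` implementation and `TOOL_REGISTRY` are illustrative, not part of the model or any library API):

```python
import json

# Illustrative tool implementation; a real agent would call a weather API here.
def get_weather(city, date):
    return {"city": city, "date": date, "forecast": "sunny"}

# Map tool names (as declared in the schemas) to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(raw_call: str):
    """Parse a JSON tool call emitted by the model and invoke the matching tool."""
    call = json.loads(raw_call)
    tool = TOOL_REGISTRY[call["name"]]
    return tool(**call["arguments"])

result = dispatch_tool_call(
    '{"name": "get_weather", "arguments": {"city": "Paris", "date": "2026-02-10"}}'
)
print(result)
```

The returned value would then be appended to the conversation as a tool-role message for the model to consume.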
Training & Fine-Tuning
Base Model: gpt-oss-120b
The base model gpt-oss-120b was trained on OpenAI’s harmony response format and is intended for use with that format for correct behavior. It supports configurable reasoning levels (low / medium / high) and native tool use. See the original model card and arXiv:2508.10925 for details.
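Per the gpt-oss model card, the reasoning level is selected in the system prompt. A minimal sketch of building such a conversation (`build_messages` is an illustrative helper, not part of the model’s API):

```python
VALID_EFFORTS = {"low", "medium", "high"}

def build_messages(user_prompt, reasoning_effort="medium"):
    """Build a chat history that selects the gpt-oss reasoning level
    via the system prompt, as described in the gpt-oss model card."""
    if reasoning_effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(VALID_EFFORTS)}")
    return [
        {"role": "system", "content": f"Reasoning: {reasoning_effort}"},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("Prove that sqrt(2) is irrational.", reasoning_effort="high")
```

Pass the resulting `messages` to `tokenizer.apply_chat_template` exactly as in the Quick Start example.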
CompactifAI Compression & Optional Fine-Tuning
- Compression: CompactifAI was applied to produce a smaller, efficient model (60B parameters) while aiming to preserve reasoning and tool-use capabilities.
- Optional fine-tuning: This variant may include additional fine-tuning for tool calling and structured outputs; exact training details are model-specific.
Architecture
Model Specifications
| Specification | Value |
|---|---|
| Base model | openai/gpt-oss-120b (117B params, 5.1B active MoE) |
| Total parameters | 60B, 4.8B active MoE |
Evaluation & Benchmarks
Evaluation Methodology
Benchmark scores were obtained with the following setups. Methodology varies by benchmark family.
MMLU-Pro, AIME25, GPQA-Diamond, LiveCodeBench
- Evaluation framework: Lighteval
- Inference library: vLLM 0.14.0
- Reasoning effort: medium
- Decoding: temperature = 0.6, max_tokens = 131072, top_p = 1.0, top_k = 0
- Batch size: 64
IFBench, AA-LCR, SciCode
- Evaluation framework: Nemo-skills
- Inference library: vLLM 0.14.0
- Reasoning effort: medium
- Decoding: temperature = 1.0, max_tokens = 131072, top_p = 1.0, top_k = 0
- Batch size: 64
BFCL v4 (17 splits)
- Evaluation framework: EvalScope 1.4.1
- Inference library: vLLM 0.14.0
- Reasoning effort: high
- Decoding: temperature = 0.6, max_tokens = 16384, parallel_tool_calls = true, tool-call parser openai
Tau2-bench (Telecom)
- Evaluation framework: EvalScope 1.4.1
- Inference library: vLLM 0.14.0
- Reasoning effort: high (agent extra_body.reasoning_effort)
- Decoding (agent): temperature = 1.0, top_p = 1.0, min_tokens = 1
- Decoding (judge / user simulator): temperature = 0.7, timeout = 600
- Reproducibility: subset telecom (default); max steps 100; repeats 3; tool-call parser openai (agent), hermes (judge)
Terminal-Bench Hard (Artificial Analysis subset)
- Evaluation framework: laude-institute/harbor == 0.1.43
- Inference library: vLLM == 0.15.0
- Reasoning effort: high
- Decoding: temperature = 1.0, top_p = 1.0, max-model-len = 131072
- Reproducibility: subset from AA (https://artificialanalysis.ai/methodology/intelligence-benchmarking#terminal-bench-hard)
- Agent: terminus-2; max episodes 100; repeats 3
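All of the setups above serve the model through vLLM’s OpenAI-compatible endpoint. A hedged launch sketch (flag names follow vLLM’s standard CLI and should be verified against your installed version; the tool-call parser name mirrors the one cited above):

```shell
# Serve HyperNova 60B 2602 via vLLM's OpenAI-compatible server.
# Flags assume vLLM's standard CLI; verify against your installed version.
vllm serve MultiverseComputingCAI/HyperNova-60B-2602 \
  --tensor-parallel-size 4 \
  --max-model-len 131072 \
  --enable-auto-tool-choice \
  --tool-call-parser openai
```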
Quantitative Results
Scores are accuracy or benchmark-specific metrics; all reported numbers were obtained with the evaluation methodology described above.
| Benchmark | gpt-oss-20b | gpt-oss-120b | HyperNova 60B 2602 |
|---|---|---|---|
| MMLU-Pro | 74 | 78 | 74 |
| BFCL v4 | 61 | 64 | 62 |
| Tau2-bench (Telecom) | 59 | 68 | 61 |
| AIME25 | 72 | 80 | 76 |
| GPQA-Diamond | 63 | 69 | 69 |
| IFBench | 55 | 63 | 60 |
| SciCode | 34 | 38 | 32 |
| LiveCodeBench | 64 | 66 | 64 |
| Terminal-Bench Hard | 9 | 22 | 16 |
| AA-LCR | 37 | 50 | 36 |
| AA-Omniscience Index | -40 | -36 | -41 |
| AA-Omniscience Accuracy | 16 | 21 | 15 |
Quantitative Results (Inference Performance)
Representative throughput and memory figures for HyperNova 60B 2602, compared against gpt-oss-20b and gpt-oss-120b on the same hardware.
Performance evaluation conditions
The numbers in the table below were obtained under the following setup:
- Inference library: vLLM 0.14.0
- Hardware: 4× NVIDIA H200 Tensor Core GPU
- Conditions: batch size=512, context length=512, decode length=256
- Notes: dtype=default
| Metric | gpt-oss-20b | gpt-oss-120b | HyperNova 60B 2602 | Hardware |
|---|---|---|---|---|
| Tokens / second (decode) | 250 | 228 | 240 | 4× NVIDIA H200 Tensor Core GPU |
| Time to first token (ms) | 26 | 26 | 25 | 4× NVIDIA H200 Tensor Core GPU |
| Peak GPU memory (GB) | 13 | 61 | 32 | 4× NVIDIA H200 Tensor Core GPU |
Languages
- Primary language: English
- Other languages: Not formally evaluated
The model was trained primarily on English-language data. Performance on other languages may vary and has not been systematically measured.
Intended Use
Recommended Use Cases
Aligned with gpt-oss-120b use cases, with the benefit of a smaller footprint:
- Reasoning and analysis (with configurable reasoning effort where supported)
- Tool-augmented and agentic applications (function calling, web browsing, code execution, structured outputs)
- Code generation and reasoning
- Chatbots and virtual assistants
- Retrieval-augmented generation (RAG)
- Deployments where gpt-oss-120b is desirable but memory or latency is constrained
Out-of-Scope Uses
- Harmful, illegal, or deceptive content generation
- Impersonation of real individuals without consent
- High-risk decision-making without human oversight
- Surveillance or tracking of individuals
- Any use that violates applicable laws or regulations
Safety & Limitations
Known Limitations
- English-centric training data (inherited from base model).
- Format: For best results, use the same harmony response format as gpt-oss-120b where applicable; behavior may differ otherwise.
- Tool calling depends on correct schema and tool design; exact parity with gpt-oss-120b or other models is not guaranteed.
- Compression may affect some behaviors; evaluate for your use case.
Recommendations
- Validate tool outputs before execution
- Use human oversight for critical applications
- Perform task-specific evaluation prior to deployment
Model Information
| Field | Value |
|---|---|
| Model name | HyperNova 60B 2602 |
| Based on | openai/gpt-oss-120b |
| Version | 2602 |
| Release date | 26/02/2026 |
| Developed by | Multiverse Computing |
| License | Apache 2.0 |
| Contact | business@multiversecomputing.com |
Citation
If you use this model, please cite the base model and this variant:
@misc{openai2025gptoss120b,
title = {gpt-oss-120b \& gpt-oss-20b Model Card},
author = {OpenAI},
year = {2025},
eprint = {2508.10925},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2508.10925}
}
@misc{hypernova60b2602,
title = {HyperNova 60B 2602: Model developed based on gpt-oss-120b},
author = {Multiverse Computing},
year = {2026},
url = {https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2602},
note = {Model developed based on openai/gpt-oss-120b using CompactifAI technology}
}