Trump-Forecaster

RL-Tuned gpt-oss-120b for Predicting Trump Administration Actions

We fine-tuned gpt-oss-120b with reinforcement learning to predict Trump administration actions. Trained on the WWTD-2025 dataset of 2,108 binary forecasting questions generated with the Lightning Rod SDK, Trump-Forecaster beats GPT-5 on held-out forecasting questions.

This repo contains a LoRA adapter (5.3 GB) for gpt-oss-120b. A standalone merge.py script is included to produce a full merged model if needed.

Dataset · Lightning Rod SDK · Future-as-Label paper · Outcome-based RL paper


Results

Evaluated on 682 held-out test questions under two conditions: with news context, and without context (question only). The no-context condition reveals whether the model knows what it doesn't know: untrained models project false confidence, while RL training reduces that overconfidence.

| Model | Brier (With Context) | BSS (With Context) | Brier (No Context) | BSS (No Context) | ECE (With Context) | ECE (No Context) |
|---|---|---|---|---|---|---|
| GPT-5 | 0.200 | +0.14 | 0.258 | -0.11 | 0.091 | 0.191 |
| gpt-oss-120b (base) | 0.213 | +0.08 | 0.260 | -0.12 | 0.111 | 0.190 |
| Trump-Forecaster | 0.194 | +0.16 | 0.242 | -0.04 | 0.079 | 0.164 |

Figures: Brier Skill Score, Brier Score comparison, and ECE comparison.

Metrics

  • Brier Score: Mean squared error between predicted probability and outcome (0 or 1). Lower is better. Brier Skill Score (BSS) expresses this as improvement over always predicting the base rate; positive means the model learned something useful beyond historical frequency.
  • Expected Calibration Error (ECE): Measures whether predicted probabilities match actual frequencies. "70%" predictions should resolve "yes" 70% of the time. Lower is better. A short sketch of all three metrics follows this list.
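
As a concrete reference, here is a minimal sketch of how these metrics can be computed from predicted probabilities and resolved outcomes. This is plain NumPy and illustrative only, not the evaluation script behind the table above.

import numpy as np

def brier(p, y):
    # Mean squared error between predicted probability p and binary outcome y.
    return np.mean((p - y) ** 2)

def brier_skill_score(p, y):
    # Improvement over always predicting the base rate; positive beats the base rate.
    baseline = brier(np.full_like(p, y.mean()), y)
    return 1.0 - brier(p, y) / baseline

def ece(p, y, n_bins=10):
    # Expected Calibration Error: gap between predicted probability and observed
    # frequency, weighted by how many predictions fall in each probability bin.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (p >= lo) & (p <= hi) if hi == 1.0 else (p >= lo) & (p < hi)
        if mask.any():
            total += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return total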

Training

  • Base model: openai/gpt-oss-120b (120B MoE, 5.1B active params, 128 experts with top-4 routing)
  • Method: GRPO with a Brier-score reward via Tinker (see the reward sketch after this list)
  • LoRA rank: 32
  • Learning rate: 4e-5
  • Batch size: 32, group size 8
  • Training steps: 50
  • Max tokens: 16,384
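
The reward is the key design choice: each sampled completion is scored by how close its stated probability lands to the eventual outcome. The snippet below is a minimal sketch of what such a Brier-based reward could look like; the regex and function name are illustrative, not the actual Tinker training code.

import re

def brier_reward(completion: str, outcome: int) -> float:
    # Score a completion against the resolved outcome (1 = yes, 0 = no).
    # Reward is 1 - Brier, so it lives in [0, 1]; unparseable answers get 0.
    match = re.search(r"<answer>\s*([0-9]*\.?[0-9]+)", completion)
    if match is None:
        return 0.0
    p = min(max(float(match.group(1)), 0.0), 1.0)  # clamp to a valid probability
    return 1.0 - (p - outcome) ** 2

GRPO then compares rewards within each group of 8 completions sampled for the same question, so updates favor the completions whose probabilities were closest to the actual outcome.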

Usage

This repo contains a LoRA adapter trained with Tinker. The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone merge.py script is included.

Merge into full model

pip install torch transformers safetensors tqdm huggingface-hub
python merge.py --output ./trump-forecaster-merged

This downloads the base model, dequantizes to bf16, applies the LoRA adapter, and saves the merged model.
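
Under the hood this is the standard LoRA merge, folding each low-rank update into its base weight. A simplified sketch of the math for a single weight matrix follows; the real merge.py additionally handles Tinker's module naming and the dequantization step, so treat this as illustrative only.

import torch

def merge_lora_weight(base_weight, lora_A, lora_B, alpha, rank):
    # W' = W + (alpha / r) * B @ A, saved in bf16.
    delta = (alpha / rank) * (lora_B.to(torch.float32) @ lora_A.to(torch.float32))
    return (base_weight.to(torch.float32) + delta).to(torch.bfloat16)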

Inference

import sglang as sgl

engine = sgl.Engine(
    model_path="./trump-forecaster-merged",
    tokenizer_path="openai/gpt-oss-120b",
    trust_remote_code=True,
    dtype="bfloat16",
    tp_size=2,
)

news_context = "... relevant news articles ..."

prompt = f"""You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".

Question: Will Trump impose 25% tariffs on all goods from Canada by February 1, 2025?

Context:
{news_context}

Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""

output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
print(output["text"])
