Golf-Forecaster

RL-Tuned gpt-oss-120b for Predicting Professional Golf Outcomes

Starting from nothing but nine search queries, we used the Lightning Rod SDK to automatically generate 3,178 forecasting questions from news articles, label them using real outcomes, and train this model via RL. No expertise required. No manual labeling. No domain-specific engineering. The result beats GPT-5 on held-out questions, with a Brier score of 0.207 versus 0.218.

You can do this in any domain: just change the search queries. See how we built the dataset.

This repo contains a LoRA adapter for gpt-oss-120b. A standalone merge.py script is included to merge it into a full model.

Results

Evaluated on 855 held-out test questions (temporal split, August 2025 onward).

Model	Brier Score	Brier Skill Score	ECE
Golf-Forecaster	0.207	+17.0%	0.062
gpt-oss-120b (base)	0.218	+12.8%	0.083
GPT-5	0.218	+12.8%	0.106

Brier Score: mean squared error between the predicted probability and the 0/1 outcome. Lower is better.
Brier Skill Score (BSS): relative improvement in Brier score over always predicting the base rate. Higher is better.
ECE (Expected Calibration Error): how far predicted probabilities deviate from observed frequencies. Lower is better.
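The three metrics above can be sketched in a few lines of plain Python. This is an illustration of the standard definitions, not the evaluation code behind the table: the function names are ours, the base rate is assumed to be computed on the same questions being scored, and the ECE uses 10 equal-width bins, which is a common choice but an assumption here.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Relative improvement over always predicting the base rate.

    Assumption: the base rate is taken from the same outcomes being scored.
    Undefined (division by zero) if every outcome is identical.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average gap between mean prediction and observed frequency per bin.

    Assumption: equal-width bins over [0, 1]; n_bins=10 is illustrative.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_prob = sum(p for p, _ in items) / len(items)
        frequency = sum(y for _, y in items) / len(items)
        ece += (len(items) / total) * abs(mean_prob - frequency)
    return ece
```

A Brier Skill Score of 0 means the forecasts are no better than the base rate, and a negative value means they are worse.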

Training

Base model: openai/gpt-oss-120b (120B MoE, 5.1B active params)
Method: GRPO with Brier score reward via Tinker
LoRA rank: 32, learning rate 4e-5, batch size 32, group size 8, 100 steps
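To make the reward concrete, here is a minimal sketch of how a Brier-based reward and GRPO-style group-relative advantages could be computed for one question. It does not use Tinker's API and is not the training code for this model: the function names are hypothetical, the reward is assumed to be the negative squared error, and normalizing by the group's standard deviation follows the common GRPO formulation, which the actual run may or may not have used.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def brier_reward(prob: float, outcome: int) -> float:
    """Negative squared error, so a better forecast earns a higher reward."""
    return -((prob - outcome) ** 2)


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Center each reward on the group mean and scale by the group std.

    Assumption: population standard deviation with a small eps for stability.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Eight sampled forecasts (group size 8) for one question that resolved "Yes".
sampled_probs = [0.10, 0.25, 0.40, 0.50, 0.60, 0.70, 0.85, 0.95]
rewards = [brier_reward(p, 1) for p in sampled_probs]
advantages = group_advantages(rewards)

for p, a in zip(sampled_probs, advantages):
    print(f"forecast={p:.2f} advantage={a:+.2f}")
```

Because advantages are relative to the other samples in the group, a forecast is reinforced only when it scores better than its siblings on the same question.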

Usage

The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone merge.py script is included.

Merge into full model

pip install torch transformers safetensors tqdm huggingface-hub
python merge.py --output ./golf-forecaster-merged
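For readers unfamiliar with what merging a LoRA adapter means, the arithmetic for a single weight matrix can be sketched as below. This is the generic LoRA update, not the contents of merge.py, which also has to handle Tinker's module naming and the model's actual weight layout. The helper names and the `alpha` scaling value are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for a tiny illustration."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A.

    weight: (out x in) base matrix
    lora_a: (r x in) down-projection, lora_b: (out x r) up-projection
    alpha:  LoRA scaling hyperparameter (hypothetical value; not given in the card)
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Rank-1 toy example on a 2x2 identity weight.
merged = merge_lora(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 2.0]],
    lora_b=[[0.5], [1.0]],
    alpha=2.0,
)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged directory can be loaded like an ordinary checkpoint.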

Inference

import sglang as sgl

engine = sgl.Engine(
    model_path="./golf-forecaster-merged",
    tokenizer_path="openai/gpt-oss-120b",
    trust_remote_code=True,
    dtype="bfloat16",
    tp_size=2,
)

# Placeholder: replace with the news articles relevant to the question.
news_context = "... relevant news articles ..."

prompt = f"""You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".

Question: Will Scottie Scheffler win the 2025 Masters?

Context:
{news_context}

Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""

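# Generation halts at "</answer>". By default SGLang trims the stop string,
# so the returned text ends right after the probability.
output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
print(output["text"])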
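The generated text contains the reasoning followed by the probability inside an opening `<answer>` tag, and the closing tag may be absent because it is used as the stop string. A minimal sketch for extracting the number, assuming the model follows the prompt's format (the `parse_probability` helper is hypothetical, not part of this repo):

```python
import re
from typing import Optional

# Match a decimal number right after the opening tag; the closing tag is optional
# because generation stops on it.
_ANSWER_RE = re.compile(r"<answer>\s*([0-9]*\.?[0-9]+)")


def parse_probability(text: str) -> Optional[float]:
    """Return the last probability found in <answer> tags, or None if invalid."""
    matches = _ANSWER_RE.findall(text)
    if not matches:
        return None
    value = float(matches[-1])  # use the final answer if the tag appears more than once
    return value if 0.0 <= value <= 1.0 else None


print(parse_probability("He is the favorite, but the field is deep. <answer>0.18"))
```

Returning `None` for missing or out-of-range values lets the caller decide whether to retry or fall back to a default forecast.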


