OCAR · Surprise Agent-RL (Archived) - a Ricardo-H Collection

updated 20 days ago

Archived checkpoints from the terminated OCAR/surprise-as-credit line. See the post-mortem in the verl-agent GitHub repo.