Ricardo-H's Collections
OCAR · Surprise Agent-RL (Archived)

updated 20 days ago

Archived checkpoints from the terminated OCAR/surprise-as-credit line. See post-mortem in the verl-agent GitHub repo.

  • Ricardo-H/ocar-v3-alfworld-7b

    8B • Updated 20 days ago • 18

  • Ricardo-H/ocar-grpo-observe-alfworld-7b

    8B • Updated 20 days ago • 20

  • Ricardo-H/ocar-grpo-observe-alfworld-1.5b

    2B • Updated 20 days ago • 24

  • Ricardo-H/ocar-gigpo-observe-alfworld-1.5b

    2B • Updated 20 days ago • 21
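
As a minimal sketch of how the archived checkpoints listed above might be fetched locally: this assumes the four repositories are still public and that the `huggingface_hub` package is installed. The helper name `download_checkpoint` and the `REPO_IDS` list are illustrative, not part of the collection; only `snapshot_download` is a library call.

```python
from typing import Optional

# Repository ids copied from the collection listing above.
REPO_IDS = [
    "Ricardo-H/ocar-v3-alfworld-7b",
    "Ricardo-H/ocar-grpo-observe-alfworld-7b",
    "Ricardo-H/ocar-grpo-observe-alfworld-1.5b",
    "Ricardo-H/ocar-gigpo-observe-alfworld-1.5b",
]


def download_checkpoint(repo_id: str, local_dir: Optional[str] = None) -> str:
    """Download every file in one model repo and return the local path.

    Hypothetical helper: assumes the repo is public (no token needed).
    """
    # Imported lazily so the listing above can be used without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    # Fetch the smallest GRPO checkpoint first; the others work the same way.
    path = download_checkpoint(REPO_IDS[2])
    print(f"Checkpoint files stored under: {path}")
```

Since the collection is archived, it may be worth pinning a specific `revision` in the `snapshot_download` call if reproducibility matters.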