arxiv:2603.16542

Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting

Published on Mar 17
Submitted by Wanpeng Zhang on Mar 19

Abstract

Posterior-Transition Reweighting (PTR) improves offline robot policy adaptation by dynamically weighting training samples according to how attributable their post-action consequences are, enabling more conservative and effective learning from heterogeneous datasets.

AI-generated summary

Offline post-training adapts a pretrained robot policy to a target dataset by supervised regression on recorded actions. In practice, robot datasets are heterogeneous: they mix embodiments, camera setups, and demonstrations of varying quality, so many trajectories reflect recovery behavior, inconsistent operator skill, or weakly informative supervision. Uniform post-training gives equal credit to all samples and can therefore average over conflicting or low-attribution data. We propose Posterior-Transition Reweighting (PTR), a reward-free and conservative post-training method that decides how much each training sample should influence the supervised update. For each sample, PTR encodes the observed post-action consequence as a latent target, inserts it into a candidate pool of mismatched targets, and uses a separate transition scorer to estimate a softmax identification posterior over target indices. The posterior-to-uniform ratio defines the PTR score, which is converted into a clipped-and-mixed weight and applied to the original action objective through self-normalized weighted regression. This construction requires no tractable policy likelihood and is compatible with both diffusion and flow-matching action heads. Rather than uniformly trusting all recorded supervision, PTR reallocates credit according to how attributable each sample's post-action consequence is under the current representation, improving conservative offline adaptation to heterogeneous robot data.
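
For concreteness, here is a minimal PyTorch sketch of the weighting scheme described above. The in-batch candidate pool, the dot-product form of the transition scorer, and the clip and mixing hyperparameters (clip_max, alpha) are all illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def ptr_weights(score_fn, context, latent_targets, clip_max=5.0, alpha=0.5):
    """Clipped-and-mixed PTR weights for one batch.

    context:        (B, D) representation of each sample's pre-action context
    latent_targets: (B, D) encoded post-action consequences (latent targets)
    score_fn:       transition scorer returning a (B, B) matrix in which row i
                    scores sample i's context against every candidate target
    """
    B = context.size(0)
    # In-batch candidate pool: each sample's true target sits among the
    # B - 1 mismatched targets contributed by the rest of the batch.
    logits = score_fn(context, latent_targets)   # (B, B)
    posterior = F.softmax(logits, dim=1)         # softmax identification posterior
    # PTR score = posterior-to-uniform ratio; the uniform probability is 1/B.
    ptr_score = posterior.diagonal() * B
    # Clip large scores, then mix toward a uniform weight of 1 for conservatism.
    weight = alpha * ptr_score.clamp(max=clip_max) + (1.0 - alpha)
    return weight.detach()                       # weights carry no gradient

# Hypothetical transition scorer: a dot product over shared embeddings.
def dot_score(context, targets):
    return context @ targets.t()
```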

Community

Paper author · Paper submitter

We propose Posterior-Transition Reweighting (PTR), a reward-free and conservative post-training method that decides how much each training sample should influence the supervised update. PTR is particularly well suited to complex, heterogeneous robot data of varying quality.
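
As a usage illustration, the sketch below applies such weights to a per-sample action loss via self-normalized weighted regression; the linear-interpolation flow-matching objective and the velocity_net interface are assumptions about the action head, not the paper's exact parameterization.

```python
import torch

def flow_matching_per_sample_loss(velocity_net, obs, actions):
    # Generic conditional flow matching: regress the straight-line velocity
    # between Gaussian noise and the recorded action (an assumed action head).
    noise = torch.randn_like(actions)
    t = torch.rand(actions.size(0), 1)             # one time per sample
    x_t = (1 - t) * noise + t * actions            # linear interpolation path
    target_v = actions - noise                     # d(x_t)/dt along that path
    pred_v = velocity_net(obs, x_t, t)             # any callable (obs, x_t, t) -> velocity
    return ((pred_v - target_v) ** 2).mean(dim=1)  # per-sample loss, shape (B,)

def self_normalized_weighted_loss(per_sample_loss, ptr_weight):
    # Self-normalized weighted regression: rescale weights to average one
    # across the batch, so PTR redistributes credit without changing the
    # overall loss scale.
    w = ptr_weight * (ptr_weight.numel() / ptr_weight.sum())
    return (w * per_sample_loss).mean()
```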

arxiv: https://arxiv.org/abs/2603.16542
blog: https://research.beingbeyond.com/ptr

