Flow-GRPO Post-training
Flow-GRPO Post-training
Auto-generated stub. Edit this file to add more details.
A post-training optimization technique that applies a Reward-Driven Optimization (GRPO) over the stochastic sampling process derived from a Continuous Normalizing Flow (CNF) to steer trajectory predictions towards desired behavioral outcomes.
Why It Matters
This novel two-stage approach explicitly aligns flow-based generative modeling with behavioral objectives (social compliance, physical feasibility) using Reinforcement Learning (GRPO) applied to the flow’s SDE formulation.
Evidence
we perform Flow-GRPO post-training,where deterministic flow rollout is reformulated as stochastic ODE-to-SDE sampling to enable trajectory exploration, and a composite reward combines view-aware social compliance with map-aware physical feasibility.
Related Papers
Metadata & Links
- created_at
- 2026-03-29T06:07:28Z