Improve R2V fidelity-instruction tradeoff
Background: Current research in reference-to-video (R2V) generation is often constrained by the limited diversity of training data, leading to a trade-off between adhering to text instructions and preserving the fidelity of the reference subject’s appearance.
Question / Future Work: Future work should train on a more diverse dataset. The goal is a better balance between instruction following (adherence to the text prompt) and reference fidelity (preservation of the reference subject's identity and appearance).
Metadata & Links
- created_at: 2026-03-27T06:07:04Z