Improve R2V fidelity-instruction tradeoff

Background: Reference-to-video (R2V) generation is often constrained by the limited diversity of its training data, which leads to a trade-off between adhering to text instructions and preserving the fidelity of the reference subject’s appearance.

Question / Future Work: Future work should train on a more diverse dataset to achieve a better balance between instruction following (adhering to the text prompt) and reference fidelity (preserving the identity and appearance of the subject in the reference image).

Metadata & Links

created_at: 2026-03-27T06:07:04Z
source_papers: [[2603.25743-refalign-representation-alignment-for-reference-to-video-gen]]