Improve R2V fidelity-instruction tradeoff

Background: Reference-to-video (R2V) generation is often constrained by the limited diversity of its training data, which leads to a trade-off between following text instructions and preserving the appearance of the reference subject.

Question / Future Work: Future work should train on a more diverse dataset to better balance instruction following (adhering to the text prompt) against reference fidelity (preserving the identity and appearance of the subject in the reference image).