Robustness to Imperfect Visuals
Background: The TAG method leverages counterfactual visual inputs derived from complex synthesis pipelines to provide guidance during inference.
Question / Future Work: How can TAG be made more robust to the imperfect or noisy visual observations found in complex, unstructured, real-world settings? The case of particular interest is when the synthesized counterfactual baseline ($I_{\text{uncond}}$) does not fully cancel background or distractor biases.
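To make the failure mode concrete, here is a minimal toy sketch. It assumes, purely for illustration, that guidance combines the two model outputs in a classifier-free-guidance style, `uncond + scale * (cond - uncond)`; the note does not state TAG's actual formula, so this rule, the function `guided_logits`, and all numbers are hypothetical. The output on the real observation is modeled as target evidence plus a background bias, and the baseline computed from $I_{\text{uncond}}$ reproduces only a fraction of that bias.

```python
# Toy sketch only. The guidance rule below is an ASSUMED classifier-free-guidance
# style combination, not a formula taken from the TAG method itself.

def guided_logits(cond, uncond, scale):
    """Elementwise uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

# Hypothetical decomposition of the model output on the real observation:
# evidence for the target plus a bias induced by background/distractors.
target = [2.0, 0.0, 0.0]   # evidence favouring action 0
bias = [0.0, 1.5, 0.0]     # distractor bias favouring action 1
cond = [t + b for t, b in zip(target, bias)]

# Perfect counterfactual baseline: carries exactly the same bias.
perfect_uncond = list(bias)
# Imperfect baseline: the synthesis pipeline reproduces only 60% of the bias.
imperfect_uncond = [0.6 * b for b in bias]

scale = 3.0
clean = guided_logits(cond, perfect_uncond, scale)
leaky = guided_logits(cond, imperfect_uncond, scale)

# Compare the distractor action (index 1) under both baselines.
print("perfect baseline:  ", clean)
print("imperfect baseline:", leaky)
```

Under this assumed rule, a baseline that reproduces a fraction $\alpha$ of the bias leaves the bias with coefficient $1 + (s - 1)(1 - \alpha)$ in the guided output, where $s$ is the guidance scale. With a perfect baseline ($\alpha = 1$) the bias is not amplified; with $\alpha = 0.6$ and $s = 3$ it is scaled by about 1.8, so the uncancelled residual grows with the guidance scale.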
Metadata & Links
- created_at: 2026-03-26T06:26:20Z