Chameleon: Episodic Memory for Long-Horizon Robotic Manipulation

Authors: Xinying Guo, Chenxi Jiang, Hyun Bin Kim, Ying Sun, Yang Xiao, Yuhang Han, Jianfei Yang
Date: 2026-03-25
Paper ID: arxiv:2603.24576

Summary

Chameleon addresses the challenge of non-Markovian decision-making in long-horizon robotic manipulation caused by perceptual aliasing, where identical observations map to different underlying states. The system is inspired by human episodic memory, utilizing a novel memory architecture that writes geometry-grounded multimodal tokens to preserve fine-grained, disambiguating context. It employs a differentiable memory stack to enable goal-directed recall, overcoming the limitations of semantic compression used in prior work. Evaluated on the new Camo-Dataset, Chameleon shows consistent improvements in decision reliability and long-horizon control in perceptually challenging scenarios.
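To make perceptual aliasing concrete, here is a minimal toy sketch. It is not taken from the paper: the cup-covering task, the observation strings, and the names make_episode, memoryless_policy, memory_policy, and success_rate are all hypothetical. It only illustrates why a policy that sees the current observation alone cannot beat chance when the final observation is identical across different underlying states, while a policy with access to the episode history can.

```python
from typing import List

# Hypothetical toy task (not from the paper): an object is shown under the
# "left" or "right" cup, then both cups are covered. The final observation
# is the same string in both cases, so the task is perceptually aliased.
Episode = List[str]


def make_episode(hidden_side: str) -> Episode:
    return [f"object_visible_{hidden_side}", "cups_covered", "cups_covered"]


def memoryless_policy(observation: str) -> str:
    # Sees only the current frame. "cups_covered" carries no side
    # information, so the policy can only commit to one fixed guess.
    return "lift_left"


def memory_policy(history: Episode) -> str:
    # Recalls the most recent frame in which the object was visible.
    for obs in reversed(history):
        if obs.startswith("object_visible_"):
            return "lift_" + obs.rsplit("_", 1)[1]
    return "lift_left"  # fallback when nothing informative was stored


def success_rate(use_memory: bool) -> float:
    sides = ["left", "right"]
    wins = 0
    for side in sides:
        episode = make_episode(side)
        if use_memory:
            action = memory_policy(episode)
        else:
            action = memoryless_policy(episode[-1])
        wins += action == f"lift_{side}"
    return wins / len(sides)


print(success_rate(use_memory=False))  # 0.5
print(success_rate(use_memory=True))   # 1.0
```

The memoryless policy succeeds on exactly one of the two cases because both end in the same observation; the history-based policy resolves the ambiguity from an earlier frame.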

Key Contributions

  • Introduction of Chameleon, a novel memory system for non-Markovian robotic manipulation that writes geometry-grounded multimodal tokens to an episodic memory stack.
  • Development of a goal-directed, differentiable recall mechanism that preserves fine-grained perceptual cues lost in similarity-based retrieval methods.
  • Creation of Camo-Dataset, a real-robot UR5e dataset specifically designed to evaluate long-horizon control under perceptual aliasing, spatial tracking, and sequential manipulation.
  • Demonstration that Chameleon consistently improves decision reliability and long-horizon control over strong baselines in perceptually confusable robotic tasks.
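The summary above does not describe how Chameleon's recall is implemented, so the following is only a generic sketch of one common way to make a memory read differentiable: softmax attention over stored keys, queried by a goal vector. The EpisodicMemory class, its write, attention, and read methods, the temperature parameter, and the two-dimensional toy vectors are all assumptions made for illustration, not the paper's architecture.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class EpisodicMemory:
    """Hypothetical append-only memory of (key, value) token pairs."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def write(self, key: Vector, value: Vector) -> None:
        # Append a new entry; older entries are kept unchanged.
        self.keys.append(list(key))
        self.values.append(list(value))

    def attention(self, query: Vector, temperature: float = 1.0) -> List[float]:
        if not self.keys:
            raise ValueError("cannot read from an empty memory")
        return softmax([dot(query, k) / temperature for k in self.keys])

    def read(self, query: Vector, temperature: float = 1.0) -> Vector:
        # Soft read: a weighted average of all stored values.
        weights = self.attention(query, temperature)
        dim = len(self.values[0])
        return [
            sum(w * v[i] for w, v in zip(weights, self.values))
            for i in range(dim)
        ]


memory = EpisodicMemory()
memory.write(key=[1.0, 0.0], value=[1.0, 0.0])  # toy entry A
memory.write(key=[0.0, 1.0], value=[0.0, 1.0])  # toy entry B
recalled = memory.read(query=[10.0, 0.0])       # goal query aligned with A
print(recalled)
```

Because the read is a smooth weighted sum rather than a hard nearest-neighbor lookup, every stored entry contributes to the output, which is what would let gradients reach the memory contents in an automatic-differentiation framework.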

Limitations

The evaluation focuses on real-robot UR5e manipulation; generalization to more complex or unstructured environments remains to be explored. The paper does not explicitly quantify the computational overhead of managing geometry-grounded multimodal tokens relative to simpler compressed traces.

Key Concepts

  • Chameleon Episodic Memory: A system that uses geometry-grounded multimodal tokens and a differentiable memory stack to enable robust long-horizon robotic manipulation under perceptual aliasing.

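Datasets

  • Camo-Dataset: A real-robot UR5e dataset designed to evaluate long-horizon control under perceptual aliasing, spatial tracking, and sequential manipulation.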

Metadata & Links

url: https://arxiv.org/abs/2603.24576
paper_id: 2603.24576
paper_source: arxiv
domain: robotics
tags: agent, long-context, multimodal, memory, robotics, vision-language-model
architectures: (none listed)
datasets: Camo-Dataset
skill: TimeSeriesSkill
created_at: 2026-03-26T06:26:24Z