Chameleon: Episodic Memory for Long-Horizon Robotic Manipulation
Authors: Xinying Guo, Chenxi Jiang, Hyun Bin Kim, Ying Sun, Yang Xiao, Yuhang Han, Jianfei Yang
Date: 2026-03-25
Paper ID: arxiv:2603.24576
Summary
Chameleon addresses the challenge of non-Markovian decision-making in long-horizon robotic manipulation caused by perceptual aliasing, where identical observations map to different underlying states. The system is inspired by human episodic memory, utilizing a novel memory architecture that writes geometry-grounded multimodal tokens to preserve fine-grained, disambiguating context. It employs a differentiable memory stack to enable goal-directed recall, overcoming the limitations of semantic compression used in prior work. Evaluated on the new Camo-Dataset, Chameleon shows consistent improvements in decision reliability and long-horizon control in perceptually challenging scenarios.
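To make perceptual aliasing concrete, the following toy sketch shows why a policy that sees only the current observation cannot act correctly, while one with access to the episode history can. The task (a cup hidden under one of two identical boxes), the observation strings, and both policy functions are hypothetical illustrations, not taken from the paper or from Camo-Dataset.

```python
from typing import List

# Hypothetical toy task: a cup is hidden under one of two identical boxes.
# Once hidden, the camera sees the same "two_closed_boxes" frame wherever
# the cup is, so the final observation is aliased across episodes.
Episode = List[str]


def memoryless_policy(observation: str) -> str:
    """Acts on the current observation only (Markov assumption)."""
    table = {
        "cup_goes_left": "wait",
        "cup_goes_right": "wait",
        "two_closed_boxes": "lift_left",  # forced to guess one side
    }
    return table[observation]


def memory_policy(history: Episode) -> str:
    """Acts on the whole episode and recalls the disambiguating frame."""
    if history[-1] != "two_closed_boxes":
        return "wait"
    return "lift_left" if "cup_goes_left" in history else "lift_right"


episode_a: Episode = ["cup_goes_left", "two_closed_boxes"]
episode_b: Episode = ["cup_goes_right", "two_closed_boxes"]

# Same final observation, different underlying state.
same_action = memoryless_policy(episode_a[-1]) == memoryless_policy(episode_b[-1])
action_a = memory_policy(episode_a)
action_b = memory_policy(episode_b)
```

The memoryless policy necessarily picks the same action in both episodes, so it is wrong in one of them; the history-based policy separates the two cases because the disambiguating frame is still available to it.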
Key Contributions
- Introduction of Chameleon, a novel memory system for non-Markovian robotic manipulation that writes geometry-grounded multimodal tokens to an episodic memory stack.
- Development of a goal-directed, differentiable recall mechanism that preserves fine-grained perceptual cues lost in similarity-based retrieval methods.
- Creation of Camo-Dataset, a real-robot UR5e dataset specifically designed to evaluate long-horizon control under perceptual aliasing, spatial tracking, and sequential manipulation.
- Demonstration that Chameleon consistently improves decision reliability and long-horizon control over strong baselines in perceptually confusable robotic tasks.
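The idea of goal-directed, differentiable recall in the contributions above can be illustrated with standard scaled dot-product attention over stored memory entries. This is a generic technique swapped in for illustration; the summary does not specify the paper's actual recall mechanism, and the function names, vectors, and dimensions below are assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_recall(
    query: Vector, keys: List[Vector], values: List[Vector]
) -> Tuple[Vector, List[float]]:
    """Return a weighted blend of stored values plus the attention weights.

    Every step is a smooth function of `query`, so gradients could flow
    through the recall, unlike a hard nearest-neighbour lookup.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    recalled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return recalled, weights


# Hypothetical memory: entries 0 and 2 have near-identical keys (aliased),
# entry 1 is the one relevant to the current goal.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
values = [[10.0], [20.0], [30.0]]
goal_query = [0.0, 4.0]  # assumed to be produced from the task goal

recalled, weights = soft_recall(goal_query, keys, values)
best_index = max(range(len(weights)), key=lambda i: weights[i])
```

Here the query is derived from the goal rather than from the current observation, so the weights concentrate on the goal-relevant entry even though two other entries look alike to each other.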
Limitations
The paper focuses on real-robot UR5e manipulation, and its generalization to highly complex or unstructured environments remains to be fully explored. The computational overhead of managing geometry-grounded multimodal tokens is not explicitly detailed compared to simpler compressed traces.
Open Questions & Future Work
- Cross-embodiment transfer for episodic memory
- Learning event segmentation in robotics
- Integrating episodic memory with foundation models
Key Concepts
- Chameleon Episodic Memory: A system that uses geometry-grounded multimodal tokens and a differentiable memory stack to enable robust long-horizon robotic manipulation under perceptual aliasing.
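As a rough illustration of what a geometry-grounded multimodal token might carry, the sketch below pairs a 3D position with visual and proprioceptive features and stores tokens uncompressed in an episode buffer. All field names, the `near` query, and the example values are hypothetical; the summary does not describe the paper's token layout or memory stack internals.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class MemoryToken:
    # All fields are assumed for illustration only.
    step: int
    position_xyz: Tuple[float, float, float]  # where the cue was observed
    visual_feature: Tuple[float, ...]         # e.g. an image embedding
    proprio: Tuple[float, ...]                # e.g. gripper state


@dataclass
class EpisodicMemory:
    tokens: List[MemoryToken] = field(default_factory=list)

    def write(self, token: MemoryToken) -> None:
        # Append without compression so fine-grained cues are kept.
        self.tokens.append(token)

    def near(self, point: Tuple[float, float, float], radius: float) -> List[MemoryToken]:
        # Geometry grounding lets lookalike tokens be separated by location.
        return [t for t in self.tokens if math.dist(t.position_xyz, point) <= radius]


memory = EpisodicMemory()
# Two tokens with identical visual features but different positions.
memory.write(MemoryToken(0, (0.2, 0.0, 0.1), (0.5, 0.5), (1.0,)))
memory.write(MemoryToken(1, (0.6, 0.0, 0.1), (0.5, 0.5), (0.0,)))

left_hits = memory.near((0.2, 0.0, 0.1), radius=0.05)
```

The two stored tokens are visually indistinguishable, yet the position field still tells them apart, which is the kind of disambiguating context a purely semantic summary would lose.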
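Datasets
- Camo-Dataset: A real-robot UR5e dataset designed to evaluate long-horizon control under perceptual aliasing, spatial tracking, and sequential manipulation.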
Metadata & Links
- url: https://arxiv.org/abs/2603.24576
- paper_id: 2603.24576
- paper_source: arxiv
- domain: robotics
- tags: agent, long-context, multimodal, memory, robotics, vision-language-model
- architectures:
- datasets: Camo-Dataset
- skill: TimeSeriesSkill
- created_at: 2026-03-26T06:26:24Z