Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving

Authors: Zehao Wang, Huaide Jiang, Shuaiwu Dong, Yuping Wang, Hang Qiu, Jiachen Li
Date: 2026-03-26
Paper ID: arxiv:2603.25740

Summary

Drive My Way (DMW) is a personalized Vision-Language-Action (VLA) driving framework that addresses the rigidity of current autonomous driving systems by adapting both to individual driver styles and to real-time natural language commands. The method learns a user embedding from a personalized driving dataset collected from multiple human drivers and uses it to condition the driving policy; short-term language instructions then further modulate that behavior. Evaluations on the Bench2Drive benchmark show that DMW adapts effectively to style instructions, and user studies indicate that the generated driving behaviors are recognizable as the intended driver's personal style. The authors argue that personalization is a critical factor for human-centered autonomous driving systems.
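The two-level conditioning described above (a long-term per-driver preference, adjusted by a short-term instruction) can be illustrated with a small sketch. It is not the authors' implementation: DMW learns a user embedding and processes instructions with a VLA model, whereas this example assumes hypothetical interpretable style parameters (DrivingStyle, USER_PROFILES) and a hypothetical phrase table (INSTRUCTION_DELTAS) so the effect of an instruction on a stored preference is easy to read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivingStyle:
    """Hypothetical interpretable stand-in for a learned user embedding."""
    target_speed_mps: float  # preferred cruising speed, m/s
    time_headway_s: float    # preferred following gap, seconds
    max_accel_mps2: float    # comfortable acceleration limit, m/s^2


# Hypothetical long-term profiles for two drivers.
USER_PROFILES = {
    "driver_a": DrivingStyle(13.0, 2.0, 1.5),
    "driver_b": DrivingStyle(16.0, 1.2, 2.5),
}

# Hypothetical short-term adjustments (speed, headway, accel) keyed by phrase.
# A VLA model would encode the instruction with its language backbone; plain
# substring matching is used here only to keep the sketch self-contained, and
# it does not handle negation ("not faster").
INSTRUCTION_DELTAS = {
    "faster": (1.5, -0.2, 0.5),
    "slower": (-2.0, 0.4, -0.5),
    "keep more distance": (0.0, 0.8, 0.0),
}


def modulate(style: DrivingStyle, instruction: str) -> DrivingStyle:
    """Shift a long-term style by every adjustment the instruction triggers."""
    d_speed = d_headway = d_accel = 0.0
    text = instruction.lower()
    for phrase, (ds, dh, da) in INSTRUCTION_DELTAS.items():
        if phrase in text:
            d_speed += ds
            d_headway += dh
            d_accel += da
    # Clamp to hypothetical bounds so an instruction cannot leave a safe envelope.
    return DrivingStyle(
        target_speed_mps=min(max(style.target_speed_mps + d_speed, 0.0), 20.0),
        time_headway_s=max(style.time_headway_s + d_headway, 1.0),
        max_accel_mps2=min(max(style.max_accel_mps2 + d_accel, 0.5), 3.0),
    )
```

For example, modulate(USER_PROFILES["driver_a"], "Please go a bit faster") raises the target speed from 13.0 to 14.5 m/s and shortens the preferred headway from 2.0 to about 1.8 s, while an empty instruction returns the stored profile unchanged. The clamping step is a design choice assumed here: instructions shift preferences but never override the safety bounds.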

Key Contributions

  • Proposed Drive My Way (DMW), a personalized Vision-Language-Action (VLA) driving framework that aligns with users’ long-term driving habits and real-time language instructions.
  • Introduced a mechanism that learns a user embedding from a personalized driving dataset collected from multiple drivers and uses it to condition the driving policy for personalization.
  • Demonstrated superior adaptation to style instructions in closed-loop evaluations on the Bench2Drive benchmark, together with personalized driving behavior that users could recognize.
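One common way to learn per-user embeddings from multi-driver demonstrations is to fit them jointly with a shared policy, so that the shared weights capture the response to the scene and each embedding absorbs what is specific to one driver. The sketch below assumes a toy linear policy, synthetic data from two hypothetical drivers, and plain SGD on squared error; make_demos and train_user_embeddings are illustrative names, and the paper's actual architecture, inputs, and training objective are not reproduced here.

```python
import random


def make_demos(n, seed=0):
    """Synthetic multi-driver demonstrations (hypothetical, not the paper's data).

    Each sample is (driver_id, scene features x, observed action y), where
    y = (target_speed, time_headway) follows a shared scene response plus a
    driver-specific offset that training must discover.
    """
    rng = random.Random(seed)
    true_offsets = {"driver_a": (-2.0, 1.5), "driver_b": (2.0, 0.7)}
    demos = []
    for i in range(n):
        driver = "driver_a" if i % 2 == 0 else "driver_b"
        x = (rng.random(), rng.random())  # (normalized speed limit, traffic density)
        off = true_offsets[driver]
        y = (10.0 * x[0] - 3.0 * x[1] + off[0], 0.5 * x[1] + off[1])
        demos.append((driver, x, y))
    return demos


def train_user_embeddings(demos, epochs=300, lr=0.05, seed=0):
    """Jointly fit a shared linear policy W and one embedding per driver by SGD."""
    rng = random.Random(seed)
    n_out, n_in = 2, 2
    W = [[0.0] * n_in for _ in range(n_out)]     # shared across all drivers
    emb = {d: [0.0] * n_out for d, _, _ in demos}  # one vector per driver
    order = list(range(len(demos)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            driver, x, y = demos[idx]
            e = emb[driver]
            for k in range(n_out):
                pred = sum(W[k][j] * x[j] for j in range(n_in)) + e[k]
                err = pred - y[k]  # gradient of 0.5 * err**2 w.r.t. pred
                for j in range(n_in):
                    W[k][j] -= lr * err * x[j]
                e[k] -= lr * err  # only this driver's embedding is updated
    return W, emb
```

After train_user_embeddings(make_demos(200)), the learned speed offsets should come out near -2.0 for driver_a and +2.0 for driver_b, so the embedding separates the slower and faster driver while W recovers the shared scene response. There is no shared bias term in this toy model, so the embeddings, not W, absorb each driver's constant preference.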

Limitations

The paper relies on a proprietary personalized driving dataset collected from multiple drivers, which may limit generalization to new user populations or to environments not represented in the training data.

Key Concepts

  • Personalized Vision-Language-Action Model: A driving model framework that integrates user-specific embeddings derived from driving habits with natural language instructions to achieve personalized autonomous driving behaviors.
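One standard way to integrate a long-term user embedding with a short-term instruction embedding is feature-wise linear modulation (FiLM), in which the conditioning signal scales and shifts intermediate features. The source does not state that DMW uses FiLM; this is a swapped-in illustration, and the function names, embedding sizes, and weight matrices are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product m @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def film_condition(
    features: Sequence[float],
    user_emb: Sequence[float],
    instr_emb: Sequence[float],
    w_gamma: Matrix,
    w_beta: Matrix,
) -> Vector:
    """FiLM-style fusion of a user embedding and an instruction embedding.

    Hypothetical sketch: the two embeddings are concatenated into a
    conditioning vector c, and two linear maps turn c into a per-feature
    scale (gamma) and shift (beta) applied to the scene features.
    """
    c = list(user_emb) + list(instr_emb)
    for row in list(w_gamma) + list(w_beta):
        if len(row) != len(c):
            raise ValueError("each weight row must match the conditioning size")
    gamma = matvec(w_gamma, c)
    beta = matvec(w_beta, c)
    if not len(gamma) == len(beta) == len(features):
        raise ValueError("gamma and beta must match the feature dimension")
    # (1 + gamma) leaves the features unchanged when c is all zeros.
    return [h * (1.0 + g) + b for h, g, b in zip(features, gamma, beta)]
```

With features [1.0, 2.0], user embedding [0.5], instruction embedding [1.0], w_gamma = [[0.2, 0.0], [0.0, -0.5]] and w_beta = [[0.0, 1.0], [0.0, 0.0]], the output is approximately [2.1, 1.0]: the first feature is scaled by 1.1 and shifted by 1.0, and the second is halved. Because both embeddings feed the same conditioning vector, a new instruction changes the modulation without retraining the user embedding.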

Metadata & Links

url: https://arxiv.org/abs/2603.25740
paper_id: 2603.25740
paper_source: arxiv
domain: robotics
tags: multimodal, agent, tool-use, vision-language-model, evaluation, benchmark
architectures: decoder-only
datasets: Bench2Drive
skill: GeneralMLSkill
created_at: 2026-03-27T06:07:07Z