OccAny: Generalized Unconstrained Urban 3D Occupancy
OccAny: Generalized Unconstrained Urban 3D Occupancy
Authors: Anh-Quan Cao, Tuan-Hung Vu Date: 2026-03-24 Paper ID: arxiv:2603.23502
Summary
OccAny addresses the limitations of existing 3D occupancy prediction methods, which heavily rely on in-domain annotations and precise sensor priors, by proposing the first generalized framework for unconstrained urban 3D occupancy prediction. The model operates on out-of-domain, uncalibrated scenes using sequential, monocular, or surround-view images to predict metric occupancy alongside segmentation features. Key innovations include Segmentation Forcing to improve occupancy quality and enable mask prediction, and a Novel View Rendering pipeline for geometry completion via test-time augmentation. Extensive experiments validate that OccAny achieves state-of-the-art performance among visual geometry methods while remaining competitive with in-domain self-supervised approaches on standard urban occupancy benchmarks.
Key Contributions
- Introduction of OccAny, the first generalized 3D occupancy framework capable of operating on out-of-domain, uncalibrated scenes to predict metric occupancy coupled with segmentation features.
- Proposal of Segmentation Forcing to improve the quality of occupancy prediction and enable mask-level output from the model.
- Development of a Novel View Rendering pipeline to infer novel-view geometry, which is utilized for test-time view augmentation to enhance geometry completion.
- Demonstration that OccAny outperforms visual geometry baselines and remains competitive with in-domain self-supervised methods across diverse input settings (sequential, monocular, surround-view).
Limitations
The abstract focuses on outperforming visual geometry baselines and competing with in-domain self-supervised methods, but does not explicitly detail the remaining gap compared to fully supervised, in-domain methods or the robustness to extreme out-of-domain shifts.
Key Concepts
- Segmentation Forcing: A technique used within the OccAny framework to enhance 3D occupancy quality by enforcing the prediction of segmentation masks alongside the occupancy grid.
- Novel View Rendering pipeline: A component in OccAny designed to infer novel-view geometry from existing views, facilitating test-time augmentation for improved geometry completion in 3D occupancy prediction.
Datasets
Limitations
The abstract focuses on outperforming visual geometry baselines and competing with in-domain self-supervised methods, but does not explicitly detail the remaining gap compared to fully supervised, in-domain methods or the robustness to extreme out-of-domain shifts.
Links
Metadata & Links
- url
- https://arxiv.org/abs/2603.23502
- paper_id
- 2603.23502
- paper_source
- arxiv
- domain
- computer-vision
- tags
- object-detectionimage-segmentationmultimodalvision-language-modelrobustnessevaluationdataset
- architectures
-
- datasets
- urban occupancy prediction datasets
- skill
- TimeSeriesSkill
- created_at
- 2026-03-25T21:17:54Z