This 20-second nuPlan driving clip is rendered with 3D rasterization from the left, front, and right cameras. The rendering preserves semantic and metric accuracy while discarding photorealism, trading visual fidelity for rendering speed and direct usability in end-to-end training.
- 🏆 1st Place in the Waymo Open Dataset Vision-based End-to-End Driving Challenge (2025) (UniPlan entry).
- 🏆 Current #1 on the Waymo Open Dataset Vision-based E2E Driving Leaderboard, NAVSIM v1 navtest, and NAVSIM v2 navhard (RAP entry).
- 🏆 State-of-the-art on Bench2Drive.
End-to-end (E2E) driving policies trained via imitation learning rely only on expert demonstrations, which contain no recovery behavior. Once deployed in closed loop, small mistakes therefore cannot be corrected and quickly escalate into failures. A promising direction is to augment training with alternative viewpoints and trajectories beyond the logged path. While prior works use photorealistic digital twins built with neural rendering or game engines, these methods are prohibitively slow and costly, and thus mainly serve evaluation purposes.
In this work, we argue that photorealism is unnecessary for training E2E planners. What truly matters is semantic fidelity and scalability: driving depends on geometry and dynamics, not textures or lighting. Motivated by this, we introduce 3D Rasterization, which replaces expensive rendering with lightweight rasterization of annotated primitives, enabling counterfactual recovery maneuvers and cross-agent view synthesis. To ensure these synthetic views transfer effectively to real-world deployment, we further propose a Raster-to-Real feature-space alignment that bridges the sim-to-real gap without requiring pixel-level realism.
Together, these components form Rasterization Augmented Planning (RAP), a scalable data augmentation pipeline for end-to-end driving. RAP achieves state-of-the-art closed-loop robustness and long-tail generalization, ranking 1st on four major benchmarks: NAVSIM v1/v2, Waymo Open Dataset Vision-based E2E Driving, and Bench2Drive. Our results show that lightweight rasterization with feature alignment suffices to scale E2E training, offering a practical and scalable alternative to photorealistic rendering.

Comparison of rendering paradigms for end-to-end driving. Neural or engine-based methods (left) aim to minimize the sim-to-real gap in pixel space, but incur high computational cost. In contrast, our approach (right) leverages 3D rasterization, which is scalable and fully controllable, and aligns rasterized inputs with real images in feature space.
Recent efforts in end-to-end driving often rely on photorealistic digital twins built with neural rendering or game engines, which focus on pixel-level realism but remain computationally heavy and limited in scalability. In this work, we show that photorealism is not necessary for training robust planners. Instead, our Rasterization Augmented Planning (RAP) leverages lightweight 3D rasterization to generate semantically faithful augmentations, and bridges the sim-to-real gap via feature-space alignment. This design enables scalable data synthesis and significantly improves robustness and generalization in closed-loop driving.

Overview of the proposed RAP. (a) Data Augmentations via 3D Rasterization: annotated driving logs are converted into large-scale synthetic samples through cross-agent view synthesis and recovery-oriented perturbation. (b) Raster-to-Real Alignment: paired real and rasterized inputs are processed by a frozen image encoder and a learnable feature projector. Spatial alignment minimizes MSE loss against detached raster features, while global alignment uses a gradient reversal layer and domain classifier to enforce domain confusion.
RAP consists of two key components. First, 3D rasterization transforms annotated logs into diverse, large-scale augmentations by generating novel viewpoints and recovery maneuvers. Second, Raster-to-Real alignment ensures these synthetic samples transfer effectively to real-world deployment by aligning rasterized and real inputs in feature space. Together, these components provide a scalable and robust data pipeline for closed-loop end-to-end driving.
Recovery-oriented Perturbation
We perturb logged expert trajectories with lateral/longitudinal offsets and noise, then re-render them with 3D rasterization, generating counterfactual scenes that teach the planner to recover from distribution shifts (a minimal sketch of the perturbation follows the before/after example below).
Before perturbation: Ego follows the expert trajectory.
After perturbation: Counterfactual ego trajectories drift from the expert path, creating recovery scenarios.
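The sketch below illustrates the perturbation step on a logged trajectory. It is a minimal NumPy example, not the released implementation: the function name `perturb_trajectory`, the offset magnitudes, and the noise scales are illustrative assumptions. In RAP, the perturbed poses would then be handed to the rasterizer to produce counterfactual camera views, while the original expert trajectory remains the recovery target.

```python
# Minimal sketch of recovery-oriented perturbation (names and magnitudes are illustrative).
import numpy as np

def perturb_trajectory(traj_xyh, lateral_offset=0.5, longitudinal_offset=1.0,
                       pos_noise_std=0.1, heading_noise_std=0.02, rng=None):
    """traj_xyh: (T, 3) array of (x, y, heading) expert poses in the world frame."""
    rng = rng if rng is not None else np.random.default_rng()
    x, y, heading = traj_xyh[:, 0], traj_xyh[:, 1], traj_xyh[:, 2]

    # Shift each pose along its heading (longitudinal) and perpendicular to it (lateral).
    dx = longitudinal_offset * np.cos(heading) - lateral_offset * np.sin(heading)
    dy = longitudinal_offset * np.sin(heading) + lateral_offset * np.cos(heading)

    perturbed = traj_xyh.copy()
    perturbed[:, 0] = x + dx + rng.normal(0.0, pos_noise_std, size=x.shape)
    perturbed[:, 1] = y + dy + rng.normal(0.0, pos_noise_std, size=y.shape)
    # Small heading noise so the counterfactual ego is slightly misaligned with the lane.
    perturbed[:, 2] = heading + rng.normal(0.0, heading_noise_std, size=heading.shape)
    return perturbed

# The perturbed poses replace the ego poses when rasterizing the cameras,
# while supervision still targets the original expert trajectory.
```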
Cross-agent View Synthesis
Instead of rendering only from the ego trajectory, we substitute the ego with other agents in the same scenario while keeping the original camera parameters fixed, producing views from diverse perspectives without extra sensors. Combined with recovery-oriented perturbations, this scales the dataset to over 500k rasterized training samples, covering diverse viewpoints, richer interactions, and rare recovery scenarios (a sketch of the pose substitution follows the figure below).
Cross-agent view synthesis: nuPlan scenarios rendered from multiple agents’ perspectives, expanding training data beyond the ego view without additional sensors.
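As a concrete illustration of the pose substitution, the minimal sketch below re-mounts a fixed camera rig on another agent's pose. The helpers `pose_to_matrix` and `cameras_for_agent` are hypothetical names, and the planar (x, y, heading) pose parameterization is a simplification; the actual pipeline presumably uses the full 3D poses available in the annotated logs.

```python
# Minimal sketch of cross-agent view synthesis: the ego camera rig (camera-to-ego
# extrinsics) is kept fixed and re-mounted on another agent's pose, then the scene
# is rasterized from the resulting camera-to-world transforms.
import numpy as np

def pose_to_matrix(x, y, heading):
    """2D pose (x, y, heading) -> 4x4 homogeneous agent-to-world transform."""
    c, s = np.cos(heading), np.sin(heading)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[0, 3], T[1, 3] = x, y
    return T

def cameras_for_agent(agent_pose_xyh, cam_to_ego):
    """Re-mount the ego camera rig on a substitute agent.

    agent_pose_xyh: (x, y, heading) of the substitute agent.
    cam_to_ego:     {camera_name: 4x4 camera-to-ego transform} of the original rig,
                    kept fixed exactly as mounted on the ego vehicle.
    Returns {camera_name: 4x4 camera-to-world transform} to rasterize from.
    """
    agent_to_world = pose_to_matrix(*agent_pose_xyh)
    return {name: agent_to_world @ T_cam for name, T_cam in cam_to_ego.items()}
```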
Raster-to-Real Alignment

PCA visualization of frozen DINOv3 features. Rasterized and real inputs share similar structures, supporting rasterization as a perceptually valid substitute for real imagery.
To bridge the gap between synthetic rasters and real images, we introduce Raster-to-Real (R2R) alignment, which enforces feature consistency at both spatial and global levels. Spatial alignment minimizes token-wise differences between paired real and raster features, while global alignment employs adversarial training to make overall feature distributions indistinguishable. Together, these objectives ensure that rasterized augmentations transfer effectively to real-world deployment.
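The following is a minimal PyTorch sketch of these two objectives, assuming the frozen encoder returns patch tokens of shape (B, N, C). The projector and domain-classifier architectures, the module names, and the exact branch that is detached are assumptions based on the description above, not the released code.

```python
# Minimal sketch of Raster-to-Real alignment (architectural details are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class R2RAlignment(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int = 768, lambd: float = 1.0):
        super().__init__()
        self.encoder = encoder.eval()                    # frozen image encoder (e.g. a ViT)
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        self.projector = nn.Sequential(                  # learnable feature projector
            nn.Linear(feat_dim, feat_dim), nn.GELU(), nn.Linear(feat_dim, feat_dim))
        self.domain_clf = nn.Sequential(                 # real-vs-raster domain classifier
            nn.Linear(feat_dim, feat_dim // 2), nn.ReLU(), nn.Linear(feat_dim // 2, 1))
        self.lambd = lambd

    def forward(self, real_img, raster_img):
        with torch.no_grad():                            # encoder stays frozen
            real_tok = self.encoder(real_img)            # (B, N, C) patch tokens
            raster_tok = self.encoder(raster_img)

        proj_real = self.projector(real_tok)

        # Spatial alignment: token-wise MSE against detached raster features.
        loss_spatial = F.mse_loss(proj_real, raster_tok.detach())

        # Global alignment: gradient reversal + domain classifier on pooled features.
        pooled = torch.cat([proj_real.mean(1), self.projector(raster_tok).mean(1)], dim=0)
        logits = self.domain_clf(GradReverse.apply(pooled, self.lambd)).squeeze(-1)
        labels = torch.cat([torch.ones(real_img.size(0)),
                            torch.zeros(raster_img.size(0))]).to(logits)
        loss_global = F.binary_cross_entropy_with_logits(logits, labels)
        return loss_spatial, loss_global
```

In this sketch, the gradient reversal layer trains the classifier to separate domains while pushing the projector, via reversed gradients, to make real and rasterized feature distributions indistinguishable.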
@misc{feng2025rap3drasterizationaugmented,
  title         = {RAP: 3D Rasterization Augmented End-to-End Planning},
  author        = {Lan Feng and Yang Gao and Eloi Zablocki and Quanyi Li and Wuyang Li and Sichao Liu and Matthieu Cord and Alexandre Alahi},
  year          = {2025},
  eprint        = {2510.04333},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2510.04333}
}