¹ Toyota Research Institute
² Amazon Web Services
We introduce AnyView, a diffusion-based video generation framework for dynamic view synthesis. Given a single video from any camera trajectory, the task is to predict a temporally synchronized video of the same scene from any other camera trajectory. Unlike most existing methods, which rely on explicit 3D reconstructions or costly test-time optimization, or are restricted to narrow viewpoint changes, AnyView operates end-to-end and supports extreme camera displacements where there may be little overlap between the input and output viewpoints.
We adopt Cosmos, a latent diffusion transformer, as our base model. Rather than using warped depth maps as explicit conditioning, we rely solely on an implicitly learned 4D representation. We encode all camera parameters into a unified Plücker representation \( \boldsymbol{P} = (\boldsymbol{r}, \boldsymbol{m}) \), combining extrinsics and intrinsics into dense per-pixel ray and moment vectors. These embeddings are concatenated along the channel dimension, while both viewpoints are concatenated along the sequence dimension to form the overall set of tokens.
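To make the camera conditioning concrete, the sketch below shows one way to compute a dense per-pixel Plücker map from intrinsics K and world-to-camera extrinsics (R, t). The function name and tensor layout are illustrative assumptions, not the exact implementation used in AnyView.

```python
import torch

def plucker_map(K, R, t, H, W):
    """Hypothetical helper: per-pixel Plücker coordinates P = (r, m) for one camera.

    K: (3, 3) intrinsics; R: (3, 3) world-to-camera rotation; t: (3,) translation.
    Returns a (6, H, W) tensor of unit ray directions r and moment vectors m.
    """
    # Pixel centers in homogeneous image coordinates.
    v, u = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([u + 0.5, v + 0.5, torch.ones_like(u)], dim=-1)  # (H, W, 3)

    # Back-project pixels to world-space ray directions: r ∝ R^T K^{-1} x.
    dirs = pix @ torch.linalg.inv(K).T @ R                             # (H, W, 3)
    r = dirs / dirs.norm(dim=-1, keepdim=True)

    # Camera center in world coordinates: c = -R^T t.
    c = -(R.T @ t)

    # Moment m = c x r fixes each ray's position in space, so (r, m) jointly
    # encode extrinsics and intrinsics in a single dense 6-channel map.
    m = torch.cross(c.expand_as(r), r, dim=-1)
    return torch.cat([r, m], dim=-1).permute(2, 0, 1)                  # (6, H, W)
```

Such a map can then be concatenated with each viewpoint's latent video tokens along the channel dimension, with the two viewpoints stacked along the sequence dimension, mirroring the token layout described above.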
For training, we combine 12 different 4D (multi-view video) datasets spanning four distinct domains: Robotics, Driving, 3D, and Other. We perform weighted sampling so that each domain is seen equally often (i.e., comprises 25% of every batch), yielding a balanced data mix.
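As a rough illustration of this balanced sampling, the snippet below weights each clip inversely to the size of its domain so that each of the four domains contributes 25% of a batch in expectation; the labels and counts are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
domains = np.array(["Robotics", "Driving", "3D", "Other"])

# Illustrative, imbalanced pool of clip-level domain labels (stand-in for the 12 datasets).
clip_domain = rng.choice(domains, size=10_000, p=[0.5, 0.3, 0.15, 0.05])

# Weight each clip by 1 / (num_domains * clips_in_its_domain) so every domain
# carries equal total probability mass, i.e. ~25% of each sampled batch.
counts = {d: (clip_domain == d).sum() for d in domains}
weights = np.array([1.0 / (len(domains) * counts[d]) for d in clip_domain])

batch = rng.choice(len(clip_domain), size=256, p=weights)
print({d: round(float((clip_domain[batch] == d).mean()), 2) for d in domains})
```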
We introduce Kubric-5D, a newly generated variant of Kubric-4D that vastly increases the diversity of camera trajectories and incorporates advanced filmmaking effects such as the dolly zoom. The scenes contain multi-object interactions with rich visual appearance and complex dynamics, captured as synchronized videos from multiple viewpoints covering a diverse range of camera motions.
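For intuition, a dolly zoom keeps the subject's projected size constant by scaling the focal length with the camera-to-subject distance (projected size ∝ f / d). The toy trajectory below only illustrates that relation and is not the actual Kubric-5D generation code.

```python
import numpy as np

def dolly_zoom_focal(f0, d0, d):
    """Focal length that keeps a subject at distance d the same projected size
    it had at distance d0 with focal length f0 (f / d held constant)."""
    return f0 * d / d0

# Toy example: the camera dollies in from 4 m to 2 m over 48 frames while zooming out.
distances = np.linspace(4.0, 2.0, 48)
focals = dolly_zoom_focal(f0=35.0, d0=4.0, d=distances)  # 35 mm -> 17.5 mm
```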
Download link coming soon!
We propose AnyViewBench, a multi-faceted benchmark that covers datasets across multiple domains:
@article{vanhoorick2026anyview,
  title={AnyView: Synthesizing Any Novel View in Dynamic Scenes},
  author={Van Hoorick, Basile and Chen, Dian and Iwase, Shun and Tokmakov, Pavel and Irshad, Muhammad Zubair and Vasiljevic, Igor and Gupta, Swati and Cheng, Fangzhou and Zakharov, Sergey and Guizilini, Vitor Campagnolo},
  journal={In Submission},
  year={2026}
}
Below is a comprehensive gallery of videos corresponding to all figures in the paper, including comparisons with baseline methods and supplementary visualizations.
The webpage template was inspired by this project page.