Reangle-A-Video: 4D Video Generation as Video-to-Video Translation


Hyeonho Jeong*, Suhyeon Lee*, Jong Chul Ye
KAIST AI
* indicates equal contribution


TL;DR

We introduce Reangle-A-Video, a unified framework for generating synchronized multi-view videos from a single input video.
We achieve (a) Static view transport and (b) Dynamic camera control without relying on any multi-view generative priors; both modes offer six degrees of freedom.



(a) Static view transport: Regeneration of the input video as if shot from target viewpoints.

Input video
View from Orbit right
View from Orbit down
View from Orbit left
View from Dolly zoom in
View from Dolly zoom out



(b) Dynamic camera control: Regeneration of the input video following target camera movements.

Input video
Camera movement: Orbit up
Camera movement: Orbit down
Camera movement: Orbit left
Camera movement: Orbit right
Camera movement: Dolly zoom in




Method

Reangle-A-Video decomposes a dynamic 4D scene into view-specific appearance (the starting image) and view-invariant motion (image-to-video), and addresses each component separately. We first embed the scene's view-invariant motion into a pre-trained video diffusion model using a novel self-supervised training strategy with data augmentation. Specifically, to capture diverse perspectives from a single monocular video, we repeatedly apply point-based warping to generate a set of warped videos. These videos, together with the original video, form the training dataset for fine-tuning a pre-trained image-to-video diffusion model with a masked diffusion objective. For (b) dynamic camera control, we then sample videos from the fine-tuned model using the original first frame as input. For (a) static view transport, we instead generate view-transported starting images by inpainting the warped first frames under inference-time view-consistency guidance, using an off-the-shelf multi-view stereo reconstruction network.
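To make the warping step concrete, here is a minimal, self-contained sketch of point-based warping in PyTorch. It assumes a per-frame depth map and a relative source-to-target camera pose are available (e.g., from an off-the-shelf monocular depth estimator); the nearest-pixel splat and all names and shapes are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def warp_frame(frame, depth, K, R, t):
    """Forward-warp one RGB frame into a target view via point splatting.

    frame: (3, H, W) RGB in [0, 1]
    depth: (H, W) depth of the source view
    K:     (3, 3) camera intrinsics (shared by both views, an assumption)
    R, t:  (3, 3), (3,) relative rotation / translation, source -> target
    Returns the warped frame and a validity mask (holes = 0).
    """
    _, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()  # (3, H, W)

    # Unproject pixels to 3D points in the source camera, move them to the target view.
    pts = torch.linalg.inv(K) @ pix.reshape(3, -1) * depth.reshape(1, -1)
    pts = R @ pts + t.reshape(3, 1)

    # Project into the target image plane.
    proj = K @ pts
    u = (proj[0] / proj[2].clamp(min=1e-6)).round().long()
    v = (proj[1] / proj[2].clamp(min=1e-6)).round().long()

    warped = torch.zeros_like(frame)
    mask = torch.zeros(H, W)
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (pts[2] > 0)
    # Nearest-pixel splat; a real implementation would resolve occlusions
    # with a z-buffer and splat softly.
    warped[:, v[valid], u[valid]] = frame.reshape(3, -1)[:, valid]
    mask[v[valid], u[valid]] = 1.0
    return warped, mask
```

Applying `warp_frame` to every frame of the input video yields one warped video, plus its per-frame validity masks, per sampled target pose.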


Multi-view motion learning pipeline
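Below is a minimal sketch of the masked diffusion objective used in this motion-learning stage: the standard noise-prediction loss is computed only over valid (warped-in) pixels, so the occlusion holes left by warping do not supervise the model. The `denoiser` call stands in for the image-to-video backbone (CogVideoX in the paper); the toy cosine schedule and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(denoiser, video, mask, first_frame, num_steps=1000):
    """video: (B, C, T, H, W) warped video; mask: (B, 1, T, H, W), 1 = valid pixel."""
    noise = torch.randn_like(video)
    t = torch.randint(0, num_steps, (video.shape[0],), device=video.device)
    # Toy cosine noise schedule; the real backbone defines its own.
    alpha_bar = torch.cos(t.float() / num_steps * torch.pi / 2) ** 2
    a = alpha_bar.view(-1, 1, 1, 1, 1)
    noisy = a.sqrt() * video + (1 - a).sqrt() * noise

    pred = denoiser(noisy, t, first_frame)  # predicts the added noise
    per_pixel = F.mse_loss(pred, noise, reduction="none").mean(dim=1, keepdim=True)
    # Average the loss only over valid (warped-in) pixels.
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1.0)
```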


Multi-view image inpainting pipeline
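And a minimal sketch of the inference-time view-consistency guidance used while inpainting the warped first frames: at each denoising step, the clean-image (Tweedie) estimate is nudged along the gradient of a consistency loss before sampling continues. Here `consistency_loss` is a placeholder for a differentiable score computed against the other views with the off-the-shelf multi-view stereo network; the DDIM-style update and all names are illustrative assumptions, not the paper's exact sampler.

```python
import torch

@torch.enable_grad()
def guided_denoise_step(x_t, t, eps_model, consistency_loss, alpha_bar, scale=1.0):
    """One DDIM-style step with gradient guidance toward multi-view consistency."""
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)                           # predicted noise
    a = alpha_bar[t]
    x0_hat = (x_t - (1 - a).sqrt() * eps) / a.sqrt()  # Tweedie estimate of x0

    # Nudge the denoised estimate toward agreement with the other views.
    grad = torch.autograd.grad(consistency_loss(x0_hat), x_t)[0]
    x0_hat = x0_hat - scale * grad

    a_prev = alpha_bar[t - 1] if t > 0 else torch.ones_like(a)
    return (a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps).detach()
```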




Camera Visualizations

We demonstrate six degrees of freedom in both (a) Static view transport and (b) Dynamic camera control.
Here we visualize the transported viewpoints and camera movements used in our work.
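For readers who want to reproduce similar trajectories, below is a small sketch of how the six movements shown here could be parameterized as relative poses (R, t) compatible with the warping sketch above. The pivot distance, angles, and sign conventions are illustrative assumptions; the paper's exact trajectories may differ.

```python
import math
import torch

def orbit_pose(yaw_deg=0.0, pitch_deg=0.0, pivot_depth=2.0):
    """Orbit the camera around a pivot point on the optical axis."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    Ry = torch.tensor([[math.cos(yaw), 0.0, math.sin(yaw)],
                       [0.0, 1.0, 0.0],
                       [-math.sin(yaw), 0.0, math.cos(yaw)]])
    Rx = torch.tensor([[1.0, 0.0, 0.0],
                       [0.0, math.cos(pitch), -math.sin(pitch)],
                       [0.0, math.sin(pitch), math.cos(pitch)]])
    R = Rx @ Ry
    pivot = torch.tensor([0.0, 0.0, pivot_depth])  # point the camera orbits around
    t = pivot - R @ pivot                          # keep the pivot fixed in view
    return R, t

def dolly_pose(dz=0.3):
    """Dolly in (dz > 0) or out (dz < 0) along the optical axis."""
    return torch.eye(3), torch.tensor([0.0, 0.0, dz])

# Example: the six movements used on this page (angles and signs are assumptions).
moves = {
    "Orbit left":     orbit_pose(yaw_deg=+15),
    "Orbit right":    orbit_pose(yaw_deg=-15),
    "Orbit up":       orbit_pose(pitch_deg=+10),
    "Orbit down":     orbit_pose(pitch_deg=-10),
    "Dolly zoom in":  dolly_pose(+0.3),
    "Dolly zoom out": dolly_pose(-0.3),
}
```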






Results: (a) Static view transport

Input video
View from Orbit left
View from Orbit right
View from Orbit down
View from Dolly zoom in



Input video
View from Orbit left
View from Orbit right
View from Orbit up
View from Orbit down
View from Dolly zoom in



Input video
View from Orbit left
View from Orbit right
View from Orbit up
View from Dolly zoom in



Input video
View from Orbit left
View from Orbit down
View from Orbit up



Input video
View from Orbit left
View from Orbit down
View from Dolly zoom in



Input video
View from Orbit right
View from Orbit down
View from Orbit up



Input video
View from Orbit left
View from Orbit down
View from Dolly zoom in




Results: (b) Dynamic camera control

Input video
Camera movement: Orbit down
Camera movement: Orbit up
Camera movement: Orbit left
Camera movement: Dolly zoom in
Camera movement: Dolly zoom out



Input video
Camera movement: Orbit left
Camera movement: Dolly zoom in
Camera movement: Orbit down
Camera movement: Orbit up



Input video
Camera movement: Dolly zoom in
Camera movement: Dolly zoom out
Camera movement: Orbit left
Camera movement: Orbit right
Camera movement: Orbit down



Input video
Camera movement: Orbit right
Camera movement: Dolly zoom out



Input video
Camera movement: Orbit left
Camera movement: Orbit up
Camera movement: Dolly zoom in



Input video
Camera movement: Orbit right
Camera movement: Orbit up
Camera movement: Dolly zoom in



Input video
Camera movement: Dolly zoom in
Input video
Camera movement: Orbit down



Input video
Camera movement: Orbit down
Input video
Camera movement: Dolly zoom in




Comparisons (1/2)

For (a) Static view transport, we compare with Generative Camera Dolly and Vanilla CogVideoX I2V, which uses the same input frame as ours.


Columns: Input video · View from Orbit left · View from Orbit down
Rows: Reangle-A-Video (Ours) · Generative Camera Dolly · Vanilla CogVideoX I2V



Columns: Input video · View from Orbit down · View from Dolly zoom in
Rows: Reangle-A-Video (Ours) · Generative Camera Dolly · Vanilla CogVideoX I2V



Columns: Input video · View from Orbit up · View from Orbit right
Rows: Reangle-A-Video (Ours) · Generative Camera Dolly · Vanilla CogVideoX I2V




Comparisons (2/2)

For (b) Dynamic camera control, we compare with NVS-Solver and Trajectory Attention.


Columns: Input video · Camera movement: Orbit right · Camera movement: Orbit up
Rows: Reangle-A-Video (Ours) · NVS-Solver · Trajectory Attention



Columns: Input video · Camera movement: Orbit right · Camera movement: Dolly zoom out
Rows: Reangle-A-Video (Ours) · NVS-Solver · Trajectory Attention



Columns: Input video · Camera movement: Dolly zoom in · Camera movement: Orbit up
Rows: Reangle-A-Video (Ours) · NVS-Solver · Trajectory Attention