Sync-NeRF: Generalizing Dynamic NeRFs
to Unsynchronized Videos
AAAI 2024
Seoha Kim* (Yonsei University)
Jeongmin Bae* (Yonsei University)
Youngsik Yun (Yonsei University)
Hahyun Lee (ETRI)
Gun Bang (ETRI)
Youngjung Uh (Yonsei University)
Teaser
The Plenoptic Video Dataset, commonly used for 4D scene reconstruction, contains an unsynchronized video. When the baseline includes this unsynchronized view in the training set, it fails to reconstruct the motion around that view. Under the same setting, our method clearly outperforms the baseline.
Video
Abstract
Recent advancements in 4D scene reconstruction using neural radiance fields (NeRF) have demonstrated the ability to represent dynamic scenes from multi-view videos. However, these methods fail to reconstruct dynamic scenes and struggle to fit even the training views in unsynchronized settings. This happens because they employ a single latent embedding for a frame, while the multi-view images at that frame were actually captured at different moments. To address this limitation, we introduce time offsets for individual unsynchronized videos and jointly optimize the offsets with NeRF. By design, our method is applicable to various baselines and improves them by large margins. Furthermore, finding the offsets naturally synchronizes the videos without manual effort. Experiments are conducted on the common Plenoptic Video Dataset and a newly built Unsynchronized Dynamic Blender Dataset to verify the performance and robustness of our method.
Results
To show the effectiveness of our method, we created an Unsynchronized Plenoptic Video Dataset in which synchronization is intentionally perturbed. In each video, the left side shows the baseline and the right side shows the baseline with our method applied.
Method Overview
Problem Statement
In an ideal setting, a given frame index captures the same moment of the scene across all cameras.
However, video synchronization requires manual effort, and even when the videos are assumed to be synchronized, a temporal mismatch within a frame interval can remain.
Existing methods become problematic when the input views are unsynchronized because they do not account for these time deviations.
In contrast, Sync-NeRF resolves this problem by learning a time offset $\delta$ that models the temporal gap of each training view.
This assigns correct temporal latent embeddings to the unsynchronized videos, which is equivalent to synchronizing them by temporal shifting, as illustrated in the sketch below.
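As a concrete illustration, the offsets can be implemented as one learnable scalar per training camera, added to the frame timestamps and optimized jointly with the NeRF parameters through the rendering loss. The PyTorch module below is a minimal sketch, not the authors' released code; the class name TimeOffsets and the normalization of timestamps to [0, 1] are our assumptions.

import torch
import torch.nn as nn

class TimeOffsets(nn.Module):
    """One learnable time offset per training camera (illustrative sketch)."""

    def __init__(self, num_cameras: int):
        super().__init__()
        # Initialize offsets to zero, i.e., start from the assumption
        # that the videos are already synchronized.
        self.delta = nn.Parameter(torch.zeros(num_cameras))

    def forward(self, t: torch.Tensor, cam_idx: torch.Tensor) -> torch.Tensor:
        # Shift each timestamp by its camera's offset: t_k = t + delta_k.
        # t: (N,) frame timestamps in [0, 1]; cam_idx: (N,) camera indices.
        return t + self.delta[cam_idx]

Because the offsets receive gradients from the same photometric loss as the radiance field, their converged values directly read off the temporal misalignment of each video, so no separate manual synchronization step is needed.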
Our synchronized temporal embedding
We introduce two approaches for modeling the temporal embedding in dynamic NeRFs. For models using per-frame temporal embeddings, we present an implicit function $T_{\theta}$ that takes the time $t_k$ calibrated by the time offset $\delta_k$ as input and outputs a temporal embedding $z(t_k)$. For grid-based models, we instead query the grid embedding at the calibrated time $t_k$. Both variants are sketched below.
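The two variants could look as follows; this is again a hedged sketch rather than the official implementation. ImplicitTemporalEmbedding plays the role of $T_{\theta}$, and query_time_grid stands in for the time axis of a grid-based model, simplified here to a 1-D table of embeddings that is linearly interpolated at the calibrated time.

import torch
import torch.nn as nn

class ImplicitTemporalEmbedding(nn.Module):
    """z(t_k) = T_theta(t_k): a small MLP replacing a per-frame embedding table."""

    def __init__(self, dim: int = 64, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, t_k: torch.Tensor) -> torch.Tensor:
        # t_k: (N,) calibrated timestamps -> (N, dim) temporal embeddings.
        return self.mlp(t_k.unsqueeze(-1))

def query_time_grid(grid: torch.Tensor, t_k: torch.Tensor) -> torch.Tensor:
    """Linearly interpolate a (T, dim) embedding grid at continuous t_k in [0, 1]."""
    T = grid.shape[0]
    x = t_k.clamp(0.0, 1.0) * (T - 1)       # continuous grid coordinate
    i0 = x.floor().long().clamp(max=T - 2)  # left neighbor (keeps i0 + 1 in range)
    w = (x - i0.float()).unsqueeze(-1)      # interpolation weight in [0, 1]
    return (1.0 - w) * grid[i0] + w * grid[i0 + 1]

The key point in both cases is that the embedding is queried at the continuous, offset-corrected time t_k rather than at a discrete frame index, so a single scene representation can fit views captured at slightly different moments.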
Citation
@article{Kim2024Sync,
  author  = {Kim, Seoha and Bae, Jeongmin and Yun, Youngsik and Lee, Hahyun and Bang, Gun and Uh, Youngjung},
  title   = {Sync-NeRF: Generalizing Dynamic NeRFs to Unsynchronized Videos},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume  = {38},
  pages   = {2777--2785},
  year    = {2024},
  month   = {03},
  doi     = {10.1609/aaai.v38i3.28057}
}
Acknowledgements
The website template was borrowed from BakedSDF.