Sync-NeRF: Generalizing Dynamic NeRFs
to Unsynchronized Videos
AAAI 2024

  • Seoha Kim*
    Yonsei University
  • Jeongmin Bae*
    Yonsei University
  • Youngsik Yun
    Yonsei University
  • Hahyun Lee
    ETRI
  • Gun Bang
    ETRI
  • Youngjung Uh
    Yonsei University
*Denotes equal contribution
Visual Intelligence Lab at Yonsei University

Teaser


The Plenoptic Video Dataset, commonly used in 4D scene reconstruction, contains an unsynchronized video. When the baseline includes the unsynchronized view in the training set, it fails to reconstruct the motion around that view. In the same setting, our method outperforms the baseline.

Video

Abstract

Recent advancements in 4D scene reconstruction using neural radiance fields (NeRF) have demonstrated the ability to represent dynamic scenes from multi-view videos. However, they fail to reconstruct dynamic scenes and struggle to fit even the training views in unsynchronized settings. This happens because they employ a single latent embedding per frame, while the multi-view images of that frame were actually captured at different moments. To address this limitation, we introduce time offsets for individual unsynchronized videos and jointly optimize the offsets with NeRF. By design, our method is applicable to various baselines and improves them by large margins. Furthermore, finding the offsets naturally synchronizes the videos without manual effort. Experiments are conducted on the common Plenoptic Video Dataset and a newly built Unsynchronized Dynamic Blender Dataset to verify the performance and robustness of our method.

Results


To show the effectiveness of our method, we created an Unsynchronized Plenoptic Video Dataset in which synchronization is intentionally perturbed. In the video, the left side shows the baseline and the right side shows the baseline with our method applied.

Method Overview

Problem Statement



In an ideal situation, a specific frame captures the same moment of the scene across multiple cameras. However, video synchronization requires manual effort, and even when the videos are assumed to be synchronized, a temporal mismatch can remain within a frame.

Existing methods become problematic when the input views are unsynchronized because they do not account for these time deviations. In contrast, Sync-NeRF resolves this problem by learning a time offset $\delta$ that models the temporal gap of each training view. This assigns correct temporal latent embeddings to the unsynchronized videos, which is equivalent to synchronizing them by shifting in time, as sketched below.
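The following is a minimal PyTorch sketch (not the authors' released code) of how a per-camera time offset $\delta_k$ can be learned jointly with the rest of a dynamic NeRF. Names such as CameraTimeOffsets and the learning rate are illustrative assumptions; the offset simply receives gradients through the rendering loss alongside the radiance-field parameters.

import torch
import torch.nn as nn

class CameraTimeOffsets(nn.Module):
    """One learnable scalar offset per training camera, initialized to zero."""
    def __init__(self, num_cameras: int):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(num_cameras))

    def forward(self, camera_idx: torch.Tensor, frame_time: torch.Tensor) -> torch.Tensor:
        # Calibrated time t_k = t + delta_k for each sampled ray's camera.
        return frame_time + self.delta[camera_idx]

# Usage sketch: offsets are optimized together with the NeRF parameters.
offsets = CameraTimeOffsets(num_cameras=20)
optimizer = torch.optim.Adam(offsets.parameters(), lr=1e-4)

camera_idx = torch.randint(0, 20, (4096,))      # camera index per sampled ray
frame_time = torch.rand(4096)                   # nominal frame time in [0, 1]
t_calibrated = offsets(camera_idx, frame_time)  # time actually fed to the dynamic NeRF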

Our synchronized temporal embedding


Figure: continuous temporal embedding.

We introduce two approaches for modeling the temporal embedding in dynamic NeRFs. For models using per-frame temporal embeddings, we present an implicit function $T_{\theta}$ that takes the time $t_k$, calibrated by the time offset $\delta_k$, as input and outputs the temporal embedding $z(t_k)$. For grid-based models, we query the embedding grid at the calibrated time $t_k$.
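Below is a minimal sketch, assuming the notation of the paragraph above: an MLP stands in for $T_{\theta}$ mapping a calibrated time $t_k$ to an embedding $z(t_k)$, and the grid-based variant is illustrated with simple linear interpolation along a 1D time grid. All module and variable names are illustrative, not the authors' code.

import torch
import torch.nn as nn

class ImplicitTemporalEmbedding(nn.Module):
    """T_theta: continuous calibrated time -> temporal embedding z(t)."""
    def __init__(self, embed_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, t_calibrated: torch.Tensor) -> torch.Tensor:
        # (N,) -> (N, embed_dim)
        return self.mlp(t_calibrated.unsqueeze(-1))

def query_time_grid(grid: torch.Tensor, t_calibrated: torch.Tensor) -> torch.Tensor:
    """Grid-based variant: linearly interpolate a (num_frames, embed_dim)
    time grid at continuous, offset-calibrated times in [0, 1]."""
    num_frames = grid.shape[0]
    pos = t_calibrated.clamp(0, 1) * (num_frames - 1)
    lo = pos.floor().long()
    hi = (lo + 1).clamp(max=num_frames - 1)
    w = (pos - lo.float()).unsqueeze(-1)
    return (1 - w) * grid[lo] + w * grid[hi]

Either way, the key design choice is that the embedding is queried at the continuous calibrated time rather than at a discrete frame index, so the gradient can flow back into the offsets.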

Citation

@article{Kim2024Sync,
  author  = {Kim, Seoha and Bae, Jeongmin and Yun, Youngsik and Lee, Hahyun and Bang, Gun and Uh, Youngjung},
  title   = {Sync-NeRF: Generalizing Dynamic NeRFs to Unsynchronized Videos},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume  = {38},
  pages   = {2777-2785},
  year    = {2024},
  month   = {03},
  doi     = {10.1609/aaai.v38i3.28057}
}

Acknowledgements

The website template was borrowed from BakedSDF.