Sync-NeRF: Generalizing Dynamic NeRFs
to Unsynchronized Videos
AAAI 2024

  • Seoha Kim*
    Yonsei University
  • Jeongmin Bae*
    Yonsei University
  • Youngsik Yun
    Yonsei University
  • Hahyun Lee
    ETRI
  • Gun Bang
    ETRI
  • Youngjung Uh
    Yonsei University
*Denotes equal contribution
Visual Intelligence Lab at Yonsei University

Teaser


The commonly used Plenoptic Video Dataset for 4D scene reconstruction contains an unsynchronized video. When the baseline includes this unsynchronized view in the training set, it fails to reconstruct the motion around that view. Under the same settings, our method outperforms the baseline.

Video

Abstract

Recent advancements in 4D scene reconstruction using neural radiance fields (NeRF) have demonstrated the ability to represent dynamic scenes from multi-view videos. However, they fail to reconstruct dynamic scenes and struggle to fit even the training views in unsynchronized settings. This happens because they employ a single latent embedding for a frame, while the multi-view images at that frame were actually captured at different moments. To address this limitation, we introduce time offsets for individual unsynchronized videos and jointly optimize the offsets with NeRF. By design, our method is applicable to various baselines and improves them by large margins. Furthermore, finding the offsets naturally works as synchronizing the videos without manual effort. Experiments are conducted on the common Plenoptic Video Dataset and a newly built Unsynchronized Dynamic Blender Dataset to verify the performance and robustness of our method.

Results


To show the effectiveness of our method, we created an Unsynchronized Plenoptic Video Dataset in which synchronization is intentionally perturbed. In the video, the left side shows the baseline and the right side shows the same baseline with our method applied.

Method Overview

Problem Statement


problem statement

In an ideal situation, a specific frame captures the same moment of the scene across multiple cameras. However, video synchronization requires manual effort, and even when the videos are assumed to be synchronized, a temporal mismatch can remain within a frame.

Existing methods become problematic when the input views are unsynchronized because they do not account for these time deviations. In contrast, Sync-NeRF resolves this problem by learning time offsets $\delta$ that model the temporal gap of each training view. This amounts to assigning correct temporal latent embeddings to the unsynchronized videos, which is equivalent to synchronizing them by temporal shifting.
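As a rough illustration, the PyTorch sketch below shows how such per-camera offsets can be optimized jointly with a dynamic NeRF: one learnable offset per training camera shifts the timestamps fed to the model, and the photometric loss updates the offsets together with the radiance-field parameters. All names here (TimeOffsets, the nerf(rays, t) interface, training_step) are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn

class TimeOffsets(nn.Module):
    """One learnable time offset delta_k per training camera, initialized to zero."""
    def __init__(self, num_cameras: int):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(num_cameras))

    def forward(self, t: torch.Tensor, cam_idx: torch.Tensor) -> torch.Tensor:
        # Calibrated time: shift each sample's timestamp by its camera's offset.
        return t + self.delta[cam_idx]

# Joint optimization: the offsets share one optimizer with the NeRF parameters, e.g.
# optim = torch.optim.Adam(list(nerf.parameters()) + list(offsets.parameters()), lr=1e-3)
def training_step(nerf, offsets, optim, rays, rgb_gt, t, cam_idx):
    """One joint update from the photometric loss; `nerf` stands in for any
    dynamic-NeRF baseline exposing nerf(rays, t) -> rgb."""
    t_cal = offsets(t, cam_idx)              # t + delta_k for each ray's camera
    rgb_pred = nerf(rays, t_cal)
    loss = ((rgb_pred - rgb_gt) ** 2).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss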

Our synchronized temporal embedding


continuous temporal embedding

We introduce two approaches for modeling the temporal embedding in dynamic NeRF. For models using per-frame temporal embeddings, we present an implicit function $T_{\theta}$ that takes as input the time $t_k$ calibrated by the time offset $\delta_k$ and outputs the temporal embedding $z(t_k)$. For grid-based models, we instead query the feature grid at the calibrated time $t_k$.
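A minimal sketch of the two variants, assuming PyTorch and placeholder shapes: a small MLP standing in for the implicit function $T_{\theta}$ that maps a calibrated, continuous time to an embedding, and linear interpolation of a 1D feature grid at the calibrated time. Layer sizes and the omission of positional encoding on time are simplifications, not the paper's exact configuration.

import torch
import torch.nn as nn

class ImplicitTemporalEmbedding(nn.Module):
    """Stand-in for T_theta: maps a calibrated continuous time t_k to an
    embedding z(t_k), replacing a per-frame embedding table."""
    def __init__(self, embed_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, t_cal: torch.Tensor) -> torch.Tensor:
        return self.mlp(t_cal.unsqueeze(-1))   # (N,) -> (N, embed_dim)

def query_time_grid(grid: torch.Tensor, t_cal: torch.Tensor) -> torch.Tensor:
    """Grid-based variant: linearly interpolate a 1D feature grid of shape
    (T, C), T >= 2, at the calibrated time t_cal in [0, 1]."""
    T = grid.shape[0]
    x = t_cal.clamp(0, 1) * (T - 1)
    lo = x.floor().long().clamp(max=T - 2)     # left neighbor index
    w = (x - lo.float()).unsqueeze(-1)         # interpolation weight
    return (1 - w) * grid[lo] + w * grid[lo + 1]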

Citation

@inproceedings{Kim2024Sync,
  title={Sync-NeRF: Generalizing Dynamic NeRFs to Unsynchronized Videos},
  author={Kim, Seoha and Bae, Jeongmin and Yun, Youngsik and Lee, Hahyun and Bang, Gun and Uh, Youngjung},
  booktitle={AAAI},
  year={2024}
}

Acknowledgements

The website template was borrowed from BakedSDF.