A Better Understanding of SDS
We show that Score Distillation Sampling (SDS) and its variants can be cast as a
Schrödinger Bridge (SB) problem, which seeks the optimal transport between two
distributions. SDS approximates this optimal path between the distribution of the images currently being optimized
(e.g., renderings from a NeRF) and a target distribution (e.g., the text-conditioned natural
image distribution).
Although SDS tries to model this optimal path,
it only approximates the path one iteration at a time, and this approximation ultimately causes
its characteristic artifacts.
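To make the per-iteration nature of the update concrete, here is a minimal toy sketch of the standard SDS step (the noise-prediction residual, applied once per iteration with a freshly sampled timestep and noise). It is not the authors' implementation. The pretrained diffusion model is swapped for an analytic noise predictor on a 1-D Gaussian target, and the cosine noise schedule, the weighting w(t) = 1, and all function names (`toy_eps_pred`, `sds_step`, `run_sds`) are assumptions made for illustration.

```python
import math
import random


def toy_eps_pred(x_t, alpha, sigma, mu, s):
    """Analytic noise prediction when the target is the 1-D Gaussian N(mu, s^2).

    Stands in for a pretrained, text-conditioned diffusion model (assumption).
    The noised marginal is N(alpha * mu, alpha^2 * s^2 + sigma^2), and the
    optimal noise prediction equals -sigma times the score of that marginal.
    """
    var_t = alpha ** 2 * s ** 2 + sigma ** 2
    return sigma * (x_t - alpha * mu) / var_t


def sds_step(x, mu, s, lr, rng):
    """One SDS iteration: noise the current sample, then step along eps_pred - eps."""
    t = rng.uniform(0.02, 0.98)            # random timestep for this iteration
    alpha = math.cos(0.5 * math.pi * t)    # cosine schedule (assumption)
    sigma = math.sin(0.5 * math.pi * t)
    eps = rng.gauss(0.0, 1.0)              # fresh Gaussian noise each iteration
    x_t = alpha * x + sigma * eps          # forward-diffused current sample
    grad = toy_eps_pred(x_t, alpha, sigma, mu, s) - eps  # SDS residual, w(t) = 1
    return x - lr * grad


def run_sds(x0=5.0, mu=1.0, s=0.5, lr=0.01, steps=5000, seed=0):
    """Optimize a single scalar 'image' x with repeated SDS steps."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x = sds_step(x, mu, s, lr, rng)
    return x


if __name__ == "__main__":
    print(run_sds())
```

In this toy setting each step only sees one noise sample and one timestep, so the trajectory is a noisy, first-order estimate of the path toward the target. With the defaults above, `run_sds()` should move the sample from 5.0 toward the target mean of 1.0.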
[Figure: comparison of SDS, VSD, and Ours. Panels are annotated with COCO-FID and time; in the order they appear on the page: 86.02 (4.48 min), 91.70 (7.20 min), 89.96 (6.21 min), 59.22 (16.02 min), 55.65 (21.46 min), 67.89 (4.48 min).]
We thank Matthew Tancik, Jiaming Song, Riley Peterlinz, Ayaan Haque, Ethan Weber, Konpat Preechakul, Ruiqi Gao, Amit Kohli and Ben Poole for their helpful feedback and discussion.
This project is supported in part by a Google Research Scholar award and IARPA DOI/IBC No. 140D0423C0035. The
views and conclusions contained herein are those of the authors and do not
represent the official policies or endorsements of these institutions.
@article{mcallister2024rethinking,
  title={Rethinking Score Distillation as a Bridge Between Image Distributions},
  author={David McAllister and Songwei Ge and Jia-Bin Huang and David W. Jacobs and Alexei A. Efros and Aleksander Holynski and Angjoo Kanazawa},
  journal={arXiv preprint arXiv:2406.09417},
  year={2024}
}