SDEdit: Image Synthesis and Editing with Stochastic Differential Equations

Chenlin Meng    Yang Song    Jiaming Song   
Jiajun Wu    Jun-Yan Zhu    Stefano Ermon   

Stanford University     Carnegie Mellon University

Paper | GitHub | Colab

SDEdit is an image synthesis and editing framework based on stochastic differential equations (SDEs). SDEdit supports stroke-based image synthesis, stroke-based image editing, and image compositing without task-specific optimization.


We introduce a new image editing and synthesis framework, Stochastic Differential Editing (SDEdit), based on a recent generative model using stochastic differential equations (SDEs). Given an input image with user edits (e.g., hand-drawn color strokes), we first add noise to the input according to an SDE, and subsequently denoise it by simulating the reverse SDE to gradually increase its likelihood under the prior. Our method does not require task-specific loss function designs, which are critical components for recent image editing methods based on GAN inversion. Compared to conditional GANs, we do not need to collect new datasets of original and edited images for new applications. Therefore, our method can quickly adapt to various editing tasks at test time without re-training models. Our approach achieves strong performance on a wide range of applications, including image synthesis and editing guided by stroke paintings and image compositing.



arXiv 2108.01073, 2021.


Chenlin Meng, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu and Stefano Ermon. "Image Synthesis and Editing with Stochastic Differential Equations", in arXiv, 2021.

Introducing SDEdit: a powerful image synthesis and editing technique

The key intuition of SDEdit is to "hijack" the reverse stochastic process of SDE-based generative models, as illustrated in the figure below. Given an input image for editing, such as a stroke painting or an image with strokes, we can add a suitable amount of noise to make its artifacts undetectable, while still preserving the overall structure of the image. We then initialize the reverse SDE with this noisy input, and simulate the reverse process to obtain a denoised image of high quality. Because the denoised image and the input resemble each other under noise perturbations, they also share the overall image structure.
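The two steps described above (perturb the input, then simulate the reverse SDE from the noisy input) can be sketched in a few lines. This is a minimal toy illustration under stated assumptions, not the released implementation: it assumes a variance-exploding SDE with a geometric noise schedule, works on a flat list of pixel values, and replaces the trained score network with the analytic score of a Gaussian "prior". The names `sigma`, `toy_score`, `sdedit`, the start time `t0`, and all constants are hypothetical choices made for this example.

```python
import math
import random

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0  # assumed noise range for the toy VE-SDE


def sigma(t):
    """Noise scale at time t in [0, 1] (geometric schedule, an assumption)."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def toy_score(x, t, mean=0.5, std=0.1):
    """Score of a Gaussian prior N(mean, std^2) perturbed by noise sigma(t).

    Stand-in for a trained score network; a real model would be used here.
    """
    return (mean - x) / (std ** 2 + sigma(t) ** 2)


def sdedit(guide, t0=0.5, n_steps=200, score_fn=toy_score, seed=0):
    """Perturb `guide` to time t0, then integrate the reverse SDE down to t=0."""
    rng = random.Random(seed)
    # Step 1: add noise so that stroke artifacts are masked.
    x = [g + sigma(t0) * rng.gauss(0.0, 1.0) for g in guide]
    # Step 2: Euler-Maruyama discretization of the reverse VE-SDE.
    for i in range(n_steps):
        t = t0 * (1 - i / n_steps)
        t_next = t0 * (1 - (i + 1) / n_steps)
        step = sigma(t) ** 2 - sigma(t_next) ** 2  # variance removed this step
        x = [
            xi + step * score_fn(xi, t) + math.sqrt(step) * rng.gauss(0.0, 1.0)
            for xi in x
        ]
    return x


# A crude two-tone "stroke painting": a dark half followed by a bright half.
guide = [0.0] * 512 + [1.0] * 512
out_a = sdedit(guide, seed=0)
out_b = sdedit(guide, seed=1)  # a different seed gives a different sample
```

In this sketch, `t0` controls how much noise is added before denoising: a larger value pulls the result closer to the prior, while a smaller value keeps it closer to the input. Running the procedure with different seeds yields different outputs for the same guide.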


Synthesizing images from strokes with SDEdit

Given an input stroke painting, our goal is to generate a realistic image that shares the same structure as the input when no paired data is available. We present stroke-based image synthesis with SDEdit on LSUN bedroom, LSUN church and CelebA-HQ datasets. We notice that SDEdit can generate multiple diverse images for each stroke painting.


Scribble-based image editing with SDEdit

Given an input with user-added strokes, we want to generate a realistic image based on the user's edit. We observe that our method can generate image edits that are both realistic and faithful to the user's edit, while avoiding undesired modifications. (See the figure below.)


Image compositing with SDEdit

Given an image, users can specify how they want the edited image to look using pixel patches copied from other reference images. Our goal is to generate a realistic image based on the user's edit. In the figure below, "original" stands for the original image, and "input" stands for an input designed by users. We observe that SDEdit can generate images that are both faithful and realistic, with much lower LPIPS scores than GAN baselines.


Related Work