Are you interested in the latest developments in the realm of image editing? A new tool has hit the scene, offering a fresh take on traditional editing methods: DragDiffusion. A product of in-depth research by Yujun Shi, Chuhui Xue, Jiachun Pan, Wenqing Zhang, Vincent Y. F. Tan, and Song Bai, this tool enables interactive point-based image editing through the unique application of diffusion models. You can find more about this project on its project page.
What is DragDiffusion?
Please note that DragDiffusion is a research project, not a commercial product. Its primary function is to facilitate interactive point-based image editing. It is designed to run on an NVIDIA GPU under Linux; other configurations have not yet been tested.
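If you want to confirm that your machine meets that requirement before installing anything, a quick check like the one below can help. This is an optional sketch, not part of the project's own instructions.

```bash
# Optional pre-flight check (not part of the official DragDiffusion instructions).
# Confirm the OS is Linux and that an NVIDIA GPU with a working driver is visible.
uname -s        # should print "Linux"
nvidia-smi      # should list at least one NVIDIA GPU
```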
How to Install and Run DragDiffusion?
Installing and running DragDiffusion is straightforward. To install the required libraries, run the following commands:

```bash
conda env create -f environment.yaml
conda activate dragdiff
```

Before running DragDiffusion, set up "accelerate" with the following command: accelerate config.
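As an optional follow-up, you can sanity-check the freshly created environment before moving on. The snippet below is only a sketch; it assumes the "dragdiff" environment created above provides PyTorch and accelerate, which the project's environment.yaml is expected to supply.

```bash
# Optional sanity check; assumes the "dragdiff" env provides PyTorch and accelerate.
conda activate dragdiff

# Print accelerate's environment report, which reflects the saved configuration.
accelerate env

# Verify that PyTorch can see the NVIDIA GPU.
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```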
After setting up, two steps are needed to use DragDiffusion:
Step 1: Train a LoRA
To train a LoRA (Low-Rank Adaptation) on your input image, first place the image in a folder of its own; make sure this folder contains only that one image. Then set "SAMPLE_DIR" and "OUTPUT_DIR" in the script "lora/train_lora.sh" to appropriate values: "SAMPLE_DIR" should be the directory containing your input image, and "OUTPUT_DIR" should be where you want the trained LoRA to be saved. Finally, set the "--instance_prompt" option in "lora/train_lora.sh" to a suitable prompt describing the image. Note that this prompt does not have to be a complicated one.
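To make the configuration concrete, here is a hedged sketch of what the relevant settings might look like once filled in. The paths and the prompt are hypothetical placeholders, and the real "lora/train_lora.sh" may structure these options differently.

```bash
# Hypothetical excerpt of a configured lora/train_lora.sh
# (placeholder paths and prompt; the real script may be structured differently).

# SAMPLE_DIR: the folder that contains only the single input image.
SAMPLE_DIR="/path/to/my_image_folder"

# OUTPUT_DIR: the folder where the trained LoRA weights will be saved.
OUTPUT_DIR="/path/to/my_lora_output"

# The instance prompt describes the input image; a short phrase is enough,
# e.g. passing --instance_prompt="a photo of a mountain" to the training command.
```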
After the "lora/train_lora.sh" file has been configured properly, run the following command to train a LoRA: bash lora/train_lora.sh.
Step 2: Perform "Drag" Editing
Once the LoRA has been trained, you can start the Gradio user interface by running the following command: python3 drag_ui_real.py. Please refer to the demo video for a detailed explanation of how to perform the "drag" editing.
The editing process involves several steps, from dropping your input image into the left-most box to clicking the "Run" button to initiate the algorithm. The final results will be displayed in the right-most box.
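For reference, a minimal end-to-end launch might look like the sketch below; it simply mirrors the commands above and summarizes the UI workflow as comments.

```bash
# Minimal launch sketch; assumes the "dragdiff" env and a trained LoRA are in place.
conda activate dragdiff
python3 drag_ui_real.py
# Gradio prints a local URL; open it in a browser, drop your input image into the
# left-most box, set the parameters described below (including lora_path), and
# click "Run". The edited result appears in the right-most box.
```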
Here is an explanation of the parameters in the user interface:
prompt: The prompt describing the user input image (it should be the same as the prompt used to train the LoRA).
lora_path: The path to the trained LoRA.
n_pix_step: Maximum number of steps of motion supervision. Increase this value if handle points have not been "dragged" to the desired position.
lam: The regularization coefficient that controls how strongly the unmasked region is kept unchanged. Increase this value if the unmasked region changes more than desired.
n_actual_inference_step: Number of DDIM inversion steps performed.
DragDiffusion is a step forward in the world of image editing, introducing a whole new level of interaction and precision to the process. It showcases the power and potential of diffusion models in transforming traditional practices, enhancing user experience, and opening new possibilities in the field.