Author: Wei Jiang, Richard Lee Davis, Kevin Gonyop Kim, Pierre Dillenbourg
https://proceedings.mlr.press/v176/jiang22a.html
Abstract: We have developed a new tool that makes it possible for people with zero programming experience to intentionally and meaningfully explore the latent space of a GAN. We combine a number of methods from the literature into a single system that includes multiple functionalities: uploading and locating images in the latent space, image generation with text, visual style mixing, and intentional and intuitive latent space exploration. This tool was developed to provide a means for designers to explore the "design space" of their domains. Our goal was to create a system to support novices in gaining a more complete, expert understanding of their domain's design space by lowering the barrier of entry to using deep generative models in creative practice.
We use the Zalando dataset, which can also be downloaded from Zalando Images and Zalando Text-Image Pairs. The dataset consists of 8732 high-resolution images, each depicting a dress available in the Zalando shop, photographed against a white background.
!python train.py --outdir "training_runs" --snap 20 --metrics "none" --data "data/square_256_imgs.zip"
To resume training from a checkpoint, pass `--resume` with the path to a network snapshot:
!python train.py --outdir "training_runs" --snap 20 --metrics "none" --data "data/square_256_imgs.zip" --resume "training_runs/00015-square_256_imgs-auto1-resumecustom/network-snapshot-000400.pkl"
!python "DALLE-pytorch/train_dalle.py" --vae_path "DALLE-pytorch/wandb/vae-final.pt" --image_text_folder "data/text_images"
Download the models from the following links and save them in your Google Drive.
| Model | Download |
|---|---|
| Pretrained fashion GAN | fashion-gan-pretrained.pkl |
| Finetuned DALL-E model | DALLE-finetuend.pkl |
We applied PCA to identify semantically meaningful directions in the latent space. Exploring the first 10 principal components, we found directions that control attributes such as sleeve length and pattern.
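The PCA step can be sketched as follows. This is a minimal illustration, not the project's exact code: the latent dimensionality, sample count, and the random stand-in for the GAN's sampled latent codes are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for sampled latent codes; in the real tool these would be
# w-vectors drawn from the GAN's mapping network (assumption).
latent_dim = 512
w = rng.standard_normal((10_000, latent_dim))

# PCA via SVD of the mean-centered latent codes.
w_centered = w - w.mean(axis=0)
_, _, vt = np.linalg.svd(w_centered, full_matrices=False)
directions = vt[:10]  # the first 10 principal components

# Moving a latent code along one component edits a single semantic
# attribute (e.g. sleeve length or pattern) while mostly preserving the rest.
w_edit = w[0] + 3.0 * directions[0]
```

Because the components are orthonormal, each one can serve as an independent slider over the design space.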
To project an image into the latent space, we use SGD with a combined loss: a perceptual loss plus a pixel-wise MSE loss between the target and generated images. This combined loss noticeably improved our tool's ability to embed out-of-sample examples in the GAN's latent space.
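The projection loop can be sketched with a toy example. Everything here is an illustrative assumption: a small linear "generator" stands in for the GAN, a downsampled-MSE term stands in for the perceptual loss (the real system would use a learned perceptual metric such as LPIPS), and finite differences stand in for autograd.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear "generator" standing in for the GAN (assumption).
latent_dim, img_dim = 16, 64
G = rng.standard_normal((img_dim, latent_dim)) / np.sqrt(latent_dim)
target = G @ rng.standard_normal(latent_dim)  # an in-range "image"

def losses(z):
    img = G @ z
    pixel_mse = np.mean((img - target) ** 2)
    # Crude stand-in for a perceptual loss: MSE on a downsampled view,
    # which compares coarse structure rather than exact pixels.
    coarse = img.reshape(-1, 4).mean(axis=1)
    coarse_t = target.reshape(-1, 4).mean(axis=1)
    perceptual = np.mean((coarse - coarse_t) ** 2)
    return pixel_mse, perceptual

# Gradient descent on the combined loss, mirroring the SGD projection loop.
z = np.zeros(latent_dim)
lr, eps = 1.0, 1e-5
for _ in range(300):
    base = sum(losses(z))
    grad = np.zeros(latent_dim)
    for i in range(latent_dim):
        zp = z.copy()
        zp[i] += eps
        grad[i] = (sum(losses(zp)) - base) / eps
    z -= lr * grad

pixel_mse, perceptual = losses(z)
```

After optimization, `z` is the latent code whose generated image best matches the target under both loss terms; the same loop shape applies when `G` is a real GAN generator and the gradients come from backpropagation.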
We implemented two methods to locate a design from a text description. The first randomly samples images from the latent space, then passes them, along with the text description, through a CLIP model to find the small number of images that most closely match the text. The second fine-tunes a DALL-E model on the Feidegger dataset, then passes text descriptions to DALL-E and lets it generate designs directly. We compared these with other models:
- FashionGAN: realistic and diverse, but low resolution.
- DALL-E: diverse and creative, but less accurate.
- Stable Diffusion: accurate and high resolution, but not diverse (given a specific prompt, it mostly varies only the background and models).
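The ranking step of the first method can be sketched as follows. This assumes the image and text embeddings have already been computed with CLIP's encoders; here random vectors stand in so the ranking itself can run, and the function name is our own.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins for CLIP embeddings: in the real pipeline these come from
# CLIP's image and text encoders (assumption).
num_candidates, dim = 200, 512
image_embs = rng.standard_normal((num_candidates, dim))
text_emb = rng.standard_normal(dim)

def top_k_by_clip_score(image_embs, text_emb, k=5):
    """Rank candidate images by cosine similarity to the text embedding."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb)
    scores = img @ txt
    return np.argsort(scores)[::-1][:k]

best = top_k_by_clip_score(image_embs, text_emb)
```

The candidates would be decoded from randomly sampled latent codes, so each returned index maps back to a latent vector the user can continue editing.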
We have built a website for user testing: generarive.fashion
The interface of our neural design space exploration tool. Users can upload images to the workspace on the left or generate random images with the random button. They can also generate examples from text descriptions using the text box. These examples can be dragged into the style-mixing region or saved in the workspace. The visual style-mixing panel lets users selectively combine elements from three designs, with the output image shown in the center of the canvas on the right. The two-dimensional canvas represents the design space for two attributes on the horizontal and vertical axes; each axis's attribute can be changed via a drop-down menu. Dragging the image within the canvas is equivalent to moving through the GAN's latent space in semantically meaningful directions.
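The drag-to-explore behavior can be modeled as adding scaled semantic directions to the current latent code. The function name, step size, and the toy orthonormal directions below are illustrative assumptions; in the tool the two directions would be the principal components bound to the axis drop-downs.

```python
import numpy as np

def move_in_latent_space(w, dir_x, dir_y, dx, dy, step=0.1):
    """Map a 2D drag (dx, dy) on the canvas to a latent-space move
    along two semantic directions (e.g. two principal components)."""
    return w + step * dx * dir_x + step * dy * dir_y

# Toy example: orthonormal unit directions in a 512-dim latent space.
latent_dim = 512
w = np.zeros(latent_dim)
dir_x = np.eye(latent_dim)[0]
dir_y = np.eye(latent_dim)[1]

w_new = move_in_latent_space(w, dir_x, dir_y, dx=5.0, dy=-2.0)
```

Because the two directions are (near-)orthogonal, horizontal and vertical drags edit their attributes independently, which is what makes the canvas feel like a map of the design space.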
@InProceedings{pmlr-v176-jiang22a,
title = {GANs for All: Supporting Fun and Intuitive Exploration of GAN Latent Spaces},
author = {Jiang, Wei and Davis, Richard Lee and Kim, Kevin Gonyop and Dillenbourg, Pierre},
booktitle = {Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track},
pages = {292--296},
year = {2022}
}
This project and application were developed as a semester project at the EPFL CHILI Lab.