awesome image synthesis papers

Collection of papers in image synthesis.

Updates:

  • 🔥 2024.7: A new awesome list dedicated to diffusion model papers: diffusion.md

Note: The following awesome list is no longer maintained; its coverage cuts off in 2022.

Unconditional/(Class Conditional) Image Generation

GAN Architecture

```mermaid
flowchart TB
  GAN[VanillaGAN, 2014] -- architecture tricks --> DCGAN[DCGAN, 2016]
  DCGAN -- Progressive growing --> PG[PG-GAN, 2018]
  PG --> BigGAN[BigGAN, 2019]
  PG -- AdaIN, mapping network --> SG1[StyleGAN, 2019]
  SG1 -- Weight demodulation --> SG2[StyleGAN2, 2020]
  SG2 -- Translate and rotate equivariance --> SG3[StyleGAN3, 2021]
  DCGAN -- Autoregressive transformer \n for vision tokens --> VQGAN
  VQGAN -- transformers architecture \n of generator and discriminator --> TransGAN
```

Generative adversarial nets.
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio*
NeurIPS 2014. [PDF] [Tutorial] Cited:2075

DCGAN Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.
Alec Radford, Luke Metz, Soumith Chintala.
ICLR 2016. [PDF] Cited:13117

PG-GAN Progressive Growing of GANs for Improved Quality, Stability, and Variation.
Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen.
ICLR 2018. [PDF] Cited:6527

StyleGAN A Style-Based Generator Architecture for Generative Adversarial Networks.
Tero Karras, Samuli Laine, Timo Aila.
CVPR 2019. [PDF] Cited:8707

BigGAN Large Scale GAN Training for High Fidelity Natural Image Synthesis.
Andrew Brock, Jeff Donahue, Karen Simonyan.
ICLR 2019. [PDF] Cited:4748

StyleGAN2 Analyzing and Improving the Image Quality of StyleGAN.
Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, Timo Aila.
CVPR 2020. [PDF] Cited:4863

VQGAN Taming Transformers for High-Resolution Image Synthesis
Patrick Esser, Robin Rombach, Björn Ommer.
CVPR 2021. [PDF] [Project] Cited:1969

TransGAN TransGAN: Two Transformers Can Make One Strong GAN, and That Can Scale Up
Yifan Jiang, Shiyu Chang, Zhangyang Wang.
CVPR 2021. [PDF] [Pytorch] Cited:312

StyleGAN3 Alias-Free Generative Adversarial Networks.
Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila.
NeurIPS 2021. [PDF] [Project] Cited:1272

StyleSwin: Transformer-based GAN for High-resolution Image Generation
Bowen Zhang, Shuyang Gu, Bo Zhang, Jianmin Bao, Dong Chen, Fang Wen, Yong Wang, Baining Guo.
CVPR 2022. [PDF] Cited:171

StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
Axel Sauer, Katja Schwarz, Andreas Geiger
SIGGRAPH 2022. [PDF] Cited:351

summary 1. Proposes architecture changes based on Projected GANs: (1) Regularization: apply path-length regularization only after the model has been sufficiently trained, and blur all images with a Gaussian filter for the first 200k images. (2) Reduce the latent code z dimension to 64 while keeping the w code at 512-d. (3) Use a pretrained class embedding as conditioning and make it learnable. 2. Designs a progressive-growing strategy for StyleGAN3. 3. Leverages classifier guidance.
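The early-training blurring mentioned above can be sketched in a few lines. A minimal PyTorch/torchvision sketch, assuming a tensor batch of images; the starting strength `sigma_max` and the linear fade are illustrative choices, not the paper's exact schedule:

```python
import torchvision.transforms.functional as TF

def blur_early_training(x, images_seen, fade_kimg=200, sigma_max=2.0):
    """Gaussian-blur a batch of images, fading the blur strength linearly
    to zero over the first `fade_kimg` thousand images (illustrative)."""
    t = min(images_seen / (fade_kimg * 1000), 1.0)
    sigma = sigma_max * (1.0 - t)
    if sigma > 0.1:
        kernel = int(sigma * 3) * 2 + 1  # odd kernel size covering ~3 sigma
        x = TF.gaussian_blur(x, kernel_size=kernel, sigma=sigma)
    return x
```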

GAN Objective

A Large-Scale Study on Regularization and Normalization in GANs
Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly
ICML 2019. [PDF] Cited:147

EB-GAN Energy-based Generative Adversarial Networks
Junbo Zhao, Michael Mathieu, Yann LeCun.
ICLR 2017. [PDF] Cited:1089

Towards Principled Methods for Training Generative Adversarial Networks
Martin Arjovsky, Léon Bottou
ICLR 2017. [PDF] Cited:1974

LSGAN Least Squares Generative Adversarial Networks.
Xudong Mao, Qing Li, Haoran Xie, Raymond Y.K. Lau, Zhen Wang, Stephen Paul Smolley.
ICCV 2017. [PDF] Cited:4221

WGAN Wasserstein GAN
Martin Arjovsky, Soumith Chintala, Léon Bottou.
ICML 2017. [PDF] Cited:4582

GGAN Geometric GAN
Jae Hyun Lim, Jong Chul Ye.
arxiv 2017. [PDF] Cited:477

AC-GAN Conditional Image Synthesis With Auxiliary Classifier GANs
Augustus Odena, Christopher Olah, Jonathon Shlens.
ICML 2017. [PDF] Cited:2975

cGANs with Projection Discriminator
Takeru Miyato, Masanori Koyama.
ICLR 2018. [PDF] Cited:728

S³-GAN High-Fidelity Image Generation With Fewer Labels
Mario Lucic*, Michael Tschannen*, Marvin Ritter*, Xiaohua Zhai, Olivier Bachem, Sylvain Gelly.
ICML 2019. [PDF] [Tensorflow] Cited:149

Autoencoder-based framework

VAE Auto-Encoding Variational Bayes.
Diederik P.Kingma, Max Welling.
ICLR 2014. [PDF] Cited:17730

AAE Adversarial Autoencoders.
Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey.
arxiv 2015. [PDF] Cited:2122

VAE/GAN Autoencoding beyond pixels using a learned similarity metric.
Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther.
ICML 2016. [PDF] Cited:1900

VampPrior VAE with a VampPrior
Jakub M. Tomczak, Max Welling.
AISTATS 2018. [PDF] [Pytorch] Cited:578

BiGAN Adversarial Feature Learning
Jeff Donahue, Philipp Krähenbühl, Trevor Darrell.
ICLR 2017. [PDF] Cited:1758

AIL Adversarial Learned Inference
Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, Aaron Courville.
ICLR 2017. [PDF] Cited:1286

VEEGAN Veegan: Reducing mode collapse in gans using implicit variational learning.
Akash Srivastava, Lazar Valkov, Chris Russell, Michael U. Gutmann, Charles Sutton.
NeurIPS 2017. [PDF] [Github] Cited:626

AGE Adversarial Generator-Encoder Networks.
Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky.
AAAI 2018. [PDF] [Pytorch] Cited:129

IntroVAE IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis.
Huaibo Huang, Zhihang Li, Ran He, Zhenan Sun, Tieniu Tan.
NeurIPS 2018. [PDF] Cited:232

Disentangled Inference for GANs with Latently Invertible Autoencoder
Jiapeng Zhu, Deli Zhao, Bo Zhang, Bolei Zhou
IJCV 2020. [PDF] Cited:29

ALAE Adversarial Latent Autoencoders
Stanislav Pidhorskyi, Donald Adjeroh, Gianfranco Doretto.
CVPR 2020. [PDF] Cited:238

VAEs

Variational Inference with Normalizing Flows
Danilo Jimenez Rezende, Shakir Mohamed
ICML 2015. [PDF] Cited:3619

Improved Variational Inference with Inverse Autoregressive Flow
Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling
NeurIPS 2016. [PDF] Cited:1681

NVAE: A Deep Hierarchical Variational Autoencoder
Arash Vahdat, Jan Kautz
NeurIPS 2020. [PDF] Cited:735

Diffusion Models

Improved techniques for training score-based generative models.
Yang Song, Stefano Ermon
NeurIPS 2020. [PDF] Cited:864

DDPM Denoising Diffusion Probabilistic Models
Jonathan Ho, Ajay Jain, Pieter Abbeel
NeurIPS 2020. [PDF] Cited:9830

Score-based generative modeling through stochastic differential equations
Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
ICLR 2021. [PDF] Cited:3827

Improved-DDPM Improved Denoising Diffusion Probabilistic Models
Alex Nichol, Prafulla Dhariwal
ICML 2021. [PDF] Cited:2343

Variational Diffusion Models.
Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho
NeurIPS 2021. [PDF] Cited:767

Guided-Diffusion Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal, Alex Nichol
NeurIPS 2021. [PDF] Cited:4798

Classifier-Free Diffusion Guidance.
Jonathan Ho, Tim Salimans
NeurIPS 2021. [PDF] Cited:2201

SDEdit: Image Synthesis and Editing with Stochastic Differential Equations
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon
ICLR 2022. [PDF] Cited:904

DiffusionCLIP: Text-guided Image Manipulation Using Diffusion Models
Gwanghyun Kim, Taesung Kwon, Jong Chul Ye
CVPR 2022. [PDF] Cited:443

Blended Diffusion: Text-driven Editing of Natural Images
Omri Avrahami, Dani Lischinski, Ohad Fried
CVPR 2022. [PDF] Cited:648

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen
ICML 2022. [PDF] Cited:2497

Palette: Image-to-Image diffusion models.
Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, Mohammad Norouzi
SIGGRAPH 2022. [PDF] Cited:1122

RePaint: Inpainting using Denoising Diffusion Probabilistic Models
Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc Van Gool
CVPR 2022 [PDF] Cited:918

Disentangled Image Generation

DC-IGN Deep Convolutional Inverse Graphics Network
Tejas D. Kulkarni, Will Whitney, Pushmeet Kohli, Joshua B. Tenenbaum.
NeurIPS 2015. [PDF]

InfoGAN InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel.
NeurIPS 2016. [PDF] Cited:4014

Beta-VAE beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
I. Higgins, L. Matthey, Arka Pal, Christopher P. Burgess, Xavier Glorot, M. Botvinick, S. Mohamed, Alexander Lerchner.
ICLR 2017. [PDF]

AnnealedVAE Understanding disentangling in β-VAE
Christopher P. Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, Alexander Lerchner.
NeurIPS 2017. [PDF] Cited:768

Factor-VAE Disentangling by Factorising
Hyunjik Kim, Andriy Mnih.
NeurIPS 2017. [PDF] Cited:1219

DCI A framework for the quantitative evaluation of disentangled representations.
Cian Eastwood, Christopher K. I. Williams.
ICLR 2018. [PDF]

Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations.
Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem.
ICML(best paper award) 2019. [PDF] Cited:1301

Regularization / Limited Data

WGAN-GP Improved training of wasserstein gans
Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville.
NeurIPS 2017. [PDF] Cited:8625

The Numerics of GANs
Lars Mescheder, Sebastian Nowozin, Andreas Geiger
NeurIPS 2017. [PDF] Cited:440

R1-regularization Which Training Methods for GANs do actually Converge?
Lars Mescheder, Andreas Geiger, Sebastian Nowozin.
ICML 2018. [PDF] Cited:1348

SN-GAN Spectral Normalization for Generative Adversarial Networks.
Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida.
ICLR 2018. [PDF] Cited:4071

CR-GAN Consistency regularization for generative adversarial networks.
Han Zhang, Zizhao Zhang, Augustus Odena, Honglak Lee.
ICLR 2020. [PDF] Cited:255

Summary Motivation: GAN training is unstable, and traditional regularization methods introduce non-trivial computational overhead. The discriminator tends to focus on local features instead of semantic information, so images of semantically different objects may be close in the discriminator's feature space merely because they share a similar viewpoint.
Method: Restrict the discriminator's intermediate features to be consistent under data augmentations of the same image. The generator does not need to change.
Experiment: (1) Augmentation details: randomly shift the image by a few pixels and randomly flip it horizontally. (2) Effect of CR: improves the FID of generated images. (3) Ablation study: training with data augmentation alone prevents the discriminator from overfitting the training data but does not improve FID; the authors claim this is because consistency regularization further forces the discriminator to learn a semantic representation.
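The CR term can be sketched as follows, assuming a PyTorch discriminator `D` that returns logits. The paper constrains the discriminator's representation; for brevity this sketch penalizes the final logits, and the augmentation and weight `lambda_cr` are illustrative:

```python
import torch
import torch.nn.functional as F

def augment(x):
    """Random horizontal flip plus a small random shift (illustrative)."""
    if torch.rand(()) < 0.5:
        x = torch.flip(x, dims=[3])
    dy, dx = torch.randint(-4, 5, (2,)).tolist()
    return torch.roll(x, shifts=(dy, dx), dims=(2, 3))

def d_loss_with_cr(D, real, fake, lambda_cr=10.0):
    # Standard non-saturating GAN loss for the discriminator.
    loss_real = F.softplus(-D(real)).mean()
    loss_fake = F.softplus(D(fake.detach())).mean()
    # CR: the discriminator should respond identically to an image
    # and its augmented copy.
    cr = ((D(real) - D(augment(real))) ** 2).mean()
    return loss_real + loss_fake + lambda_cr * cr
```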

Differentiable Augmentation for Data-Efficient GAN Training.
Zhao Shengyu, Liu Zhijian, Lin Ji, Zhu Jun-Yan, Han Song.
NeurIPS 2020. [PDF] [Project]
Cited:528

ICR-GAN Improved consistency regularization for GANs.
Zhengli Zhao, Sameer Singh, Honglak Lee, Zizhao Zhang, Augustus Odena, Han Zhang.
AAAI 2021. [PDF] Cited:131

Summary Motivation: Consistency regularization can introduce augmentation artifacts into GAN samples.
Method: 1. (bCR) In addition to CR on real images, bCR also encourages the discriminator to output the same features for a generated image and its augmentation. 2. (zCR) zCR encourages the discriminator to be insensitive to generated images with perturbed latent codes, while encouraging the generator to be sensitive to such perturbations.
Experiment: The image augmentation is the same as in CR-GAN; the latent-vector augmentation is Gaussian noise.
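A minimal sketch of the bCR/zCR terms, reusing the `augment` function from the CR-GAN sketch above; the noise scale `sigma` and the (omitted) loss weights are illustrative:

```python
import torch

def bcr_zcr_terms(D, G, z, real, sigma=0.05):
    fake = G(z)
    # bCR: consistency under augmentation for both real and generated images.
    bcr = ((D(real) - D(augment(real))) ** 2).mean() \
        + ((D(fake) - D(augment(fake))) ** 2).mean()
    # zCR: the discriminator should be insensitive to small latent
    # perturbations, while the generator is encouraged to be sensitive.
    fake_perturbed = G(z + sigma * torch.randn_like(z))
    zcr_d = ((D(fake) - D(fake_perturbed)) ** 2).mean()
    zcr_g = -((fake - fake_perturbed) ** 2).mean()  # G maximizes the change
    return bcr, zcr_d, zcr_g
```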

StyleGAN-ADA Training Generative Adversarial Networks with Limited Data.
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, Timo Aila.
NeurIPS 2020. [PDF] [Tensorflow] [Pytorch] Cited:1568

Gradient Normalization for Generative Adversarial Networks.
Yi-Lun Wu, Hong-Han Shuai, Zhi-Rui Tam, Hong-Yu Chiu.
ICCV 2021. [PDF] Cited:52

Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data.
Liming Jiang, Bo Dai, Wayne Wu, Chen Change Loy.
NeurIPS 2021. [PDF] Cited:80

Metric

Inception-Score/IS Improved Techniques for Training GANs
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen.
NeurIPS 2016. [PDF] Cited:8114

FID, TTUR GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter.
NeurIPS 2017. [PDF] Cited:433

SWD Sliced Wasserstein Generative Models
Jiqing Wu, Zhiwu Huang, Dinesh Acharya, Wen Li, Janine Thoma, Danda Pani Paudel, Luc Van Gool.
CVPR 2019. [PDF] Cited:0

Fast Convergence

FastGAN Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis
Bingchen Liu, Yizhe Zhu, Kunpeng Song, Ahmed Elgammal.
ICLR 2021. [PDF] Cited:192

ProjectedGAN Projected GANs Converge Faster
Axel Sauer, Kashyap Chitta, Jens Müller, Andreas Geiger
NeurIPS 2021. [PDF] [Project] [Pytorch] Cited:188

GAN Adaptation

Transferring GANs: generating images from limited data.
Yaxing Wang, Chenshen Wu, Luis Herranz, Joost van de Weijer, Abel Gonzalez-Garcia, Bogdan Raducanu.
ECCV 2018. [PDF] Cited:257

Image Generation From Small Datasets via Batch Statistics Adaptation.
Atsuhiro Noguchi, Tatsuya Harada.
ICCV 2019 [PDF] Cited:183

Freeze Discriminator: A Simple Baseline for Fine-tuning GANs.
Sangwoo Mo, Minsu Cho, Jinwoo Shin.
CVPRW 2020 [PDF] [Pytorch] Cited:192

Resolution dependent GAN interpolation for controllable image synthesis between domains.
Justin N. M. Pinkney, Doron Adler
NeurIPS Workshop 2020. [PDF] Cited:125

Few-shot image generation with elastic weight consolidation.
Yijun Li, Richard Zhang, Jingwan Lu, Eli Shechtman
NeurIPS 2020. [PDF] Cited:150

MineGAN: effective knowledge transfer from GANs to target domains with few images.
Yaxing Wang, Abel Gonzalez-Garcia, David Berga, Luis Herranz, Fahad Shahbaz Khan, Joost van de Weijer
CVPR 2020. [PDF] Cited:169

One-Shot Domain Adaptation For Face Generation
Chao Yang, Ser-Nam Lim
CVPR 2020. [PDF] Cited:35

Unsupervised image-to-image translation via pre-trained StyleGAN2 network
Jialu Huang, Jing Liao, Sam Kwong
TMM 2021. [PDF] Cited:58

Few-shot Adaptation of Generative Adversarial Networks
Esther Robb, Wen-Sheng Chu, Abhishek Kumar, Jia-Bin Huang.
arxiv 2020 [PDF] Cited:85

AgileGAN: stylizing portraits by inversion-consistent transfer learning.
Guoxian Song, Linjie Luo, Jing Liu, Wan-Chun Ma, Chunpong Lai, Chuanxia Zheng, Tat-Jen Cham
TOG/SIGGRAPH 2021. [PDF] [Project]

Few-shot Image Generation via Cross-domain Correspondence
Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A. Efros, Yong Jae Lee, Eli Shechtman, Richard Zhang.
CVPR 2021. [PDF] Cited:214

StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
Rinon Gal, Or Patashnik, Haggai Maron, Gal Chechik, Daniel Cohen-Or.
arxiv 2021 [PDF] [Project] Cited:161

Stylealign: Analysis and Applications of Aligned StyleGAN Models
Zongze Wu, Yotam Nitzan, Eli Shechtman, Dani Lischinski
ICLR 2022. [PDF] Cited:47

One-Shot Generative Domain Adaptation
Ceyuan Yang, Yujun Shen*, Zhiyi Zhang, Yinghao Xu, Jiapeng Zhu, Zhirong Wu, Bolei Zhou
arXiv 2021. [PDF] Cited:41

Mind the Gap: Domain Gap Control for Single Shot Domain Adaptation for Generative Adversarial Networks
Peihao Zhu, Rameen Abdal, John Femiani, Peter Wonka
ICLR 2022. [PDF] Cited:76

Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment
Jiayu Xiao, Liang Li, Chaofei Wang, Zheng-Jun Zha, Qingming Huang
CVPR 2022. [PDF] Cited:55

JoJoGAN: One Shot Face Stylization
Min Jin Chong, David Forsyth
arxiv 2022. [PDF] Cited:59

When, why and which pretrained GANs are useful?
Timofey Grigoryev, Andrey Voynov, Artem Babenko
ICLR 2022. [PDF]

CtlGAN: Few-shot Artistic Portraits Generation with Contrastive Transfer Learning
Yue Wang, Ran Yi, Ying Tai, Chengjie Wang, and Lizhuang Ma
arxiv 2022. [PDF] Cited:12

One-Shot Adaptation of GAN in Just One CLIP
Gihyun Kwon, Jong Chul Ye
arxiv 2022. [PDF] Cited:33

A Closer Look at Few-shot Image Generation
Yunqing Zhao, Henghui Ding, Houjing Huang, Ngai-Man Cheung
CVPR 2022. [PDF] Cited:54

Diffusion Guided Domain Adaptation of Image Generators
Kunpeng Song, Ligong Han, Bingchen Liu, Dimitris Metaxas, Ahmed Elgammal
arxiv 2022. [PDF] Cited:27

Domain Expansion of Image Generators
Yotam Nitzan, Michaël Gharbi, Richard Zhang, Taesung Park, Jun-Yan Zhu, Daniel Cohen-Or, Eli Shechtman
arxiv 2023. [PDF] Cited:12

Other Generative Models

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space
Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski
CVPR 2017. [PDF] Cited:622

GLO Optimizing the Latent Space of Generative Networks
Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam
ICML 2018. [PDF] Cited:393

Non-Adversarial Image Synthesis with Generative Latent Nearest Neighbors
Yedid Hoshen, Jitendra Malik
CVPR 2019. [PDF] Cited:55

Latent Interpolation

Sampling generative networks: Notes on a few effective techniques.
Tom White.
arxiv 2016 [PDF] Cited:71

Latent space oddity: on the curvature of deep generative models
Georgios Arvanitidis, Lars Kai Hansen, Søren Hauberg.
ICLR 2018. [PDF] Cited:233

Feature-Based Metrics for Exploring the Latent Space of Generative Models
Samuli Laine.
ICLR 2018 Workshop. [PDF]

Two-stage Generation Models

VQ-VAE Neural Discrete Representation Learning
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
NeurIPS 2017. [PDF] Cited:3523

VQ-VAE-2 Generating Diverse High-Fidelity Images with VQ-VAE-2
Ali Razavi, Aaron van den Oord, Oriol Vinyals
NeurIPS 2019. [PDF] Cited:1392

VQGAN Taming Transformers for High-Resolution Image Synthesis
Patrick Esser, Robin Rombach, Björn Ommer.
CVPR 2021. [PDF] [Project] Cited:1969

DALLE Zero-Shot Text-to-Image Generation
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever.
ICML 2021. [PDF] Cited:3626

The Image Local Autoregressive Transformer
Chenjie Cao, Yuxin Hong, Xiang Li, Chengrong Wang, Chengming Xu, XiangYang Xue, Yanwei Fu
NeurIPS 2021. [PDF] Cited:12

MaskGIT MaskGIT: Masked Generative Image Transformer
Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman
arxiv 2022. [PDF] Cited:357

VQGAN-CLIP VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
Katherine Crowson, Stella Biderman, Daniel Kornis, Dashiell Stander, Eric Hallahan, Louis Castricato, Edward Raff
arxiv 2021. [PDF] Cited:304

ASSET Autoregressive Semantic Scene Editing with Transformers at High Resolutions
Difan Liu, Sandesh Shetty, Tobias Hinz, Matthew Fisher, Richard Zhang, Taesung Park, Evangelos Kalogerakis
SIGGRAPH 2022. [Pytorch]

CLIP-GEN CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP
Zihao Wang, Wei Liu, Qian He, Xinglong Wu, Zili Yi
arxiv 2022. [PDF] Cited:59

PUT Reduce Information Loss in Transformers for Pluralistic Image Inpainting
Qiankun Liu, Zhentao Tan, Dongdong Chen, Qi Chu, Xiyang Dai, Yinpeng Chen, Mengchen Liu, Lu Yuan, Nenghai Yu
CVPR 2022. [Pytorch]

High-Quality Pluralistic Image Completion via Code Shared VQGAN
Chuanxia Zheng, Guoxian Song, Tat-Jen Cham, Jianfei Cai, Dinh Phung, Linjie Luo
arxiv 2022. [PDF] Cited:8

L-Verse: Bidirectional Generation Between Image and Text
Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae
CVPR 2022. [PDF] Cited:21

Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes.
Sam Bond-Taylor, Peter Hessey, Hiroshi Sasaki, Toby P. Breckon, Chris G. Willcocks
arxiv 2021. [PDF] Cited:54

Imagebart: Bidirectional context with multinomial diffusion for autoregressive image synthesis.
Patrick Esser, Robin Rombach, Andreas Blattmann, Björn Ommer
NeurIPS 2021. [PDF] Cited:128

Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo
CVPR 2022. [PDF] Cited:552

Improved Vector Quantized Diffusion Models
Zhicong Tang, Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen
arxiv 2022. [PDF] Cited:50

Text2Human Text2Human: Text-Driven Controllable Human Image Generation
Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy, Ziwei Liu
SIGGRAPH 2022. [PDF] Cited:30

RQ-VAE Autoregressive image generation using residual quantization
Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, Wook-Shin Han
CVPR 2022. [PDF] Cited:149

Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer

Image Manipulation with Deep Generative Model

GAN Inversion

iGAN Generative Visual Manipulation on the Natural Image Manifold
Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros.
ECCV 2016. [PDF] [github] Cited:1341

IcGAN Invertible Conditional GANs for image editing
Guim Perarnau, Joost van de Weijer, Bogdan Raducanu, Jose M. Álvarez
NIPS 2016 Workshop. [PDF] Cited:626

Neural photo editing with introspective adversarial networks
Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston.
ICLR 2017. [PDF] Cited:442

Inverting The Generator of A Generative Adversarial Network.
Antonia Creswell, Anil Anthony Bharath.
NeurIPS 2016 Workshop. [PDF] Cited:309

GAN Paint Semantic Photo Manipulation with a Generative Image Prior
David Bau, Hendrik Strobelt, William Peebles, Jonas Wulff, Bolei Zhou, Jun-Yan Zhu, Antonio Torralba.
SIGGRAPH 2019. [PDF] Cited:321

GANSeeing Seeing What a GAN Cannot Generate.
David Bau, Jun-Yan Zhu, Jonas Wulff, William Peebles, Hendrik Strobelt, Bolei Zhou, Antonio Torralba.
ICCV 2019. [PDF] Cited:275

summary Summary: To see what a GAN cannot generate (the mode-collapse problem), this paper first inspects the distribution of semantic classes in generated images compared with ground-truth images. Second, by inverting images, failure cases on individual instances can be observed directly.
Class distribution-level mode collapse: StyleGAN outperforms WGAN-GP.
Instance-level mode collapse with GAN inversion: (1) Use intermediate features instead of the initial latent code as the optimization target. (2) Propose layer-wise inversion to learn an encoder for inversion; note this inversion outputs a z code. (3) Use a restriction on the z code to regularize the inversion of intermediate features.
Experiment: (1) Direct optimization on z does not work. (2) Encoder + optimization works better. (3) Layer-wise inversion is clearly best.
Limitation: Layer-wise inversion is not performed on StyleGAN.

Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space?
Rameen Abdal, Yipeng Qin, Peter Wonka.
ICCV 2019. [PDF] Cited:986

Image2StyleGAN++: How to Edit the Embedded Images?
Rameen Abdal, Yipeng Qin, Peter Wonka.
CVPR 2020. [PDF] Cited:502

IDInvert In-Domain GAN Inversion for Real Image Editing
Jiapeng Zhu, Yujun Shen, Deli Zhao, Bolei Zhou.
ECCV 2020. [PDF] Cited:592

summary Motivation: Traditional GAN inversion methods train the encoder in latent space by minimizing the distance |E(G(z))-z|. However, this gradient is agnostic to the semantic distribution of the generator's latent space (for example, latent codes far from the mean vector are less editable). This paper first trains a domain-guided encoder, and then proposes domain-regularized optimization that uses the encoder as a regularizer to fine-tune the code it produces and better recover the target image.
Method: (1) Objective for training the encoder: MSE and perceptual loss on the reconstructed real image, plus an adversarial loss. (2) Objective for refining the embedded code: perceptual loss and MSE on the reconstructed image, plus the distance to the encoder's inverted code as regularization.

Experiment: (1) Semantic analysis of inverted codes: attribute boundaries trained on inverted codes with InterFaceGAN yield a better precision-recall curve than Image2StyleGAN. (2) Inversion quality: compared by FID, SWD, MSE, and visual quality. (3) Applications: image interpolation, semantic manipulation, semantic diffusion (invert a composed image, then optimize with only the foreground image), and style mixing. (4) Ablation study: a larger weight on the encoder term biases the optimization towards the domain constraint, making the inverted codes more semantically meaningful; the cost is that the target image cannot be recovered as precisely per pixel.
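A minimal sketch of the domain-regularized optimization described above, assuming PyTorch modules `G` (generator) and `E` (domain-guided encoder); the step count, learning rate, loss weights, and the optional `percep` perceptual-loss callable are illustrative stand-ins:

```python
import torch

def domain_regularized_invert(G, E, target, steps=200, lr=0.01,
                              lam_percep=1.0, lam_dom=2.0, percep=None):
    # Initialize the code with the domain-guided encoder.
    w = E(target).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        rec = G(w)
        loss = ((rec - target) ** 2).mean()            # pixel MSE
        if percep is not None:
            loss = loss + lam_percep * percep(rec, target).mean()
        # Domain regularizer: keep the code consistent with the encoder.
        loss = loss + lam_dom * ((w - E(G(w))) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```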

Editing in Style: Uncovering the Local Semantics of GANs
Edo Collins, Raja Bala, Bob Price, Sabine Süsstrunk.
CVPR 2020. [PDF] [Pytorch] Cited:258

summary StyleGAN's style code controls the global style of images, so how can local manipulations be made based on the style code? Remember that the style code modulates the variance of intermediate feature maps, and different channels control different local semantic elements such as the nose and eyes. We can therefore identify the channels most correlated with the region of interest, and replace the source image's style-code values in those channels with the corresponding values from the target image.
Details: The correspondence between an RoI and a channel is measured by the feature-map magnitude within each cluster, where clusters are computed by spherical k-means on the features of the 32x32 layer. Limitation: this paper performs local semantic swaps; interpolation is not available.
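A minimal sketch of the channel-swap edit described above, assuming StyleGAN-style codes stored as `(num_layers, 512)` tensors; the `channels` list stands in for the RoI/channel correspondence the paper derives via spherical k-means:

```python
import torch

def swap_style_channels(w_source, w_target, channels):
    """Copy the style-code entries controlling the region of interest
    from the target code into a copy of the source code."""
    w_edit = w_source.clone()
    for layer, ch in channels:
        w_edit[layer, ch] = w_target[layer, ch]
    return w_edit
```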

Improving Inversion and Generation Diversity in StyleGAN using a Gaussianized Latent Space
Jonas Wulff, Antonio Torralba
arxiv 2020. [PDF] Cited:43

Improved StyleGAN Embedding: Where are the Good Latents?
Peihao Zhu, Rameen Abdal, Yipeng Qin, John Femiani, Peter Wonka
arxiv 2020. [PDF] Cited:104

pix2latent Transforming and Projecting Images into Class-conditional Generative Networks
Minyoung Huh, Richard Zhang, Jun-Yan Zhu, Sylvain Paris, Aaron Hertzmann
ECCV 2020. [PDF] Cited:103

pSp, pixel2style2pixel Encoding in style: a stylegan encoder for image-to-image translation.
Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or.
CVPR 2021. [PDF] [Pytorch] Cited:962

e4e, encoder for editing Designing an encoder for StyleGAN image manipulation.
Omer Tov, Yuval Alaluf, Yotam Nitzan, Or Patashnik, Daniel Cohen-Or.
SIGGRAPH 2021. [PDF] Cited:653

ReStyle Restyle: A residual-based stylegan encoder via iterative refinement.
Yuval Alaluf, Or Patashnik, Daniel Cohen-Or.
ICCV 2021. [PDF] [Project] Cited:310

Collaborative Learning for Faster StyleGAN Embedding.
Shanyan Guan, Ying Tai, Bingbing Ni, Feida Zhu, Feiyue Huang, Xiaokang Yang.
arxiv 2020. [PDF] Cited:98

Summary 1. Motivation: Traditional methods use either optimization-based or learning-based approaches to obtain the embedded latent code. The optimization-based approach suffers from a large time cost and is sensitive to initialization, while the learning-based approach yields relatively worse image quality due to the lack of direct supervision on the latent code.
2. This paper introduces a collaborative training process consisting of a learnable embedding network and an optimization-based iterator. For each training batch, the embedding network first encodes the images as the initialization code for the iterator; the iterator then runs 100 updates to minimize the MSE and LPIPS losses between the generated and target images; finally, the updated embedding code is used as the target signal to train the embedding network with a latent-code distance plus image-level and feature-level losses.
3. The embedding network consists of a pretrained ArcFace model as the identity encoder and an attribute encoder built with ResBlocks; the identity and attribute features are combined via linear modulation (the denormalization of SPADE). A TreeConnect layer (a sparse alternative to a fully-connected layer) then outputs the final embedded code.
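A minimal sketch of one collaborative training step as described above, assuming PyTorch modules `E` (embedding network) and `G` (frozen generator); the LPIPS and image/feature-level losses are elided, and the learning rate is illustrative:

```python
import torch

def collaborative_step(E, G, images, refine_steps=100, lr=0.01):
    w0 = E(images)                                  # encoder initialization
    w = w0.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(refine_steps):                   # optimization-based iterator
        loss = ((G(w) - images) ** 2).mean()        # MSE (+ LPIPS in the paper)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # The refined code supervises the embedding network via a latent-code
    # distance (plus image- and feature-level losses in the paper).
    return ((w0 - w.detach()) ** 2).mean()
```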

Pivotal Tuning for Latent-based Editing of Real Images
Daniel Roich, Ron Mokady, Amit H. Bermano, Daniel Cohen-Or.
arxiv 2021. [PDF] Cited:440

HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing.
Yuval Alaluf, Omer Tov, Ron Mokady, Rinon Gal, Amit H. Bermano.
CVPR 2022 [PDF] [Project] Cited:213

High-Fidelity GAN Inversion for Image Attribute Editing
Tengfei Wang, Yong Zhang, Yanbo Fan, Jue Wang, Qifeng Chen.
CVPR 2022. [PDF] Cited:211

Supervised GAN Manipulation

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks
David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba.
ICLR 2019. [PDF] [Project]. Cited:0

On the "steerability" of generative adversarial networks.
Ali Jahanian, Lucy Chai, Phillip Isola.
ICLR 2020. [PDF] [Project] [Pytorch] Cited:370

Controlling generative models with continuous factors of variations.
Antoine Plumerault, Hervé Le Borgne, Céline Hudelot.
ICLR 2020. [PDF] Cited:114

InterFaceGAN Interpreting the Latent Space of GANs for Semantic Face Editing
Yujun Shen, Jinjin Gu, Xiaoou Tang, Bolei Zhou.
CVPR 2020. [PDF] [Project] Cited:1007

Enjoy your editing: Controllable gans for image editing via latent space navigation
Peiye Zhuang, Oluwasanmi Koyejo, Alexander G. Schwing
ICLR 2021. [PDF] Cited:67

Only a matter of style: Age transformation using a style-based regression model.
Yuval Alaluf, Or Patashnik, Daniel Cohen-Or
SIGGRAPH 2021. [PDF] Cited:116

Discovering Interpretable Latent Space Directions of GANs Beyond Binary Attributes.
Huiting Yang, Liangyu Chai, Qiang Wen, Shuang Zhao, Zixun Sun, Shengfeng He.
CVPR 2021. [PDF]

StyleSpace StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation
Zongze Wu, Dani Lischinski, Eli Shechtman.
CVPR 2021. [PDF] Cited:426

StyleFlow StyleFlow: Attribute-conditioned Exploration of StyleGAN-Generated Images using Conditional Continuous Normalizing Flows
Rameen Abdal, Peihao Zhu, Niloy Mitra, Peter Wonka.
SIGGRAPH 2021. [PDF] Cited:462

A Latent Transformer for Disentangled Face Editing in Images and Videos.
Xu Yao, Alasdair Newson, Yann Gousseau, Pierre Hellier.
ICCV 2021. [PDF] [ArXiV] [Github] Cited:70

Controllable and Compositional Generation with Latent-Space Energy-Based Models.
Weili Nie, Arash Vahdat, Anima Anandkumar.
NeurIPS 2021. [PDF] Cited:65

EditGAN EditGAN: High-Precision Semantic Image Editing
Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler.
NeurIPS 2021. [PDF] Cited:172

StyleFusion StyleFusion: A Generative Model for Disentangling Spatial Segments
Omer Kafri, Or Patashnik, Yuval Alaluf, Daniel Cohen-Or
arxiv 2021. [PDF] Cited:34

Unsupervised GAN Manipulation

```mermaid
flowchart TD
  root(Unsupervised GAN Manipulation) --> A(Mutual information)
  root --> B[Generator Parameter]
  root --> C[Training Regularization]

  A --> E[Unsupervised Discovery. Voynov. ICML 2020]
  InfoGAN -- on pretrained network --> E
  E -- RBF Path --> Warped[WarpedGANSpace. Tzelepis. ICCV 2021]
  E -- Parameter Space --> NaviGAN[NaviGAN. Cherepkov. CVPR 2021]
  E -- Contrastive Loss --> DisCo[Disco. Ren. ICLR 2022]

  B -- PCA on Intermediate/W space --> GANSpace[GANSpace. Härkönen. NIPS 2020]
  GANSpace -- Closed-form Factorization of Weight --> SeFa[SeFa. Shen. CVPR 2021]
  GANSpace -- Spatial Transformation \n on intermediate Feature --> GANS[GAN Steerability. Eliezer. ICLR 2021]

  SeFa -- Variation for intermediate features --> VisualConcept[Visual Concept Vocabulary. Schwettmann. ICCV 2021]
```

Unsupervised Discovery of Interpretable Directions in the GAN Latent Space.
Andrey Voynov, Artem Babenko.
ICML 2020. [PDF] Cited:367

GANSpace GANSpace: Discovering Interpretable GAN Controls
Erik Härkönen, Aaron Hertzmann, Jaakko Lehtinen, Sylvain Paris.
NeurIPS 2020 [PDF] [Pytorch] Cited:805

The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement
William Peebles, John Peebles, Jun-Yan Zhu, Alexei Efros, Antonio Torralba
ECCV 2020 [PDF] [Project] Cited:107

The Geometry of Deep Generative Image Models and its Applications
Binxu Wang, Carlos R. Ponce.
ICLR 2021. [PDF] Cited:39

GAN Steerability without optimization.
Nurit Spingarn-Eliezer, Ron Banner, Tomer Michaeli
ICLR 2021. [PDF] Cited:53

SeFa Closed-Form Factorization of Latent Semantics in GANs
Yujun Shen, Bolei Zhou.
CVPR 2021 [PDF] [Project] Cited:526

NaviGAN Navigating the GAN Parameter Space for Semantic Image Editing
Anton Cherepkov, Andrey Voynov, Artem Babenko.
CVPR 2021 [PDF] [Pytorch] Cited:60

EigenGAN: Layer-Wise Eigen-Learning for GANs.
Zhenliang He, Meina Kan, Shiguang Shan.
ICCV 2021. [PDF] [Github] Cited:43

Toward a Visual Concept Vocabulary for GAN Latent Space.
Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Klein, Jacob Andreas, Antonio Torralba.
ICCV 2021. [PDF] [Project]

WarpedGANSpace: Finding Non-linear RBF Paths in GAN Latent Space.
Christos Tzelepis, Georgios Tzimiropoulos, Ioannis Patras.
ICCV 2021. [PDF] [Github] Cited:53

OroJaR: Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation.
Yuxiang Wei, Yupeng Shi, Xiao Liu, Zhilong Ji, Yuan Gao, Zhongqin Wu, Wangmeng Zuo.
ICCV 2021. [PDF] [Github] Cited:45

Optimizing Latent Space Directions For GAN-based Local Image Editing.
Ehsan Pajouheshgar, Tong Zhang, Sabine Süsstrunk.
arxiv 2021. [PDF] [Pytorch] Cited:11

Discovering Density-Preserving Latent Space Walks in GANs for Semantic Image Transformations.
Guanyue Li, Yi Liu, Xiwen Wei, Yang Zhang, Si Wu, Yong Xu, Hau San Wong.
ACM MM 2021. [PDF]

Disentangled Representations from Non-Disentangled Models
Valentin Khrulkov, Leyla Mirvakhabova, Ivan Oseledets, Artem Babenko
arxiv 2021. [PDF] Cited:14

Do Not Escape From the Manifold: Discovering the Local Coordinates on the Latent Space of GANs.
Jaewoong Choi, Changyeon Yoon, Junho Lee, Jung Ho Park, Geonho Hwang, Myungjoo Kang.
ICLR 2022. [PDF] Cited:23

Disco Learning Disentangled Representation by Exploiting Pretrained Generative Models: A Contrastive Learning View
Xuanchi Ren, Tao Yang, Yuwang Wang, Wenjun Zeng
ICLR 2022. [PDF] Cited:26

Rayleigh EigenDirections (REDs): GAN latent space traversals for multidimensional features.
Guha Balakrishnan, Raghudeep Gadde, Aleix Martinez, Pietro Perona.
arxiv 2022. [PDF]

Low-Rank Subspaces in GANs
Jiapeng Zhu, Ruili Feng, Yujun Shen, Deli Zhao, Zhengjun Zha, Jingren Zhou, Qifeng Chen
NeurIPS 2021. [PDF] Cited:59

Region-Based Semantic Factorization in GANs
Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen.
arxiv 2022. [PDF] Cited:24

Fantastic Style Channels and Where to Find Them: A Submodular Framework for Discovering Diverse Directions in GANs

CLIP based

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski
ICCV 2021. [PDF] [Pytorch] Cited:1007

TargetCLIP Image-Based CLIP-Guided Essence Transfer
Hila Chefer, Sagie Benaim, Roni Paiss, Lior Wolf
arxiv 2021. [PDF] Cited:45

CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders.
Kevin Frans, L.B. Soros, Olaf Witkowski.
Arxiv 2021. [PDF] Cited:160

CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions.
Omer Kafri, Or Patashnik, Yuval Alaluf, and Daniel Cohen-Or
arxiv 2021. [PDF] Cited:91

FEAT: Face Editing with Attention
Xianxu Hou, Linlin Shen, Or Patashnik, Daniel Cohen-Or, Hui Huang
arxiv 2021. [PDF] Cited:16

StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Translation
Peter Schaldenbrand, Zhixuan Liu, Jean Oh
NeurIPS 2021 Workshop. [PDF]

CLIPstyler: Image Style Transfer with a Single Text Condition
Gihyun Kwon, Jong Chul Ye
CVPR 2022. [PDF] Cited:190

HairCLIP: Design Your Hair by Text and Reference Image
Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Zhentao Tan, Lu Yuan, Weiming Zhang, Nenghai Yu
CVPR 2022. [PDF] Cited:81

CLIPasso: Semantically-Aware Object Sketching
Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Christian Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, Ariel Shamir
arxiv 2022. [PDF] Cited:42

Inversion-based Animation

A good image generator is what you need for high-resolution video synthesis
Yu Tian, Jian Ren, Menglei Chai, Kyle Olszewski, Xi Peng, Dimitris N. Metaxas, Sergey Tulyakov.
ICLR 2021. [PDF] Cited:163

Latent Image Animator: Learning to animate image via latent space navigation.
Yaohui Wang, Di Yang, Francois Bremond, Antitza Dantcheva.
ICLR 2022. [PDF]

Image-to-Image Translation

Supervised Image Translation

pix2pix Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros.
CVPR 2017. [PDF] Cited:17644

Semantic Image Synthesis

CRN Photographic Image Synthesis with Cascaded Refinement Networks
Qifeng Chen, Vladlen Koltun.
ICCV 2017. [PDF] Cited:905

pix2pixHD High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro.
CVPR 2018. [PDF] Cited:3580

SPADE Semantic Image Synthesis with Spatially-Adaptive Normalization
Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu.
CVPR 2019. [PDF] Cited:2379

SEAN SEAN: Image Synthesis with Semantic Region-Adaptive Normalization
Peihao Zhu, Rameen Abdal, Yipeng Qin, Peter Wonka.
CVPR 2020. [PDF] Cited:407

You Only Need Adversarial Supervision for Semantic Image Synthesis
Vadim Sushko, Edgar Schönfeld, Dan Zhang, Juergen Gall, Bernt Schiele, Anna Khoreva.
ICLR 2021. [PDF] Cited:156

Diverse Semantic Image Synthesis via Probability Distribution Modeling
Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Bin Liu, Gang Hua, Nenghai Yu.
CVPR 2021. [PDF] Cited:58

Efficient Semantic Image Synthesis via Class-Adaptive Normalization
Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Gang Hua, Nenghai Yu.
TPAMI 2021. [PDF]

Spatially-adaptive pixelwise networks for fast image translation.
Tamar Rott Shaham, Michael Gharbi, Richard Zhang, Eli Shechtman, Tomer Michaeli
CVPR 2021. [PDF] Cited:63

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
Jie Liang, Hui Zeng, Lei Zhang.
CVPR 2021. [PDF] Cited:80

Image Inpainting

Context encoders: Feature learning by inpainting.
Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros
CVPR 2016. [PDF] Cited:4929

Globally and locally consistent image completion.
Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa
SIGGRAPH 2017. [PDF]

Semantic image inpainting with deep generative models.
Raymond A. Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do
CVPR 2017. [PDF] Cited:1130

High-resolution image inpainting using multiscale neural patch synthesis
Chao Yang, Xin Lu, Zhe Lin, Eli Shechtman, Oliver Wang, Hao Li
CVPR 2017. [PDF] Cited:747

Spg-net: Segmentation prediction and guidance network for image inpainting.
Yuhang Song, Chao Yang, Yeji Shen, Peng Wang, Qin Huang, C.-C. Jay Kuo
BMVC 2018. [PDF] Cited:165

Generative image inpainting with contextual attention
Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang
CVPR 2018. [PDF] Cited:2069

Free-form image inpainting with gated convolution.
Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas Huang
ICCV 2019. [PDF] Cited:1519

Edgeconnect: Generative image inpainting with adversarial edge learning.
Kamyar Nazeri, Eric Ng, Tony Joseph, Faisal Z. Qureshi, Mehran Ebrahimi
ICCV 2019. [PDF] Cited:630

Pluralistic Image Completion
Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
CVPR 2019. [PDF] Cited:418

Rethinking image inpainting via a mutual encoder-decoder with feature equalizations.
Hongyu Liu, Bin Jiang, Yibing Song, Wei Huang, Chao Yang
ECCV 2020. [PDF] Cited:242

High-Fidelity Pluralistic Image Completion with Transformers
Ziyu Wan, Jingbo Zhang, Dongdong Chen, Jing Liao
ICCV 2021. [PDF] Cited:180

Reduce Information Loss in Transformers for Pluralistic Image Inpainting
Qiankun Liu, Zhentao Tan, Dongdong Chen, Qi Chu, Xiyang Dai, Yinpeng Chen, Mengchen Liu, Lu Yuan, Nenghai Yu
CVPR 2022. [PDF] Cited:55

Attribute Editing

Deep Identity-Aware Transfer of Facial Attributes
Mu Li, Wangmeng Zuo, David Zhang
arxiv 2016. [PDF] Cited:145

Others

Various Applications

Sketch Your Own GAN
Sheng-Yu Wang, David Bau, Jun-Yan Zhu
ICCV 2021. [PDF] Cited:63

Super-resolution

Example based image translation

Unsupervised Image Translation

Swapping Based

High-Resolution Daytime Translation Without Domain Labels
I. Anokhin, P. Solovev, D. Korzhenkov, A. Kharlamov, T. Khakhulin, A. Silvestrov, S. Nikolenko, V. Lempitsky, and G. Sterkin.
CVPR 2020. [PDF] Cited:65

Information Bottleneck Disentanglement for Identity Swapping
Gege Gao, Huaibo Huang, Chaoyou Fu, Zhaoyang Li, Ran He
CVPR 2021. [PDF]

Swapping Autoencoder for Deep Image Manipulation
Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei A. Efros, Richard Zhang
NeurIPS 2020. [PDF] Cited:297

L2M-GAN: Learning to Manipulate Latent Space Semantics for Facial Attribute Editing
Guoxing Yang, Nanyi Fei, Mingyu Ding, Guangzhen Liu, Zhiwu Lu, Tao Xiang
CVPR 2021. [PDF]

Cycle-Consistency Based

Coupled Generative Adversarial Networks
Ming-Yu Liu, Oncel Tuzel.
NeurIPS 2016 [PDF]

UNIT Unsupervised Image-to-Image Translation Networks.
Ming-Yu Liu,Thomas Breuel,Jan Kautz
NeurIPS 2017. [PDF] Cited:2575

CycleGAN Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros.
ICCV 2017. [PDF] Cited:5530

DiscoGAN Learning to Discover Cross-Domain Relations with Generative Adversarial Networks
Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, Jiwon Kim.
ICML 2017. [PDF] Cited:1898

DualGAN DualGAN: Unsupervised Dual Learning for Image-to-Image Translation
Zili Yi, Hao Zhang, Ping Tan, Minglun Gong.
ICCV 2017. [PDF] Cited:1858

BicycleGAN Toward Multimodal Image-to-Image Translation
Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman.
NeurIPS 2017. [PDF] Cited:1276

MUNIT Multimodal Unsupervised Image-to-Image Translation
Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz.
ECCV 2018. [PDF] Cited:2288

DRIT Diverse Image-to-Image Translation via Disentangled Representations
Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Kumar Singh, Ming-Hsuan Yang.
ECCV 2018. [PDF] Cited:825

Augmented CycleGAN: Learning many-to-many mappings from unpaired data.
Amjad Almahairi, Sai Rajeswar, Alessandro Sordoni, Philip Bachman, Aaron Courville.
ICML 2018. [PDF] Cited:400

MISO: Mutual Information Loss with Stochastic Style Representations for Multimodal Image-to-Image Translation.
Sanghyeon Na, Seungjoo Yoo, Jaegul Choo.
BMVC 2020. [PDF] Cited:16

MSGAN Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis
Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, Ming-Hsuan Yang.
CVPR 2019. [PDF] Cited:371

U-GAT-IT U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation
Junho Kim, Minjae Kim, Hyeonwoo Kang, Kwanghee Lee
ICLR 2020. [PDF] Cited:489

UVC-GAN UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired image-to-image translation
Dmitrii Torbunov, Yi Huang, Haiwang Yu, Jin Huang, Shinjae Yoo, Meifeng Lin, Brett Viren, Yihui Ren
arxiv 2022. [PDF] Cited:46

Beyond Cycle-consistency

DistanceGAN One-Sided Unsupervised Domain Mapping
Sagie Benaim, Lior Wolf
NIPS 2017. [PDF] Cited:279

Council-GAN Breaking the Cycle - Colleagues are all you need
Ori Nizan , Ayellet Tal
CVPR 2020. [PDF]

ACL-GAN Unpaired Image-to-Image Translation using Adversarial Consistency Loss
Yihao Zhao, Ruihai Wu, Hao Dong.
ECCV 2020. [PDF] Cited:97

CUT Contrastive Learning for Unpaired Image-to-Image Translation
Taesung Park, Alexei A. Efros, Richard Zhang, Jun-Yan Zhu.
ECCV 2020. [PDF] Cited:959

The spatially-correlative loss for various image translation tasks
Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai.
CVPR 2021. [PDF]

Unsupervised Image-to-Image Translation with Generative Prior
Shuai Yang, Liming Jiang, Ziwei Liu and Chen Change Loy.
CVPR 2022. [PDF] Cited:27

Multi-domain

StarGAN StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, Jaegul Choo
CVPR 2018. [PDF] Cited:3277

DRIT++ DRIT++: Diverse Image-to-Image Translation via Disentangled Representations
Hsin-Ying Lee, Hung-Yu Tseng, Qi Mao, Jia-Bin Huang, Yu-Ding Lu, Maneesh Singh, Ming-Hsuan Yang.
IJCV 2019. [PDF] Cited:761

StarGANv2 StarGAN v2: Diverse Image Synthesis for Multiple Domains
Yunjey Choi, Youngjung Uh, Jaejun Yoo, Jung-Woo Ha
CVPR 2020. [PDF] Cited:1464

Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation
Yahui Liu, Enver Sangineto, Yajing Chen, Linchao Bao, Haoxian Zhang, Nicu Sebe, Bruno Lepri, Wei Wang, Marco De Nadai
CVPR 2021. [PDF] Cited:38

A Style-aware Discriminator for Controllable Image Translation
Kunhee Kim, Sanghun Park, Eunyeong Jeon, Taehun Kim, Daijin Kim
CVPR 2022. [PDF] Cited:20

DualStyleGAN Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
Shuai Yang, Liming Jiang, Ziwei Liu and Chen Change Loy
CVPR 2022. [Pytorch]

Others

Unsupervised Cross-Domain Image Generation
Yaniv Taigman, Adam Polyak, Lior Wolf
ICLR 2017. [PDF] Cited:964

Few-shot Image Translation

FUNIT Few-shot unsupervised image-to-image translation.
Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, Jan Kautz
ICCV 2019. [PDF] Cited:556

Coco-funit: Few-shot unsupervised image translation with a content conditioned style encoder.
Kuniaki Saito, Kate Saenko, Ming-Yu Liu
ECCV 2020. [PDF] Cited:77

Few-shot Image Generation

Attribute Group Editing for Reliable Few-shot Image Generation.
Guanqi Ding, Xinzhe Han, Shuhui Wang, Shuzhe Wu, Xin Jin, Dandan Tu, Qingming Huang
CVPR 2022. [PDF] Cited:16

Style Transfer

WCT Universal Style Transfer via Feature Transforms
Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang.
NeurIPS 2017. [PDF] Cited:873

Style transfer by relaxed optimal transport and self-similarity.
Nicholas Kolkin, Jason Salavon, Greg Shakhnarovich.
CVPR 2019. [PDF] Cited:246

A Closed-Form Solution to Universal Style Transfer
Ming Lu, Hao Zhao, Anbang Yao, Yurong Chen, Feng Xu, Li Zhang
ICCV 2019. [PDF] Cited:70

Neural Neighbor Style Transfer
Nicholas Kolkin, Michal Kucera, Sylvain Paris, Daniel Sykora, Eli Shechtman, Greg Shakhnarovich
arxiv 2022. [PDF] Cited:21

Others

GANgealing GAN-Supervised Dense Visual Alignment
William Peebles, Jun-Yan Zhu, Richard Zhang, Antonio Torralba, Alexei Efros, Eli Shechtman.
arxiv 2021. [PDF] Cited:59

Text-to-Image Synthesis

End-to-end Training Based

Generating images from captions with attention.
Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov.
ICLR 2016. [PDF] Cited:412

Generative Adversarial Text to Image Synthesis
Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee.
ICML 2016. [PDF] Cited:2931

StackGAN StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas.
ICCV 2017. [PDF] Cited:2550

StackGAN++ StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
TPAMI 2018. [PDF] Cited:964

MirrorGAN: Learning Text-to-image Generation by Redescription
Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao
CVPR 2019. [PDF] Cited:491

AttnGAN AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He.
CVPR 2018. [PDF] Cited:1518

DM-GAN DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis
Minfeng Zhu, Pingbo Pan, Wei Chen, Yi Yang
CVPR 2019. [PDF] Cited:505

SD-GAN Semantics Disentangling for Text-to-Image Generation
Guojun Yin, Bin Liu, Lu Sheng, Nenghai Yu, Xiaogang Wang, Jing Shao
CVPR 2019. [PDF] Cited:166

DF-GAN A Simple and Effective Baseline for Text-to-Image Synthesis
Ming Tao, Hao Tang, Fei Wu, Xiaoyuan Jing, Bingkun Bao, Changsheng Xu.
CVPR 2022. [PDF] Cited:158

Text to Image Generation with Semantic-Spatial Aware GAN
Kai Hu, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn
CVPR 2022. [PDF] Cited:81

TextFace: Text-to-Style Mapping based Face Generation and Manipulation
Xianxu Hou, Xiaokang Zhang, Yudong Li, Linlin Shen
TMM 2022. [PDF]

FuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space Optimization
Xingchao Liu, Chengyue Gong, Lemeng Wu, Shujian Zhang, Hao Su, Qiang Liu
arxiv 2021. [PDF] Cited:71

StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
Axel Sauer, Tero Karras, Samuli Laine, Andreas Geiger, Timo Aila
arxiv 2023. [PDF] Cited:141

GigaGAN Scaling up GANs for Text-to-Image Synthesis
Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park
CVPR 2023. [PDF] Cited:289

DALLE Zero-Shot Text-to-Image Generation
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever.
ICML 2021. [PDF] Cited:3626

GLIDE GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen
arxiv 2021. [PDF] [Pytorch]

DALLE2 Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
OpenAI 2022. [PDF]

L-Verse: Bidirectional Generation Between Image and Text
Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae
CVPR 2022. [PDF] Cited:21

CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP
Zihao Wang, Wei Liu, Qian He, Xinglong Wu, Zili Yi
arxiv 2022. [PDF] Cited:59

Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan
arxiv 2023. [PDF] Cited:360

Multimodal Pretraining Based

Pretraining is All You Need for Image-to-Image Translation
Tengfei Wang, Ting Zhang, Bo Zhang, Hao Ouyang, Dong Chen, Qifeng Chen, Fang Wen
arxiv 2022. [PDF] Cited:142

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan
ECCV 2022. [PDF] Cited:246

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan
NIPS 2022. [PDF] Cited:52

Text-guided Image Editing

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon
ICLR 2022. [PDF] Cited:904

Blended Diffusion for Text-driven Editing of Natural Images
Omri Avrahami, Dani Lischinski, Ohad Fried
CVPR 2022. [PDF] Cited:648

DiffusionCLIP: Text-guided Image Manipulation Using Diffusion Models
Gwanghyun Kim, Taesung Kwon, Jong Chul Ye
CVPR 2022. [PDF] Cited:443

Text2LIVE: text-driven layered image and video editing.
Omer Bar-Tal, Dolev Ofri-Amar, Rafail Fridman, Yoni Kasten, Tali Dekel
arxiv 2022. [PDF] Cited:244

Textual Inversion An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or
arxiv 2022. [PDF] Cited:1166

DreamBooth DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman
arxiv 2022. [PDF] Cited:1706

Prompt-to-Prompt Image Editing with Cross-Attention Control
Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or
ICLR 2023. [PDF]

Imagic: Text-Based Real Image Editing with Diffusion Models
Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, Michal Irani
arxiv 2022. [PDF] Cited:718

UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image
Dani Valevski, Matan Kalman, Yossi Matias, Yaniv Leviathan
arxiv 2022. [PDF] Cited:19

InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks, Aleksander Holynski, Alexei A. Efros
arxiv 2022. [PDF] Cited:988

Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
Qiucheng Wu, Yujian Liu, Handong Zhao, Ajinkya Kale, Trung Bui, Tong Yu, Zhe Lin, Yang Zhang, Shiyu Chang
arxiv 2022. [PDF] Cited:62

Multi-Concept Customization of Text-to-Image Diffusion
Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
arxiv 2022. [PDF] Cited:498

Zero-shot Image-to-Image Translation
[Project]

Null-text Inversion for Editing Real Images using Guided Diffusion Models
[PDF] [Project] Cited:476

Text-to-Video

Imagen video: High definition video generation with diffusion models
Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans
arxiv 2022. [PDF] [Project] Cited:979

Video diffusion models.
Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet
arxiv 2022. [PDF] Cited:888

Make-a-video: Text-to-video generation without text-video data
Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, Yaniv Taigman
arxiv 2022. [PDF] Cited:854

Tune-A-Video Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, Yuchao Gu, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou
arxiv 2022. [PDF] Cited:431

Others

Single Image Generation

DIP Deep Image Prior
Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky.
CVPR 2018 [PDF] [Project] Cited:2732

SinGAN SinGAN: Learning a Generative Model from a Single Natural Image
Tamar Rott Shaham, Tali Dekel, Tomer Michaeli.
ICCV 2019 Best Paper. [PDF] [Project] Cited:754

TuiGAN TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images
Jianxin Lin, Yingxue Pang, Yingce Xia, Zhibo Chen, Jiebo Luo.
ECCV 2020. [PDF] Cited:54

DeepSIM Image Shape Manipulation from a Single Augmented Training Sample
Yael Vinker, Eliahu Horwitz, Nir Zabari , Yedid Hoshen.
ICCV 2021. [PDF] [Project] [Pytorch] Cited:16

Semi-supervised Learning with GAN

SemanticGAN Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
Daiqing Li, Junlin Yang, Karsten Kreis, Antonio Torralba, Sanja Fidler.
CVPR 2021. [PDF] Cited:151

DatasetGAN DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort
Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler.
CVPR 2021. [PDF] Cited:279

Miscellaneous

SemanticStyleGAN SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Yichun Shi, Xiao Yang, Yangyue Wan, Xiaohui Shen.
arxiv 2021. [PDF] Cited:67

Learning to generate line drawings that convey geometry and semantics
Caroline Chan, Fredo Durand, Phillip Isola.
arxiv 2022. [PDF] Cited:58

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks.
Anh Nguyen, Alexey Dosovitskiy, Jason Yosinski, Thomas Brox, Jeff Clune.
NIPS 2016. [PDF] Cited:651

Generating Images with Perceptual Similarity Metrics based on Deep Networks.
Alexey Dosovitskiy, Thomas Brox
NIPS 2016. [PDF] Cited:1082

VectorFusion Text-to-SVG by Abstracting Pixel-Based Diffusion Models
Ajay Jain, Amber Xie, Pieter Abbeel
arxiv 2022. [PDF] Cited:56