first commit

hanzhanggit · Dec 22, 2016 · aeb60b0 · aeb60b0
commit aeb60b0
Showing 49 changed files with 3,861 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,3 @@
+*.pyc
+ckt_logs
+backup
diff --git a/Data/.gitignore b/Data/.gitignore
@@ -0,0 +1,3 @@
+*
+!README.md
+!.gitignore
diff --git a/Data/README.md b/Data/README.md
@@ -0,0 +1,12 @@
+**Data**
+
+1. Download our preprocessed char-CNN-RNN text embeddings for [birds](https://drive.google.com/open?id=0B3y_msrWZaXLT1BZdVdycDY5TEE) and [flowers](https://drive.google.com/open?id=0B3y_msrWZaXLaUc0UXpmcnhaVmM) and save them to `Data/`.
+  - [Optional] Follow the instructions [here](https://github.com/reedscot/icml2016) to download the pretrained char-CNN-RNN text encoders and extract your own text embeddings.
+2. Download the [birds](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) and [flowers](http://www.robots.ox.ac.uk/~vgg/data/flowers/102/) image data. Extract them to `Data/birds/` and `Data/flowers/`, respectively.
+3. Preprocess images.
+  - For birds: `python ./misc/preprocess_birds.py`
+  - For flowers: `python ./misc/preprocess_flowers.py`
+
+
+**Skip-thought Vocabulary**
+- [Download](https://github.com/ryankiros/skip-thoughts) vocabulary for skip-thought vectors to `Data/`.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2016 hanzhanggit
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,91 @@
+# StackGAN
+Code for reproducing main results in the paper [StackGAN: Text to Photo-realistic Image Synthesis
+with Stacked Generative Adversarial Networks](https://arxiv.org/pdf/1612.03242v1.pdf) by Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, Dimitris Metaxas.
+
+<img src="examples/framework.png" width="700px" height="370px"/>
+
+
+### Dependencies
+[TensorFlow](https://www.tensorflow.org/get_started/os_setup)
+
+[Optional] [Torch](http://torch.ch/docs/getting-started.html#_) is needed if you use the pre-trained char-CNN-RNN text encoder.
+
+[Optional] [skip-thought](https://github.com/ryankiros/skip-thoughts) is needed if you use the skip-thought text encoder.
+
+In addition, please add the project folder to PYTHONPATH and `pip install` the following packages:
+- `prettytensor`
+- `progressbar`
+- `python-dateutil`
+- `easydict`
+- `pandas`
+- `torchfile`
+
+
+
+**Data**
+
+1. Download our preprocessed char-CNN-RNN text embeddings for [birds](https://drive.google.com/open?id=0B3y_msrWZaXLT1BZdVdycDY5TEE) and [flowers](https://drive.google.com/open?id=0B3y_msrWZaXLaUc0UXpmcnhaVmM) and save them to `Data/`.
+  - [Optional] Follow the instructions at [reedscot/icml2016](https://github.com/reedscot/icml2016) to download the pretrained char-CNN-RNN text encoders and extract text embeddings.
+2. Download the [birds](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) and [flowers](http://www.robots.ox.ac.uk/~vgg/data/flowers/102/) image data. Extract them to `Data/birds/` and `Data/flowers/`, respectively.
+3. Preprocess images.
+  - For birds: `python misc/preprocess_birds.py`
+  - For flowers: `python misc/preprocess_flowers.py`
+
+
+
+**Training**
+- The steps to train a StackGAN model on the CUB dataset using our preprocessed data for birds.
+  - Step 1: train Stage-I GAN (e.g., for 600 epochs) `python stageI/run_exp.py --cfg stageI/cfg/birds.yml --gpu 0`
+  - Step 2: train Stage-II GAN (e.g., for another 600 epochs) `python stageII/run_exp.py --cfg stageII/cfg/birds.yml --gpu 1`
+- Change `birds.yml` to `flowers.yml` to train a StackGAN model on Oxford-102 dataset using our preprocessed data for flowers.
+- `*.yml` files are example configuration files for training/testing our models.
+- If you want to try your own datasets, [here](https://github.com/soumith/ganhacks) are some good tips on how to train GANs. We also encourage you to try different hyper-parameters and architectures, especially for more complex datasets.
+
+
+
+**Pretrained Model**
+- [StackGAN for birds](https://drive.google.com/open?id=0B3y_msrWZaXLNUNKa3BaRjAyTzQ) trained from char-CNN-RNN text embeddings. Download and save it to `models/`.
+- [StackGAN for flowers](https://drive.google.com/open?id=0B3y_msrWZaXLX01FMC1JQW9vaFk) trained from char-CNN-RNN text embeddings. Download and save it to `models/`.
+- [StackGAN for birds](https://drive.google.com/open?id=0B3y_msrWZaXLZVNRNFg4d055Q1E) trained from skip-thought text embeddings. Download and save it to `models/` (This model uses the same settings as the char-CNN-RNN one. We assume better results can be achieved by tuning the hyper-parameters).
+
+
+
+**Run Demos**
+- Run `sh demo/flowers_demo.sh` to generate flower samples from sentences. The results will be saved to `Data/flowers/example_captions/`. (You need to [download](https://drive.google.com/file/d/0B0ywwgffWnLLZUt0UmQ1LU1oWlU/view) the char-CNN-RNN text encoder for flowers to `models/text_encoder/`. Note: this text encoder is provided by [reedscot/icml2016](https://github.com/reedscot/icml2016)).
+- Run `sh demo/birds_demo.sh` to generate bird samples from sentences. The results will be saved to `Data/birds/example_captions/`. (You need to [download](https://drive.google.com/file/d/0B0ywwgffWnLLU0F3UHA3NzFTNEE/view) the char-CNN-RNN text encoder for birds to `models/text_encoder/`. Note: this text encoder is provided by [reedscot/icml2016](https://github.com/reedscot/icml2016)).
+- Run `python demo/birds_skip_thought_demo.py --cfg demo/cfg/birds-skip-thought-demo.yml --gpu 2` to generate bird samples from sentences. The results will be saved to `Data/birds/example_captions-skip-thought/`. (You need to [download](https://github.com/ryankiros/skip-thoughts) the vocabulary for skip-thought vectors to `Data/skipthoughts/`).
+
+Examples for birds (char-CNN-RNN embeddings), more on [youtube](https://youtu.be/93yaf_kE0Fg):
+![](examples/bird1.jpg)
+![](examples/bird2.jpg)
+![](examples/bird4.jpg)
+![](examples/bird3.jpg)
+
+
+Examples for flowers (char-CNN-RNN embeddings), more on [youtube](https://youtu.be/SuRyL5vhCIM):
+![](examples/flower1.jpg)
+![](examples/flower2.jpg)
+![](examples/flower3.jpg)
+![](examples/flower4.jpg)
+
+Save your favorite pictures generated by our models, since the randomness from noise z and conditioning augmentation makes them creative enough to generate objects with different poses and viewpoints from the same description :smiley:
+
+
+
+### Citing StackGAN
+If you find StackGAN useful in your research, please consider citing:
+
+```
+@article{han2016stackgan,
+  title={StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
+  author={Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaolei Huang and Xiaogang Wang and Dimitris Metaxas},
+  journal={arXiv:1612.03242},
+  year={2016}
+}
+```
+
+
+**References**
+
+- Generative Adversarial Text-to-Image Synthesis [Paper](https://arxiv.org/abs/1605.05396) [Code](https://github.com/reedscot/icml2016)
+- Learning Deep Representations of Fine-grained Visual Descriptions [Paper](https://arxiv.org/abs/1605.05395) [Code](https://github.com/reedscot/cvpr2016)
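The README's closing note attributes sample diversity to the noise z and to conditioning augmentation. As a rough illustration of the second source, the sketch below samples a conditioning vector as `mean + exp(log_sigma) * eps`, which is the form used by `sample_encoded_context` in the demo script of this commit. It is a plain-Python stand-in, not the repository's TensorFlow code: the function name `sample_condition` and the example numbers are made up for illustration, and it draws `eps` from an ordinary Gaussian whereas the demo uses `tf.truncated_normal`.

```python
import math
import random


def sample_condition(mean, log_sigma, augment=True, rng=random):
    """Sample c = mean + exp(log_sigma) * eps, with eps ~ N(0, 1) per dimension.

    With augment=False the mean is returned unchanged, so the output is
    deterministic for a given text embedding.
    """
    if not augment:
        return list(mean)
    return [m + math.exp(s) * rng.gauss(0.0, 1.0)
            for m, s in zip(mean, log_sigma)]


# Hypothetical 3-dimensional example; real conditioning vectors are larger.
mean = [0.5, -1.0, 2.0]
log_sigma = [-1.0, 0.0, -3.0]  # the standard deviation is exp(log_sigma)

rng = random.Random(0)
c1 = sample_condition(mean, log_sigma, rng=rng)
c2 = sample_condition(mean, log_sigma, rng=rng)
print(c1)
print(c2)  # a different draw for the same "sentence"
```

Two calls with the same mean and log-sigma give different vectors, which is why repeated runs on one sentence can yield different poses and viewpoints.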
diff --git a/demo/birds_demo.sh b/demo/birds_demo.sh
@@ -0,0 +1,21 @@
+#
+# Extract text embeddings from the encoder
+#
+CUB_ENCODER=lm_sje_nc4_cub_hybrid_gru18_a1_c512_0.00070_1_10_trainvalids.txt_iter30000.t7 \
+CAPTION_PATH=Data/birds/example_captions \
+GPU=0 \
+
+export CUDA_VISIBLE_DEVICES=${GPU}
+
+net_txt=models/text_encoder/${CUB_ENCODER} \
+queries=${CAPTION_PATH}.txt \
+filenames=${CAPTION_PATH}.t7 \
+th demo/get_embedding.lua
+
+#
+# Generate image from text embeddings
+#
+python demo/demo.py \
+--cfg demo/cfg/birds-demo.yml \
+--gpu ${GPU} \
+--caption_path ${CAPTION_PATH}.t7
diff --git a/demo/birds_skip_thought_demo.py b/demo/birds_skip_thought_demo.py
@@ -0,0 +1,223 @@
+from __future__ import division
+from __future__ import print_function
+
+import prettytensor as pt
+import tensorflow as tf
+import numpy as np
+import scipy.misc
+import os
+import argparse
+from PIL import Image, ImageDraw, ImageFont
+
+from misc.config import cfg, cfg_from_file
+from misc.utils import mkdir_p
+from misc import skipthoughts
+from stageII.model import CondGAN
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='Run a StackGAN demo')
+    parser.add_argument('--cfg', dest='cfg_file',
+                        help='optional config file',
+                        default=None, type=str)
+    parser.add_argument('--gpu', dest='gpu_id',
+                        help='GPU device id (default: use cfg.GPU_ID)',
+                        default=-1, type=int)
+    parser.add_argument('--caption_path', type=str, default=None,
+                        help='Path to the file with text sentences')
+    # if len(sys.argv) == 1:
+    #    parser.print_help()
+    #    sys.exit(1)
+    args = parser.parse_args()
+    return args
+
+
+def sample_encoded_context(embeddings, model, bAugmentation=True):
+    '''Helper function for init_opt'''
+    # Build conditioning augmentation structure for text embedding
+    # under different variable_scope: 'g_net' and 'hr_g_net'
+    c_mean_logsigma = model.generate_condition(embeddings)
+    mean = c_mean_logsigma[0]
+    if bAugmentation:
+        # epsilon = tf.random_normal(tf.shape(mean))
+        epsilon = tf.truncated_normal(tf.shape(mean))
+        stddev = tf.exp(c_mean_logsigma[1])
+        c = mean + stddev * epsilon
+    else:
+        c = mean
+    return c
+
+
+def build_model(sess, embedding_dim, batch_size):
+    model = CondGAN(
+        lr_imsize=cfg.TEST.LR_IMSIZE,
+        hr_lr_ratio=int(cfg.TEST.HR_IMSIZE/cfg.TEST.LR_IMSIZE))
+
+    embeddings = tf.placeholder(
+        tf.float32, [batch_size, embedding_dim],
+        name='conditional_embeddings')
+    with pt.defaults_scope(phase=pt.Phase.test):
+        with tf.variable_scope("g_net"):
+            c = sample_encoded_context(embeddings, model)
+            z = tf.random_normal([batch_size, cfg.Z_DIM])
+            fake_images = model.get_generator(tf.concat(1, [c, z]))
+        with tf.variable_scope("hr_g_net"):
+            hr_c = sample_encoded_context(embeddings, model)
+            hr_fake_images = model.hr_get_generator(fake_images, hr_c)
+
+    ckt_path = cfg.TEST.PRETRAINED_MODEL
+    if ckt_path.find('.ckpt') != -1:
+        print("Reading model parameters from %s" % ckt_path)
+        saver = tf.train.Saver(tf.all_variables())
+        saver.restore(sess, ckt_path)
+    else:
+        print("Input a valid model path.")
+    return embeddings, fake_images, hr_fake_images
+
+
+def drawCaption(img, caption):
+    img_txt = Image.fromarray(img)
+    # get a font
+    fnt = ImageFont.truetype('Pillow/Tests/fonts/FreeMono.ttf', 50)
+    # get a drawing context
+    d = ImageDraw.Draw(img_txt)
+
+    # draw text in fully opaque white
+    d.text((10, 256), 'Stage-I', font=fnt, fill=(255, 255, 255, 255))
+    d.text((10, 512), 'Stage-II', font=fnt, fill=(255, 255, 255, 255))
+    if img.shape[0] > 832:
+        d.text((10, 832), 'Stage-I', font=fnt, fill=(255, 255, 255, 255))
+        d.text((10, 1088), 'Stage-II', font=fnt, fill=(255, 255, 255, 255))
+
+    idx = caption.find(' ', 60)
+    if idx == -1:
+        d.text((256, 10), caption, font=fnt, fill=(255, 255, 255, 255))
+    else:
+        cap1 = caption[:idx]
+        cap2 = caption[idx+1:]
+        d.text((256, 10), cap1, font=fnt, fill=(255, 255, 255, 255))
+        d.text((256, 60), cap2, font=fnt, fill=(255, 255, 255, 255))
+
+    return img_txt
+
+
+def save_super_images(sample_batchs, hr_sample_batchs,
+                      captions_batch, batch_size,
+                      startID, save_dir):
+    if not os.path.isdir(save_dir):
+        print('Make a new folder: ', save_dir)
+        mkdir_p(save_dir)
+
+    # Save up to 16 samples for each text embedding/sentence
+    img_shape = hr_sample_batchs[0][0].shape
+    for j in range(batch_size):
+        padding = np.zeros(img_shape)
+        row1 = [padding]
+        row2 = [padding]
+        # First row with up to 8 samples
+        for i in range(np.minimum(8, len(sample_batchs))):
+            lr_img = sample_batchs[i][j]
+            hr_img = hr_sample_batchs[i][j]
+            hr_img = (hr_img + 1.0) * 127.5
+            re_sample = scipy.misc.imresize(lr_img, hr_img.shape[:2])
+            row1.append(re_sample)
+            row2.append(hr_img)
+        row1 = np.concatenate(row1, axis=1)
+        row2 = np.concatenate(row2, axis=1)
+        superimage = np.concatenate([row1, row2], axis=0)
+
+        # Second pair of rows with up to 8 more samples
+        if len(sample_batchs) > 8:
+            row1 = [padding]
+            row2 = [padding]
+            for i in range(8, len(sample_batchs)):
+                lr_img = sample_batchs[i][j]
+                hr_img = hr_sample_batchs[i][j]
+                hr_img = (hr_img + 1.0) * 127.5
+                re_sample = scipy.misc.imresize(lr_img, hr_img.shape[:2])
+                row1.append(re_sample)
+                row2.append(hr_img)
+            row1 = np.concatenate(row1, axis=1)
+            row2 = np.concatenate(row2, axis=1)
+            super_row = np.concatenate([row1, row2], axis=0)
+            superimage2 = np.zeros_like(superimage)
+            superimage2[:super_row.shape[0],
+                        :super_row.shape[1],
+                        :super_row.shape[2]] = super_row
+            mid_padding = np.zeros((64, superimage.shape[1], 3))
+            superimage =\
+                np.concatenate([superimage, mid_padding, superimage2], axis=0)
+
+        top_padding = np.zeros((128, superimage.shape[1], 3))
+        superimage =\
+            np.concatenate([top_padding, superimage], axis=0)
+
+        fullpath = '%s/sentence%d.jpg' % (save_dir, startID + j)
+        superimage = drawCaption(np.uint8(superimage), captions_batch[j])
+        scipy.misc.imsave(fullpath, superimage)
+
+
+if __name__ == "__main__":
+    args = parse_args()
+    if args.cfg_file is not None:
+        cfg_from_file(args.cfg_file)
+    if args.gpu_id != -1:
+        cfg.GPU_ID = args.gpu_id
+    if args.caption_path is not None:
+        cfg.TEST.CAPTION_PATH = args.caption_path
+
+    cap_path = cfg.TEST.CAPTION_PATH
+    with open(cap_path) as f:
+        captions = f.read().split('\n')
+    captions_list = [cap for cap in captions if len(cap) > 0]
+    print('Successfully loaded sentences from: ', cap_path)
+    print('Total number of sentences:', len(captions_list))
+    # path to save generated samples
+    save_dir = cap_path[:cap_path.find('.txt')] + '-skip-thought'
+
+    if len(captions_list) > 0:
+        # Load skipthoughts model and generate embeddings from text sentences
+        print('Load skipthoughts as encoder:')
+        model = skipthoughts.load_model()
+        embeddings = skipthoughts.encode(model, captions_list, verbose=False)
+        num_embeddings = len(embeddings)
+        print('num_embeddings:', num_embeddings, embeddings.shape)
+        batch_size = np.minimum(num_embeddings, cfg.TEST.BATCH_SIZE)
+
+        # Build StackGAN and load the model
+        config = tf.ConfigProto(allow_soft_placement=True)
+        with tf.Session(config=config) as sess:
+            with tf.device("/gpu:%d" % cfg.GPU_ID):
+                embeddings_holder, fake_images_opt, hr_fake_images_opt =\
+                    build_model(sess, embeddings.shape[-1], batch_size)
+
+                count = 0
+                while count < num_embeddings:
+                    iend = count + batch_size
+                    if iend > num_embeddings:
+                        iend = num_embeddings
+                        count = num_embeddings - batch_size
+                    embeddings_batch = embeddings[count:iend]
+                    captions_batch = captions_list[count:iend]
+
+                    samples_batchs = []
+                    hr_samples_batchs = []
+                    # Generate up to 16 images for each sentence with
+                    # randomness from noise z and conditioning augmentation.
+                    for i in range(np.minimum(16, cfg.TEST.NUM_COPY)):
+                        hr_samples, samples =\
+                            sess.run([hr_fake_images_opt, fake_images_opt],
+                                     {embeddings_holder: embeddings_batch})
+                        samples_batchs.append(samples)
+                        hr_samples_batchs.append(hr_samples)
+                    save_super_images(samples_batchs,
+                                      hr_samples_batchs,
+                                      captions_batch,
+                                      batch_size,
+                                      count, save_dir)
+                    count += batch_size
+
+        print('Finished generating samples for %d sentences:' % num_embeddings)
+        print('Example sentences:')
+        for i in range(np.minimum(10, num_embeddings)):
+            print('Sentence %d: %s' % (i, captions_list[i]))
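The batching loop in `birds_skip_thought_demo.py` always feeds a full batch, because the `conditional_embeddings` placeholder is built with a fixed `batch_size`. When the last window would run past the end, the loop shifts it back so that it overlaps the previous one. The sketch below reproduces only that index arithmetic; `batch_windows` is a hypothetical helper name, not something defined in the repository.

```python
def batch_windows(num_items, batch_size):
    """Return the (start, end) index pairs visited by the demo's while-loop.

    Every window holds exactly batch_size items. If the last window would
    run past the end, it is shifted back and overlaps the previous one.
    """
    batch_size = min(batch_size, num_items)  # mirrors np.minimum in the demo
    windows = []
    count = 0
    while count < num_items:
        end = count + batch_size
        if end > num_items:
            end = num_items
            count = num_items - batch_size
        windows.append((count, end))
        count += batch_size
    return windows


print(batch_windows(10, 4))  # [(0, 4), (4, 8), (6, 10)]
```

In this example sentences 6 and 7 are generated twice. Since the output file name is derived from `startID + j`, the second pass overwrites `sentence6.jpg` and `sentence7.jpg` rather than producing extra files.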
diff --git a/demo/cfg/birds-demo.yml b/demo/cfg/birds-demo.yml
@@ -0,0 +1,15 @@
+CONFIG_NAME: 'stageII'
+
+DATASET_NAME: 'birds'
+GPU_ID: 0
+Z_DIM: 100
+
+TEST:
+    PRETRAINED_MODEL: './models/birds_model_164000.ckpt'
+    BATCH_SIZE: 64
+    NUM_COPY: 8
+
+GAN:
+    EMBEDDING_DIM: 128
+    DF_DIM: 64
+    GF_DIM: 128
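`birds-demo.yml` sets only a few keys, while the skip-thought demo script in this commit also reads `cfg.TEST.LR_IMSIZE` and `cfg.TEST.HR_IMSIZE`. If the other demo scripts do the same, those values have to come from defaults that `cfg_from_file` merges the YAML into. `misc/config.py` is not shown in this part of the commit, so the following is only a sketch of such a merge using plain dicts. `DEFAULTS`, `OVERRIDE`, and `merge_config` are hypothetical names, and the default values are placeholders rather than the repository's actual settings.

```python
import copy

# Hypothetical defaults; the real ones live in misc/config.py (not shown here).
DEFAULTS = {
    'GPU_ID': 0,
    'Z_DIM': 100,
    'TEST': {
        'PRETRAINED_MODEL': '',
        'BATCH_SIZE': 64,
        'NUM_COPY': 16,
        'LR_IMSIZE': 64,    # placeholder value
        'HR_IMSIZE': 256,   # placeholder value
    },
}

# The TEST section of birds-demo.yml, written out as a plain dict.
OVERRIDE = {
    'TEST': {
        'PRETRAINED_MODEL': './models/birds_model_164000.ckpt',
        'BATCH_SIZE': 64,
        'NUM_COPY': 8,
    },
}


def merge_config(defaults, override):
    """Return a copy of defaults with override applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


cfg = merge_config(DEFAULTS, OVERRIDE)
print(cfg['TEST'])
```

Keys present in the YAML replace the defaults, and keys it omits (here the two image sizes) keep their default values.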