[Solution] How to run this project with Python 3.x and TensorFlow 1.x #30

Open
Lotayou opened this issue Jan 3, 2018 · 29 comments

@Lotayou

Lotayou commented Jan 3, 2018

I spent 5 hours getting the program to run. To save others that time, here is a summary of all the changes needed for this project to run in a Python 3.x and TensorFlow r1.x environment.

I assume your working directory is ~/StackGAN/StageI.

1. Python 3.x compatibility issues

In addition to the minor changes mentioned in #2, there is still one major issue:

Pickle Issue: The original pickle files were created in Python 2.7, and opening them with Python 3 can lead to the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
The solution can be found here: Unpickle Python 2 object in Python 3
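
The linked answer boils down to passing an explicit `encoding` to `pickle.load`. A minimal sketch, assuming the pickle holds Python 2 byte strings; the byte literal and the helper name `load_py2_pickle` are illustrative stand-ins, not data or code from this project:

```python
import io
import pickle

# Stand-in for a file written by Python 2: pickle.dumps('caf\xc3\xa9', 2).
# (Illustrative bytes only; the project's real pickles are much larger.)
py2_pickle = b"\x80\x02U\x05caf\xc3\xa9q\x00."


def load_py2_pickle(fileobj, encoding="latin1"):
    """Load a Python 2 pickle under Python 3 without the ASCII decode error."""
    return pickle.load(fileobj, encoding=encoding)


# The default encoding is ASCII, which fails on non-ASCII bytes.
try:
    pickle.load(io.BytesIO(py2_pickle))
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# latin1 maps every byte to a code point, so decoding never fails.
as_text = load_py2_pickle(io.BytesIO(py2_pickle))
# encoding="bytes" keeps Python 2 str objects as raw bytes instead.
as_bytes = load_py2_pickle(io.BytesIO(py2_pickle), encoding="bytes")
```

With a real file, the same call would be wrapped in `with open(path, "rb") as f:`. Which encoding is appropriate depends on what the pickle contains.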

2. TensorFlow r1.x compatibility issues

tf.concat() Issue #11: If you encounter an error message like this:
TypeError: Expected int32, got <prettytensor.pretty_tensor_class.Layer object at 0x7f74d41abd90> of type 'Layer' instead.
In TensorFlow r0.12, the function signature is
tf.concat(axis, value)
while in TensorFlow r1.x version the argument order has been changed:
tf.concat(value, axis)
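
The error is presumably raised because r1.x treats the second positional argument as the axis, so a call written in the old order hands it the tensors instead of an int. Editing each call site is the direct fix. As a rough sketch of the same idea without importing TensorFlow, the hypothetical helper below (not part of any library) puts the arguments into the r1.x order whichever way they were written:

```python
def concat_args_r1(first, second):
    """Return (values, axis) whichever order the caller used.

    Hypothetical helper, not part of TensorFlow: in r0.12 call sites the
    axis (an int) comes first, in r1.x call sites it comes second.
    """
    if isinstance(first, int) and not isinstance(second, int):
        return second, first  # old order: (axis, values)
    return first, second      # new order: (values, axis)


# Strings stand in for tensors here.
old_style = concat_args_r1(1, ["a", "b"])
new_style = concat_args_r1(["a", "b"], 1)
```

A call site could then read `tf.concat(*concat_args_r1(1, [x, y]))`, although rewriting it as `tf.concat([x, y], 1)` is simpler.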

PrettyTensor Issue #27: This issue is caused inside the PrettyTensor module, with an error message like this:
File ".../site-packages/prettytensor/pretty_tensor_class.py", line 1335, in _strip_unnecessary_contents_from_stack
for f, line_no, method, _ in result._traceback:
ValueError: too many values to unpack (expected 4)

This issue has nothing to do with the PrettyTensor package version. I use the latest, 0.7.4, but 0.6.2 should also work.

The root cause is the format of _traceback: in TensorFlow r1.3, the _traceback object is a list in which each entry is a 6-tuple like this:
('D:\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\spyder\\utils\\ipython\\start_kernel.py', 241, '<module>', {'__name__': '__main__', '__doc__': '\nFile used to start kernels for the IPython Console\n', '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x0000021474E75CF8>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': 'D:\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\spyder\\utils\\ipython\\start_kernel.py', '__cached__': None, 'os': <module 'os' from 'D:\\Anaconda3\\envs\\tensorflow\\lib\\os.py'>, 'osp': <module 'ntpath' from 'D:\\Anaconda3\\envs\\tensorflow\\lib\\ntpath.py'>, 'sys': <module 'sys' (built-in)>, 'IS_EXT_INTERPRETER': True, 'sympy_config': <function sympy_config at 0x00000214799891E0>, 'kernel_config': <function kernel_config at 0x0000021479989268>, 'varexp': <function varexp at 0x00000214799892F0>, 'main': <function main at 0x0000021479989378>}, 9, None)

I guess in TensorFlow r0.12 the entry only contains 4 elements. But anyway here's a quick workaround:

Change
for f, line_no, method, _ in result._traceback:
to
for f, line_no, method, *_ in result._traceback:
*_ collects however many elements remain in the unpacked tuple, so the loop works regardless of the tuple length. Note that this syntax is Python 3 only.
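
A small self-contained illustration of the starred target, using made-up entries rather than real TensorFlow traceback data:

```python
# Made-up traceback entries: a 4-tuple (the shape the PrettyTensor loop
# expects) and a 6-tuple (the shape reported above for TensorFlow r1.3).
entries = [
    ("model.py", 31, "__init__", "self.build()"),
    ("run_exp.py", 59, "<module>", {"__name__": "__main__"}, 9, None),
]

kept = []
for f, line_no, method, *_ in entries:
    # *_ collects whatever follows the first three fields into a list.
    kept.append((f, line_no, method))
```

Both entries unpack cleanly even though their lengths differ.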

3. Summary Issue:
TensorFlow r1.3 has a new summary module, so much of the code has to be adapted like this:

tf.merge_all_summaries() -> tf.summary.merge_all()
tf.scalar_summary(k,v) -> tf.summary.scalar(k,v)
summary_writer = tf.train.SummaryWriter(self.log_dir, sess.graph) ->
summary_writer = tf.summary.FileWriter(self.log_dir, sess.graph)
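
If there are many call sites, the renames can be applied mechanically. The following is only a sketch: `migrate_summaries` is a hypothetical helper that does plain text substitution on source code using the three renames listed above, and it does not import TensorFlow.

```python
import re

# Old name -> new name, taken from the renames listed above.
RENAMES = {
    "tf.merge_all_summaries": "tf.summary.merge_all",
    "tf.scalar_summary": "tf.summary.scalar",
    "tf.train.SummaryWriter": "tf.summary.FileWriter",
}

# Match an old name only when it is not part of a longer identifier.
_PATTERN = re.compile(
    r"(?<![\w.])(" + "|".join(re.escape(k) for k in RENAMES) + r")(?![\w.])"
)


def migrate_summaries(source):
    """Return source text with the old summary names replaced."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)


old_line = "summary_writer = tf.train.SummaryWriter(self.log_dir, sess.graph)"
new_line = migrate_summaries(old_line)
```

Only the names change; the arguments are left as they were, so the result should still be reviewed by hand.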

4. Slicing Index Issue:
The index must be an integer, so the cropping code around line 80 of dataset.py should be changed:
# cropped_image =\
#     images[i][w1: w1 + self._imsize, h1: h1 + self._imsize, :]
original_image = images[i]
cropped_image = original_image[int(w1): int(w1 + imsize),\
                               int(h1): int(h1 + imsize), :]
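
The same cast can be shown in isolation. In this sketch, nested lists stand in for the image array so that it runs without NumPy, and `crop` is a hypothetical helper rather than the project's code:

```python
def crop(image, w1, h1, imsize):
    """Crop an imsize x imsize window; image is a list of rows here."""
    # Offsets may arrive as floats, so cast them before slicing.
    w1, h1 = int(w1), int(h1)
    return [row[h1:h1 + imsize] for row in image[w1:w1 + imsize]]


# 6 x 6 dummy image whose pixel value encodes its row and column.
image = [[10 * r + c for c in range(6)] for r in range(6)]
patch = crop(image, 2.0, 1.0, 3)

# Without the cast, float slice bounds are rejected.
try:
    image[2.0:5.0]
    float_slice_ok = True
except TypeError:
    float_slice_ok = False
```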

That's all the major compatibility issues that are necessary for training. Enjoy :)

@Lotayou
Author

Lotayou commented Jan 3, 2018

@hanzhanggit Can you please mention this in readme.md? Thanks!

@SpadesQ

SpadesQ commented Jan 11, 2018

@Lotayou

After changing
for f, line_no, method, _ in result._traceback:
to
for f, line_no, method, *_ in result._traceback:
as you described (*_ takes any number of arguments and resolves whatever is left in the unpacked tuple),

I got:
for f, line_no, method, *_ in result._traceback:
^
SyntaxError: invalid syntax

I use Python 2.7. How can I solve this?

@Lotayou
Author

Lotayou commented Jan 21, 2018

@SpadesQ I use Python 3.6 myself so I don't know much about Python 2.7.

Maybe you should check whether your _traceback entries have the same format as mine (by printing out the first entry, as I did).

My _traceback entries contain 6 items each, but the for loop only expects 4, so I have to absorb the trailing items with *_. If *_ does not work for you, just use some placeholder variables to fill the gap, like this:

for f, line_no, method, blah1, blah2, blah3 in result._traceback:

BTW, this TensorFlow version is a mess; I now use the StackGANv2 PyTorch version.

@KelvinBull

Hello, why doesn't the PrettyTensor library include custom_fully_connected/custom_conv2d?
Is this a bug in my version of TensorFlow/PrettyTensor?

@ningning32

I have encountered the same problem. @Lotayou, have you solved it? Thank you very much.

@ningning32

ningning32 commented Apr 7, 2018

for f, line_no, method, blah1, blah2, blah3 in result._traceback:
ValueError: need more than 4 values to unpack
for f, line_no, method, blah1, blah2, blah3, blah4 in result._traceback:
ValueError: need more than 6 values to unpack
I use Python 2.7.
@SpadesQ

@KelvinBull

KelvinBull commented May 3, 2018

If you use Python 2.7, you can work around it by changing the loop like this:
for entry in result._traceback:
    # Keep only the first three fields, whatever the tuple length.
    f, line_no, method = list(entry)[:3]

so, you can run ...
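
The two ValueErrors reported earlier (4 values in one run, 6 in another) suggest that the entries do not all have the same length, which is why a fixed number of placeholder variables fails. Slicing avoids that and uses no Python 3 only syntax. A sketch with made-up entries and a hypothetical helper name:

```python
def first_three(entry):
    """Return (file, line_no, method) from an entry with at least 3 fields."""
    f, line_no, method = tuple(entry)[:3]
    return f, line_no, method


# Made-up entries of different lengths, not real TensorFlow data.
entries = [
    ("model.py", 31, "__init__", "self.build()"),
    ("run_exp.py", 59, "<module>", {"__name__": "__main__"}, 9, None),
]
result = [first_three(e) for e in entries]
```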

@AnwarUllahKhan

@Lotayou Dear Sir,
I am facing this problem

(base) C:\Users\anwar\Downloads\Programs\Text-to-Image-HighResolution>python run_exp.py --cfg cfg/birds.yml --gpu 0
Using config:
{'CONFIG_NAME': 'stageI',
'DATASET_NAME': 'birds',
'EMBEDDING_TYPE': 'cnn-rnn',
'GAN': {'DF_DIM': 64,
'EMBEDDING_DIM': 128,
'GF_DIM': 128,
'NETWORK_TYPE': 'default'},
'GPU_ID': 0,
'TEST': {'BATCH_SIZE': 64,
'CAPTION_PATH': '',
'HR_IMSIZE': 256,
'LR_IMSIZE': 64,
'NUM_COPY': 16,
'PRETRAINED_MODEL': ''},
'TRAIN': {'BATCH_SIZE': 64,
'B_WRONG': True,
'COEFF': {'KL': 2.0},
'COND_AUGMENTATION': True,
'DISCRIMINATOR_LR': 0.0002,
'FINETUNE_LR': False,
'FLAG': True,
'FT_LR_RETIO': 0.1,
'GENERATOR_LR': 0.0002,
'LR_DECAY_EPOCH': 50,
'MAX_EPOCH': 600,
'NUM_COPY': 4,
'NUM_EMBEDDING': 4,
'PRETRAINED_EPOCH': 600,
'PRETRAINED_MODEL': '',
'SNAPSHOT_INTERVAL': 2000},
'Z_DIM': 100}
images: (2933, 76, 76, 3)
embeddings: (2933, 10, 1024)
list_filenames: 2933 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18
images: (8855, 76, 76, 3)
embeddings: (8855, 10, 1024)
list_filenames: 8855 002.Laysan_Albatross/Laysan_Albatross_0002_1027
Traceback (most recent call last):
File "run_exp.py", line 59, in <module>
image_shape=dataset.image_shape
File "C:\Users\anwar\Downloads\Programs\Text-to-Image-HighResolution\model.py", line 31, in __init__
self.d_encode_img_template = self.d_encode_image()
File "C:\Users\anwar\Downloads\Programs\Text-to-Image-HighResolution\model.py", line 161, in d_encode_image
custom_conv2d(self.df_dim, k_h=4, k_w=4).
File "C:\ProgramData\Anaconda3\lib\site-packages\prettytensor\pretty_tensor_class.py", line 1965, in method
with _method_scope(input_layer, scope_name) as (scope, _):
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 81, in __enter__
return next(self.gen)
File "C:\ProgramData\Anaconda3\lib\site-packages\prettytensor\pretty_tensor_class.py", line 1776, in _method_scope
scopes.var_and_name_scope((name, None)) as (scope, var_scope):
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 81, in __enter__
return next(self.gen)
File "C:\ProgramData\Anaconda3\lib\site-packages\prettytensor\scopes.py", line 55, in var_and_name_scope
vs_key = tf.get_collection_ref(variable_scope._VARSCOPE_KEY)
AttributeError: module 'tensorflow.python.ops.variable_scope' has no attribute '_VARSCOPE_KEY'

@AnwarUllahKhan

@Lotayou @SpadesQ I was training the model when I unplugged the charger and went out. When I came back, my system had switched off. How can I resume training from the last checkpoint? Please help. Thank you very much.

@Lotayou
Author

Lotayou commented Dec 24, 2018

@AnwarUllahKhan I guess there must be a parameter in the config file where you can designate the ckpt file to be loaded for subsequent training. If you cannot find one, try converting your TensorFlow checkpoint to a PyTorch one and switch to the PyTorch implementation instead :)

@AnwarUllahKhan

@Lotayou Thank you, I solved that and can train successfully now. But how can I run the demo, which is a .sh file, when I am on Windows?

@guwalgiya

saved me so much time! thanks!

@ankit01ojha

ankit01ojha commented Feb 21, 2019

@AnwarUllahKhan Could you please elaborate on how you fixed it? I am facing the same problem. And if you have made this project work on Windows, could you also tell me how you ran the shell script?
@Lotayou Could you also help me?

@akhilvasvani

akhilvasvani commented Jun 22, 2019

@Lotayou @AnwarUllahKhan @ankit01ojha, I am also facing the same problem with prettytensor for python3.6.

Problem:

python3 stageI/run_exp.py --cfg stageI/cfg/birds.yml --gpu 0
./misc/config.py:100: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yaml_cfg = edict(yaml.load(f))
Using config:
{'CONFIG_NAME': 'stageI',
 'DATASET_NAME': 'birds',
 'EMBEDDING_TYPE': 'cnn-rnn',
 'GAN': {'DF_DIM': 64,
         'EMBEDDING_DIM': 128,
         'GF_DIM': 128,
         'NETWORK_TYPE': 'default'},
 'GPU_ID': 0,
 'TEST': {'BATCH_SIZE': 64,
          'CAPTION_PATH': '',
          'HR_IMSIZE': 256,
          'LR_IMSIZE': 64,
          'NUM_COPY': 16,
          'PRETRAINED_MODEL': ''},
 'TRAIN': {'BATCH_SIZE': 64,
           'B_WRONG': True,
           'COEFF': {'KL': 2.0},
           'COND_AUGMENTATION': True,
           'DISCRIMINATOR_LR': 0.0002,
           'FINETUNE_LR': False,
           'FLAG': True,
           'FT_LR_RETIO': 0.1,
           'GENERATOR_LR': 0.0002,
           'LR_DECAY_EPOCH': 50,
           'MAX_EPOCH': 600,
           'NUM_COPY': 4,
           'NUM_EMBEDDING': 4,
           'PRETRAINED_EPOCH': 600,
           'PRETRAINED_MODEL': '',
           'SNAPSHOT_INTERVAL': 2000},
 'Z_DIM': 100}
images:  (2933, 76, 76, 3)
embeddings:  (2933, 10, 1024)
list_filenames:  2933 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18
images:  (8855, 76, 76, 3)
embeddings:  (8855, 10, 1024)
list_filenames:  8855 002.Laysan_Albatross/Laysan_Albatross_0002_1027
Traceback (most recent call last):
  File "stageI/run_exp.py", line 63, in <module>
    image_shape=dataset.image_shape
  File "/home/akhil/StackGAN/stageI/model.py", line 35, in __init__
    self.d_encode_img_template = self.d_encode_image()
  File "/home/akhil/StackGAN/stageI/model.py", line 165, in d_encode_image
    custom_conv2d(self.df_dim, k_h=4, k_w=4).
  File "/usr/local/lib/python3.6/dist-packages/prettytensor/pretty_tensor_class.py", line 1965, in method
    with _method_scope(input_layer, scope_name) as (scope, _):
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/dist-packages/prettytensor/pretty_tensor_class.py", line 1776, in _method_scope
    scopes.var_and_name_scope((name, None)) as (scope, var_scope):
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/dist-packages/prettytensor/scopes.py", line 55, in var_and_name_scope
    vs_key = tf.get_collection_ref(variable_scope._VARSCOPE_KEY)
AttributeError: module 'tensorflow.python.ops.variable_scope' has no attribute '_VARSCOPE_KEY'

How did you fix it?

@AnwarUllahKhan

@akhilvasvani @ankit01ojha You are both using Python 3+, so follow the instructions in @Lotayou's first message.

@akhilvasvani

@AnwarUllahKhan, @Lotayou does not mention how to solve the problem. Notice how my error and your error are exactly the same.

What did you do to solve your error?

@AnwarUllahKhan

@akhilvasvani you can try this one too https://www.twblogs.net/a/5c713446bd9eee68dc3f25a0
or downgrade your tensorflow

@AnwarUllahKhan

#51

@akhilvasvani

Awesome. Thanks man. Much appreciated

@akhilvasvani

Ok, so following the link you posted @AnwarUllahKhan, I changed:
tf.get_collection_ref(variable_scope._VARSCOPE_KEY) to tf.get_collection_ref(variable_scope._VARSCOPESTORE_KEY)

However, I then hit another error:

File "/home/akhil/.local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1341, in get_variable_scope
    return get_variable_scope_store().current_scope
AttributeError: 'VariableScope' object has no attribute 'current_scope' 

With the solution from the link, get_variable_scope() and get_variable_scope_store() keep calling each other, which forces the main code to stop running. I didn't know how to add "current_scope" without breaking the rest of variable_scope.py, so this didn't work.

Then I went back to the original problem and changed:
tf.get_collection_ref(variable_scope._VARSCOPE_KEY) to tf.get_collection_ref(variable_scope.__VARSTORE_KEY).

However, when I reach the "custom_fully_connected_", I hit the ipdb debugger. Is this a similar path you went down?

@akhilvasvani

akhilvasvani commented Jun 26, 2019

In the ipdb debugger, it finds an error in custom_ops.py in the class custom_fully_connected, specifically with the matrix and bias variables. I get the error:

 File "/home/akhil/StackGAN/stageI/model.py", line 48, in generate_condition
    conditions = (pt.wrap(c_var).flatten().custom_fully_connected(self.ef_dim * 2).
  File "/usr/local/lib/python3.6/dist-packages/prettytensor/pretty_tensor_class.py", line 1972, in method
    result = func(non_seq_layer, *args, **kwargs)
  File "./misc/custom_ops.py", line 158, in __call__
    init=tf.random_normal_initializer(stddev=stddev))
  File "/usr/local/lib/python3.6/dist-packages/prettytensor/pretty_tensor_class.py", line 1673, in variable
    collections=variable_collections)
  File "/home/akhil/.local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1479, in get_variable
    aggregation=aggregation)
  File "/home/akhil/.local/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1220, in get_variable
    aggregation=aggregation)
TypeError: get_variable() missing 1 required positional argument: 'name'

This is what is written in the file:

        try:
            if len(shape) == 4:
                input_ = tf.reshape(input_, tf.stack([tf.shape(input_)[0], np.prod(shape[1:])]))
                input_.set_shape([None, np.prod(shape[1:])])
                shape = input_.get_shape().as_list()

            with tf.variable_scope(scope or "Linear"):
                matrix = self.variable("Matrix", [in_dim or shape[1], output_size],
                                       tf.random_normal_initializer(stddev=stddev))
                bias = self.variable("bias", [output_size], tf.constant_initializer(bias_start))
                return input_layer.with_tensor(tf.matmul(input_, matrix) + bias, parameters=self.vars)
        except Exception:
            import ipdb; ipdb.set_trace()

Is there a way around this problem?

@akhilvasvani

#59 Solved it without using Pretty Tensor

@AllenGe666

image
how to address this problem?

@akhilvasvani

akhilvasvani commented Sep 5, 2019

Oh, I did not train a model for the flower dataset, so you cannot use Han Zhang's pretrained model on my (flower) demo script.

Working on training that!

@AllenGe666

Oh, I did not train a model for the flower dataset, so you cannot use Han Zhang's pretrained model on my (flower) demo script.

Working on training that!

Could you please tell me where to find your pre-trained model?

@akhilvasvani

akhilvasvani commented Sep 10, 2019

Unfortunately, I have not posted my pretrained model yet. At the moment, my focus is on training the StackGAN model with skip_thoughts vectors for birds. Once I am done with that, I will get back to training the model for flowers.

@rs2309

rs2309 commented Sep 18, 2019

Easy Solution to run on windows

  1. python 3.5

  2. prettytensor=0.7.1

  3. solve pickle issue : Unpickle Python 2 object in Python 3

  4. for tensorflow,
    cpu:
    pip install --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-0.12.0rc0-cp35-cp35m-win_amd64.whl
    gpu:
    pip install --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-0.12.0rc0-cp35-cp35m-win_amd64.whl

  5. install rest of the packages

@ast1997

ast1997 commented Feb 13, 2020

@akhilvasvani I got an error while training with the updated StackGAN project on your GitHub: https://github.com/akhilvasvani/StackGAN. Can you please help me out?

@ast1997

ast1997 commented Feb 13, 2020

I am trying to run the sh demo/flowers_demo.sh file and I get an error "Command not found". This is the error upon running the command sh demo/flowers_demo.sh.
demo/flowers_demo.sh: line 10: th: command not found
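
`th` is normally the Torch7 launcher, so this message usually means Torch7 is not installed or not on PATH. That is an assumption about this setup, not something confirmed in the thread. A quick way to check from Python, with a hypothetical helper name:

```python
import shutil


def missing_commands(commands):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


# 'th' is the command the demo script failed on.
missing = missing_commands(["th"])
if missing:
    print("Not on PATH:", ", ".join(missing))
```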
