
Trouble getting train_baseline to work #180

Open
maunhb opened this issue Jun 23, 2020 · 1 comment

maunhb commented Jun 23, 2020

Hi, I have managed to install the project and get the tests working, but train_baseline.py fails when run (errors below). I tried updating ray, but that caused other problems. At the moment I'm using ray 0.6.1 and have added a symlink to the experimental folder, as this seemed to be required.

Exception in thread ray_listen_error_messages:
Traceback (most recent call last):
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/site-packages/ray/worker.py", line 1818, in listen_error_messages_raylet
    error_messages = global_state.error_messages(worker.task_driver_id)
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/site-packages/ray/experimental/state.py", line 897, in error_messages
    assert isinstance(job_id, ray.DriverID)
AttributeError: module 'ray' has no attribute 'DriverID'

Commencing experiment cleanup_A3C
Did not find checkpoint file in /home/charlotte/ray_results/cleanup_A3C.
Starting a new experiment.
Traceback (most recent call last):
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 295, in _update_avail_resources
    resources = ray.global_state.cluster_resources()
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/site-packages/ray/experimental/state.py", line 767, in cluster_resources
    clients = self.client_table()
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/site-packages/ray/experimental/state.py", line 404, in client_table
    return parse_client_table(self.redis_client)
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/site-packages/ray/experimental/state.py", line 29, in parse_client_table
    NIL_CLIENT_ID = ray.ObjectID.nil().binary()
AttributeError: type object 'common.ObjectID' has no attribute 'nil'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_baseline.py", line 183, in <module>
    tf.app.run(main)
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "train_baseline.py", line 177, in main
    "config": config,
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/site-packages/ray/tune/tune.py", line 164, in run_experiments
    trial_executor=trial_executor)
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 95, in __init__
    RayTrialExecutor(queue_trials=queue_trials)
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 44, in __init__
    self._update_avail_resources()
  File "/home/charlotte/anaconda3/envs/causal/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 301, in _update_avail_resources
    None, None, None)
TypeError: check_and_update_resources() takes 1 positional argument but 3 were given
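One way to check which ray code is actually being imported, including where a symlinked subpackage such as ray.experimental really points, is to print each module's version and its resolved file path. This is a minimal sketch using only the standard library; the helper name module_info is hypothetical, and the module names in the __main__ block are simply the ones that appear in the tracebacks above.

```python
import importlib
import os
from typing import Dict, Optional


def module_info(name: str) -> Dict[str, Optional[str]]:
    """Import a module by name and report its version and on-disk location."""
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None)
    return {
        # Not every module defines __version__; None means it is absent.
        "version": getattr(module, "__version__", None),
        "file": path,
        # realpath follows symlinks, showing where the code really lives.
        "real_file": os.path.realpath(path) if path else None,
    }


if __name__ == "__main__":
    # Hypothetical selection of modules; missing ones are reported, not fatal.
    for name in ("ray", "ray.experimental.state"):
        try:
            print(name, module_info(name))
        except ImportError as exc:
            print(name, "not importable:", exc)
```

If real_file for ray.experimental.state resolves outside the installed ray 0.6.1 package, the symlinked folder may come from a different ray release than the rest of the install, which could be worth ruling out first.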

When I updated to the current ray version, the function get_global_worker wasn't recognized, because it only exists in worker.py in Natasha's version of ray. I don't understand symlinks well enough to create one that points to this file rather than to a folder. Do you know what the mistake might be? Thanks
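On the symlink question: a symlink can point at a single file in exactly the same way it points at a folder. Below is a minimal sketch using Python's standard os.symlink on Linux (the reporter's paths suggest Linux); the helper name link_file and the file names are hypothetical stand-ins, since the actual location of Natasha's worker.py is not given here. The shell equivalent is ln -s TARGET LINK.

```python
import os
import tempfile


def link_file(target: str, link_path: str) -> None:
    """Create a symlink at link_path pointing to one file, replacing an old link."""
    if os.path.islink(link_path):
        # Only an existing *link* is replaced; a regular file at link_path
        # is left alone and os.symlink will raise FileExistsError.
        os.remove(link_path)
    os.symlink(target, link_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the file the link should point at.
        target = os.path.join(tmp, "worker.py")
        open(target, "w").close()
        link = os.path.join(tmp, "linked_worker.py")
        link_file(target, link)
        # The link resolves to the single file, not to a folder.
        print(os.path.islink(link), os.path.realpath(link) == os.path.realpath(target))
```

Using an absolute target path avoids a common pitfall: a relative target is resolved relative to the directory containing the link, not the directory the command was run from.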

Charlotte

@internetcoffeephone (Contributor)

@eugenevinitsky is currently merging in my fork, which is several ray versions ahead. Until it's merged, you can use the fork instead; it contains many bugfixes and should work out of the box.
