Results using RLBench as the environment #15
Comments
Hi, potential ways of solving that would be to start from a dense reward metric for RLBench, or to use reward shaping (e.g. the paper here) and other auxiliary techniques. As for results from our side, we will ideally try to provide a successful policy, but it may take a while. Zihan
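(For concreteness, below is a minimal sketch of potential-based reward shaping, in the style of Ng et al. 1999, written as a generic gym wrapper. The wrapper class and the `potential_fn` argument are illustrative and not part of RLBench or this repo; for ReachTarget a natural potential would be the negative gripper-to-target distance. Shaping of this form preserves the optimal policy and only densifies the learning signal.)

```python
# Hypothetical sketch: potential-based reward shaping around a sparse-reward env.
# `potential_fn` maps an observation to a scalar "progress" estimate; it is a
# user-supplied guess, not something RLBench provides.
import gym


class PotentialShapingWrapper(gym.Wrapper):
    def __init__(self, env, potential_fn, gamma=0.99):
        super().__init__(env)
        self.potential_fn = potential_fn
        self.gamma = gamma
        self._last_potential = None

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self._last_potential = self.potential_fn(obs)
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        potential = self.potential_fn(obs)
        # F(s, s') = gamma * phi(s') - phi(s): adds dense signal without
        # changing which policy is optimal.
        shaped_reward = reward + self.gamma * potential - self._last_potential
        self._last_potential = potential
        return obs, shaped_reward, done, info
```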
I have run a similar test in RLBench. I found that the first 5 episodes are normal and the computation runs on the GPU as expected. But after that, the computation becomes extremely slow and the GPU usage drops from 30% to almost 0%. The output in the terminal: I turned on the V-REP GUI and found that the robot arm explores around during the first 5 episodes and then stops exploring after that... Any suggestions on how to debug why the GPU computation almost completely stops?
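(One way to isolate whether the simulator itself is the bottleneck, independent of the learning code, is to step the raw environment with random actions and time each episode. This sketch assumes RLBench's gym interface and its `reach_target-state-v0` task name, which may differ across versions.)

```python
# Bare RLBench smoke test: if stepping also slows down here after a few
# episodes, the stall is on the simulator/env side rather than in the RL
# algorithm or the GPU.
import time
import gym
import rlbench.gym  # registers RLBench tasks as gym environments (assumed install)

env = gym.make('reach_target-state-v0')  # task name may differ by RLBench version
obs = env.reset()
for episode in range(10):
    start = time.time()
    for step in range(100):
        obs, reward, done, _ = env.step(env.action_space.sample())
        if done:
            break
    print(f'episode {episode}: {time.time() - start:.1f}s')
    obs = env.reset()
env.close()
```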
I have a similar problem using the CPU, starting around the 7th episode.
Hi guys, I tried to replicate the problem you met, but it doesn't happen on my side. I use the PPO-Clip algorithm on the ReachTarget environment in RLBench, and the robot is still moving around after 50 episodes without a drop in GPU usage. The code I used is as follows:
The package versions:
Could you please check your package versions and update them if they are not consistent with what I used? If the problem still exists, please specify which algorithm and environment name you are testing. Thanks
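(For reference, a minimal sketch of a PPO-Clip run on ReachTarget using RLzoo-style entry points of the kind shown in the project README. This is not the maintainer's exact script; the import paths, the `PPO_CLIP` class name, and the argument names are assumptions and may differ in your installed version.)

```python
# Hedged sketch: PPO-Clip on RLBench's ReachTarget via assumed RLzoo-style
# entry points. Adjust module paths and names to the version you have installed.
from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import PPO_CLIP

env = build_env('ReachTarget', 'rlbench')  # low-dimensional state observations
alg_params, learn_params = call_default_params(env, 'rlbench', 'PPO_CLIP')

agent = PPO_CLIP(**alg_params)
agent.learn(env=env, mode='train', render=False, **learn_params)
env.close()
```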
I'm testing the … I tried to profile an execution of the training stage for 100 episodes (100 max steps), and this is the result.
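(In case it helps others reproduce the profiling, a generic way to do it is with Python's built-in cProfile; `run_training` below is a hypothetical placeholder for whatever function or script launches the 100-episode run.)

```python
# Generic profiling sketch using only the standard library. `run_training` is
# a hypothetical stand-in for the actual training entry point.
import cProfile
import pstats


def run_training():
    # e.g. call your training script's main() for 100 episodes, 100 max steps
    pass


cProfile.run('run_training()', 'train_profile.out')
stats = pstats.Stats('train_profile.out')
stats.sort_stats('cumulative').print_stats(20)  # show the 20 biggest time sinks
```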
Any updates on RL baseline performances? @quantumiracle
Hi,
first of all, let me say that I really appreciate the work done in this repo.
I would like to know if you have had success in training any algorithm using RLBench as the environment.
I'm currently trying to train the DDPG algorithm on the `ReachTarget` task using all the observations available with `state_type='vision'`. As suggested in issue #6, I modified the default params for DDPG, lowering `max_steps` and increasing `train_episodes`, but I can't seem to get any results. Any feedback is really much appreciated.
Mirko
Edit:
I noticed that RLBench doesn't seem to provide "usable" reward metrics, am I wrong? All the episode rewards are either 0.000 or 1.000. Any insight on this problem?
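(A hedged sketch of the kind of run described in this post, reusing the same assumed RLzoo-style entry points as in the PPO example earlier in the thread. The keyword names `state_type`, `max_steps`, and `train_episodes` come from this thread and issue #6, but how they are actually passed may differ in your version.)

```python
# Hedged sketch, not a verified recipe: DDPG on RLBench's ReachTarget with
# vision observations and the hyperparameter changes suggested in issue #6.
# Import paths and keyword names are assumptions; adapt to your installed version.
from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import DDPG

env = build_env('ReachTarget', 'rlbench', state_type='vision')  # 'state_type' plumbing assumed
alg_params, learn_params = call_default_params(env, 'rlbench', 'DDPG')

# Shorter episodes, more of them, as suggested in issue #6.
learn_params['max_steps'] = 100
learn_params['train_episodes'] = 1000

agent = DDPG(**alg_params)
agent.learn(env=env, mode='train', render=False, **learn_params)
env.close()
```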