Results using RLBench as the environment #15

Open
mirkomorati opened this issue May 7, 2020 · 6 comments
Labels
help wanted Extra attention is needed

Comments

@mirkomorati
Contributor

mirkomorati commented May 7, 2020

Hi,
first of all, let me say that I really appreciate the work done in this repo.
I would like to know if you have had any success training an algorithm using RLBench as the environment.
I'm currently trying to train the DDPG algorithm on the ReachTarget task, using all the observations available with state_type='vision'. As suggested in issue #6, I modified the default params for DDPG, lowering max_steps and increasing train_episodes, but I can't seem to get any results.
Any feedback is really much appreciated.

Mirko

Edit:
I noticed that RLBench doesn't seem to provide a "usable" reward signal, or am I wrong? All the episode rewards are either 0.000 or 1.000. Any insight on this?
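
A quick way to see this (a sketch; it assumes the gym-style interface that rlzoo's build_env exposes):

from rlzoo.common.env_wrappers import *

# Sketch: print the raw per-step rewards coming out of the wrapper.
env = build_env('ReachTarget', 'rlbench', state_type='vision')
obs = env.reset()
for _ in range(50):
    obs, reward, done, _ = env.step(env.action_space.sample())
    print(reward)  # 0.0 every step, 1.0 only on task success
    if done:
        obs = env.reset()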

@quantumiracle
Member

Hi,
I would expect end-to-end training with RLzoo algorithms on RLBench to be hard in practice. As you said, RLBench provides a reward of either 1. or 0. as a signal of task success or failure. I wouldn't say it's not a 'usable' reward metric; it's just too sparse for an RL algorithm to learn from. So unless you have a very efficient RL algorithm and some luck in exploration, it may take an extremely long time to learn a good policy.

Potential ways of solving that would be to start from a dense reward metric for RLBench, I guess, or to use reward shaping (e.g. paper here) and other auxiliary techniques.
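
For illustration, a rough sketch of what reward shaping could look like on top of a gym-style ReachTarget env (hypothetical wrapper; the observation slices for the gripper and target positions are assumptions and would need to match the actual state layout):

import numpy as np

class DenseReachReward:
    """Hypothetical reward-shaping wrapper for a gym-style ReachTarget env.

    Assumes (not guaranteed by RLBench/RLzoo) that the flat state vector
    exposes the gripper position at `ee_idx` and the target position at
    `target_idx`; adjust the indices to the actual observation layout.
    """

    def __init__(self, env, ee_idx=slice(0, 3), target_idx=slice(-3, None), coef=0.1):
        self.env = env
        self.ee_idx = ee_idx
        self.target_idx = target_idx
        self.coef = coef

    def __getattr__(self, name):
        # Delegate everything else (action_space, render, ...) to the base env.
        return getattr(self.env, name)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Keep the sparse success reward and add a dense term that penalizes
        # the distance between the end-effector and the target.
        obs_arr = np.asarray(obs)
        dist = np.linalg.norm(obs_arr[self.ee_idx] - obs_arr[self.target_idx])
        return obs, reward - self.coef * dist, done, info

Such a wrapper would then drop in wherever the sparse env is used, e.g. env = DenseReachReward(build_env('ReachTarget', 'rlbench', state_type='state')).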

As for results from our side, we will ideally try to provide some successful policies, but it may take a while.

Zihan

@ancorasir

I have run a similar test in RLBench. The first 5 episodes are normal and the computation runs on the GPU as expected. But after that, the computation becomes extremely slow and GPU usage drops from 30% to almost 0%.

The output in terminal:
Episode: 1/100 | Episode Reward: 0.0000 | Running Time: 20.8774
Episode: 2/100 | Episode Reward: 0.0000 | Running Time: 39.9556
Episode: 3/100 | Episode Reward: 0.0000 | Running Time: 70.8135
Episode: 4/100 | Episode Reward: 0.0000 | Running Time: 112.0266
Episode: 5/100 | Episode Reward: 0.0000 | Running Time: 168.1843

I turned on the V-REP GUI and found that the robot arm explored around during the first 5 episodes and then stopped exploring after that...

Any suggestions on how to debug why the GPU computation almost stops?
@quantumiracle

@mirkomorati
Contributor Author

I have a similar problem using the CPU, starting around the 7th episode.

@quantumiracle
Member

Hi guys,

I tried to replicate the problem you described, but it doesn't happen on my side. I used the PPO-Clip algorithm on the ReachTarget environment in RLBench, and the robot is still moving around after 50 episodes without any drop in GPU usage.

The code I used is as follows:

from rlzoo.common.env_wrappers import *
from rlzoo.common.utils import *
from rlzoo.algorithms import *

EnvName = 'ReachTarget'
EnvType = 'rlbench'
env = build_env(EnvName, EnvType, state_type='state')  # low-dimensional robot state

AlgName = 'PPO'
alg_params, learn_params = call_default_params(env, EnvType, AlgName)  # default hyperparameters
alg = eval(AlgName + '(**alg_params)')  # instantiate the PPO agent
alg.learn(env=env, mode='train', render=True, **learn_params)
alg.learn(env=env, mode='test', render=True, **learn_params)

The package versions:

  • CoppeliaSim==4.0.0
  • PyRep==1.1
  • RLBench==1.0.6
  • tensorflow-gpu==2.0.1
  • Python 3.6

Could you please check your package versions and update them if they are not consistent with what I used? If the problem still exists, please specify which algorithm and environment you are testing.

Thanks

@mirkomorati
Contributor Author

I'm testing the ReachTarget task with the DDPG algorithm and the vision state type. Using only the robot state doesn't produce any performance drop.
I also have tensorflow-gpu==2.1.0, but I'm running on the CPU.
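
Concretely, my setup is roughly the following (a sketch mirroring the PPO snippet above, but with DDPG and state_type='vision'):

from rlzoo.common.env_wrappers import *
from rlzoo.common.utils import *
from rlzoo.algorithms import *

EnvName = 'ReachTarget'
EnvType = 'rlbench'
env = build_env(EnvName, EnvType, state_type='vision')  # RGB camera observations

AlgName = 'DDPG'
alg_params, learn_params = call_default_params(env, EnvType, AlgName)
alg = eval(AlgName + '(**alg_params)')
alg.learn(env=env, mode='train', render=True, **learn_params)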

I tried profiling an execution of the training stage for 100 episodes (100 max steps each); this is the result.

[Screenshot, 2020-05-29: profiling results of the 100-episode training run]
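
For anyone who wants to reproduce the profiling, something along these lines should work (a sketch; it reuses the alg, env, and learn_params variables from the snippets above):

import cProfile
import pstats

# Sketch: profile the training call and print the largest cumulative-time hotspots.
cProfile.run('alg.learn(env=env, mode="train", render=False, **learn_params)',
             'train.prof')
pstats.Stats('train.prof').sort_stats('cumulative').print_stats(30)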

@modanesh

Any updates on the RL baseline performance? @quantumiracle
