Results using RLBench as the environment #15

Open
mirkomorati opened this issue May 7, 2020 · 6 comments
Labels
help wanted Extra attention is needed

Comments

@mirkomorati
Contributor

mirkomorati commented May 7, 2020

Hi,
first of all, let me say that I really appreciate the work done in this repo.
I would like to know if you have had any success training an algorithm using RLBench as the environment.
I'm currently trying to train the DDPG algorithm on the ReachTarget task, using all the observations available with state_type='vision'. As suggested in issue #6, I modified the default params for DDPG, lowering max_steps and increasing train_episodes, but I can't seem to get any results.
Any feedback is really much appreciated.

Mirko

Edit:
I noticed that RLBench doesn't seem to provide a "usable" reward signal, or am I wrong? All the episode rewards are either 0.000 or 1.000. Any insight on this?
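
A quick way to see this (a sketch; it assumes the gym-style interface that rlzoo's build_env exposes):

from rlzoo.common.env_wrappers import *

# Sketch: print the raw per-step rewards coming out of the wrapper.
env = build_env('ReachTarget', 'rlbench', state_type='vision')
obs = env.reset()
for _ in range(50):
    obs, reward, done, _ = env.step(env.action_space.sample())
    print(reward)  # 0.0 every step, 1.0 only on task success
    if done:
        obs = env.reset()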

@quantumiracle
Member

Hi,
I would expect end-to-end training with RLzoo algorithms on RLBench to be hard in practice. As you said, RLBench provides a reward of either 1. or 0. as a signal of task success or failure. I wouldn't say it's not a 'usable' reward metric; it's just too sparse for an RL algorithm to learn from. So unless you have a very efficient RL algorithm and some luck in exploration, it may take an extremely long time to learn a good policy.

Potential ways of solving that would be to start from a dense reward metric for RLBench, I guess, or to use reward shaping (e.g. paper here) and other auxiliary techniques.
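
For illustration, a rough sketch of what reward shaping could look like on top of a gym-style ReachTarget env (hypothetical wrapper; the observation slices for the gripper and target positions are assumptions and would need to match the actual state layout):

import numpy as np

class DenseReachReward:
    """Hypothetical reward-shaping wrapper for a gym-style ReachTarget env.

    Assumes (not guaranteed by RLBench/RLzoo) that the flat state vector
    exposes the gripper position at `ee_idx` and the target position at
    `target_idx`; adjust the indices to the actual observation layout.
    """

    def __init__(self, env, ee_idx=slice(0, 3), target_idx=slice(-3, None), coef=0.1):
        self.env = env
        self.ee_idx = ee_idx
        self.target_idx = target_idx
        self.coef = coef

    def __getattr__(self, name):
        # Delegate everything else (action_space, render, ...) to the base env.
        return getattr(self.env, name)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Keep the sparse success reward and add a dense term that penalizes
        # the distance between the end-effector and the target.
        obs_arr = np.asarray(obs)
        dist = np.linalg.norm(obs_arr[self.ee_idx] - obs_arr[self.target_idx])
        return obs, reward - self.coef * dist, done, info

Such a wrapper would then drop in wherever the sparse env is used, e.g. env = DenseReachReward(build_env('ReachTarget', 'rlbench', state_type='state')).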

As for results from our side, we will ideally try to provide some successful policies, but it may take a while.

Zihan

@ancorasir

I have run a similar test in RLBench. The first 5 episodes are normal and the computation runs on the GPU as expected. But after that, the computation becomes extremely slow and GPU usage drops from 30% to almost 0%.

The output in terminal:
Episode: 1/100 | Episode Reward: 0.0000 | Running Time: 20.8774
Episode: 2/100 | Episode Reward: 0.0000 | Running Time: 39.9556
Episode: 3/100 | Episode Reward: 0.0000 | Running Time: 70.8135
Episode: 4/100 | Episode Reward: 0.0000 | Running Time: 112.0266
Episode: 5/100 | Episode Reward: 0.0000 | Running Time: 168.1843

I turned on the V-REP GUI and found that the robot arm explored around during the first 5 episodes and then stopped exploring after that...

Any suggestions on how to debug why the GPU computation almost stops?
@quantumiracle

@mirkomorati
Contributor Author

I have a similar problem using the CPU, starting around the 7th episode.

@quantumiracle
Member

Hi guys,

I tried to replicate the problem you described, but it doesn't happen on my side. I used the PPO-Clip algorithm on the ReachTarget environment in RLBench, and the robot is still moving around after 50 episodes without any drop in GPU usage.

The code I used is as follows:

from rlzoo.common.env_wrappers import *
from rlzoo.common.utils import *
from rlzoo.algorithms import *

EnvName = 'ReachTarget'
EnvType = 'rlbench'
env = build_env(EnvName, EnvType, state_type='state')  # low-dimensional robot state

AlgName = 'PPO'
alg_params, learn_params = call_default_params(env, EnvType, AlgName)  # default hyperparameters
alg = eval(AlgName + '(**alg_params)')  # instantiate the PPO agent
alg.learn(env=env, mode='train', render=True, **learn_params)
alg.learn(env=env, mode='test', render=True, **learn_params)

The package versions:

  • CoppeliaSim==4.0.0
  • PyRep==1.1
  • RLBench==1.0.6
  • tensorflow-gpu==2.0.1
  • Python 3.6

Could you please check your package versions and update them if they are not consistent with what I used? If the problem still exists, please specify which algorithm and environment you are testing.

Thanks

@mirkomorati
Contributor Author

I'm testing the ReachTarget task with the DDPG algorithm and the vision state type. Using only the robot state doesn't produce any performance drop.
I also have tensorflow-gpu==2.1.0, but I'm running on the CPU.
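
Concretely, my setup is roughly the following (a sketch mirroring the PPO snippet above, but with DDPG and state_type='vision'):

from rlzoo.common.env_wrappers import *
from rlzoo.common.utils import *
from rlzoo.algorithms import *

EnvName = 'ReachTarget'
EnvType = 'rlbench'
env = build_env(EnvName, EnvType, state_type='vision')  # RGB camera observations

AlgName = 'DDPG'
alg_params, learn_params = call_default_params(env, EnvType, AlgName)
alg = eval(AlgName + '(**alg_params)')
alg.learn(env=env, mode='train', render=True, **learn_params)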

I tried profiling an execution of the training stage for 100 episodes (100 max steps each); this is the result.

[Screenshot, 2020-05-29: profiling results of the 100-episode training run]
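
For anyone who wants to reproduce the profiling, something along these lines should work (a sketch; it reuses the alg, env, and learn_params variables from the snippets above):

import cProfile
import pstats

# Sketch: profile the training call and print the largest cumulative-time hotspots.
cProfile.run('alg.learn(env=env, mode="train", render=False, **learn_params)',
             'train.prof')
pstats.Stats('train.prof').sort_stats('cumulative').print_stats(30)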

@modanesh

Any updates on the RL baseline performance? @quantumiracle
