
SAC: ValueError: setting an array element with a sequence (stable-baselines 2.10) #852

Closed
marianophielipp opened this issue May 12, 2020 · 17 comments
Labels
custom gym env (Issue related to Custom Gym Env)
more information needed (Please fill the issue template completely)
No tech support (We do not do tech support)
question (Further information is requested)

Comments

@marianophielipp

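Code:
[Custom made environment]
import gym
import numpy as np
from stable_baselines.sac.policies import MlpPolicy
from stable_baselines import SAC

# `env` (the custom environment), `train` and `total_timesteps` are defined
# elsewhere in the original script and are not shown here.
model = SAC(MlpPolicy, env, verbose=1)

if train:
    model.learn(total_timesteps=total_timesteps, log_interval=10)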

Output:

| current_lr | 0.0003 |
| episodes | 10 |
| fps | 0 |
| mean 100 episode reward | -4 |
| n_updates | 0 |
| time_elapsed | 151 |
| total timesteps | 92 |

TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "train_real_arm_perception_1.py", line 43, in <module>
model.learn(total_timesteps=total_timesteps, log_interval=10)
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/sac/sac.py", line 464, in learn
mb_infos_vals.append(self._train_step(step, writer, current_lr))
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/sac/sac.py", line 343, in _train_step
out = self.sess.run(self.step_ops, feed_dict)
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1142, in _run
np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
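For context, the final ValueError is raised by NumPy rather than by SAC itself: as the traceback shows, TensorFlow calls np.asarray(subfeed_val, dtype=subfeed_dtype) on each value in the feed dict, and that conversion fails when a batch is ragged, i.e. one element is an array where a scalar is expected. The thread never identifies which feed entry was malformed, so the sketch below uses a reward batch purely as a hypothetical illustration; to_feed and find_non_scalars are made-up helper names, not stable-baselines API.

```python
import numpy as np


def to_feed(batch):
    """Convert a batch roughly as session.run does: np.asarray with a fixed dtype."""
    return np.asarray(batch, dtype=np.float32)


def find_non_scalars(batch):
    """Return the indices of elements that are not scalars (ndim != 0)."""
    return [i for i, x in enumerate(batch) if np.ndim(x) != 0]


# Well-formed batch: every reward is a plain float.
good_rewards = [0.0, -1.0, 0.5]
# HYPOTHETICAL malformed batch: one step() call returned an array of
# shape (2,) instead of a float, so the batch is ragged.
bad_rewards = [0.0, np.array([-1.0, 0.5]), 0.5]

print(to_feed(good_rewards).shape)  # (3,)
try:
    to_feed(bad_rewards)
except ValueError as err:
    # Same exception class as at the end of the traceback.
    print("ValueError:", err)
print("non-scalar entries at:", find_non_scalars(bad_rewards))  # [1]
```

Running a check like find_non_scalars over the values returned by step() is one way to narrow down which entry breaks the conversion.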

@Miffyli Miffyli added custom gym env Issue related to Custom Gym Env question Further information is requested labels May 12, 2020
@Miffyli
Collaborator

Miffyli commented May 12, 2020

Use the env checker to see whether your environment works correctly.

@araffin araffin added the PR template not filled Please fill the pull request template label May 12, 2020
@araffin
Collaborator

araffin commented May 12, 2020

PS: Please read and use the issue template (the env checker is mentioned there too).

@araffin araffin added more information needed Please fill the issue template completely and removed PR template not filled Please fill the pull request template labels May 12, 2020
@araffin
Collaborator

araffin commented May 12, 2020

I think we need to output a better error message in SB3 (see #707 and #712)
Currently, we cannot do that properly because of the Unvecwrapper...

EDIT: the mentioned issue is not the same, but it is related in terms of the unclear error message.

@marianophielipp
Author

marianophielipp commented May 12, 2020

check_env(env)
<class 'numpy.ndarray'>
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/common/env_checker.py", line 214, in check_env
_check_returned_values(env, observation_space, action_space)
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/common/env_checker.py", line 99, in _check_returned_values
_check_obs(obs, observation_space, 'reset')
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/common/env_checker.py", line 89, in _check_obs
"method does not match the given observation space".format(method_name))
AssertionError: The observation returned by the reset() method does not match the given observation space

a=env.observation_space.sample()
<class 'numpy.ndarray'>
b=env.reset()
<class 'numpy.ndarray'>
a.shape
(6,)
b.shape
(6,)
a
array([0.30428773, 0.42360216, 0.8966984 , 0.4622259 , 0.6768906 ,
0.5416117 ], dtype=float32)
b
array([ 0.74211503, 0.34441176, 0.33516484, 0.2 , -0.25 ,
0.1 ], dtype=float32)
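A likely explanation for this mismatch, although the thread never confirms it: in the gym releases used alongside stable-baselines 2.10, Box.contains requires the right shape and also that every element lies within [low, high], so matching shape and dtype alone are not enough. The sampled values in a all fall between 0 and 1, which suggests (but does not prove) bounds of [0, 1], while b contains -0.25. The sketch below mirrors that shape-and-bounds check in plain NumPy and reports which element fails; diagnose_box_obs is a hypothetical helper, and the 0.0/1.0 bounds are an assumption to be replaced with env.observation_space.low and env.observation_space.high.

```python
import numpy as np


def diagnose_box_obs(obs, low, high, shape):
    """List the reasons `obs` would fail a Box-style shape-and-bounds check.

    Returns an empty list when the observation has the expected shape and
    every element lies within [low, high].
    """
    obs = np.asarray(obs)
    shape = tuple(shape)
    if obs.shape != shape:
        # Element-wise bounds cannot be compared if the shapes differ.
        return ["shape {} != expected {}".format(obs.shape, shape)]
    # Scalar bounds are broadcast to the observation shape.
    low = np.broadcast_to(low, shape)
    high = np.broadcast_to(high, shape)
    problems = []
    for i in np.flatnonzero(obs < low):
        problems.append("index {}: {} < low {}".format(i, obs.flat[i], low.flat[i]))
    for i in np.flatnonzero(obs > high):
        problems.append("index {}: {} > high {}".format(i, obs.flat[i], high.flat[i]))
    return problems


# Values copied from the reset() output above.
b = np.array([0.74211503, 0.34441176, 0.33516484, 0.2, -0.25, 0.1], dtype=np.float32)
# ASSUMPTION: bounds of [0, 1] are only inferred from the sampled values;
# with the real env, pass env.observation_space.low / .high / .shape instead.
print(diagnose_box_obs(b, low=0.0, high=1.0, shape=(6,)))
```

Under that assumption the check flags index 4 (-0.25). If the bounds are indeed the cause, either the observation space declaration needs to cover the values reset() produces, or the observations need to stay within the declared bounds.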

@Miffyli Miffyli added the No tech support We do not do tech support label May 12, 2020
@Miffyli
Collaborator

Miffyli commented May 12, 2020

As the error message says, the observation returned by reset() does not match the one defined by self.observation_space.

This is not a place for technical support, though. Please close this issue if there are no further enhancements/issues related to stable-baselines.

@marianophielipp
Author

OK, but they seem to be the same. I debugged everything and even compared with CartPole.

@Miffyli
Collaborator

Miffyli commented May 12, 2020

That is because SAC does not support discrete actions, only continuous ones (see docs).

@marianophielipp
Author

I'm using continuous actions.

@Miffyli
Copy link
Collaborator

Miffyli commented May 12, 2020

That is because SAC does not support discrete actions, only continuous ones (see docs). Indeed, this error message should be clarified in future updates.

Edit: GitHub garbled my messages.

@marianophielipp
Author

The action space is: self.action_space = spaces.Box(low=low_action, high=high_action, dtype=np.float32)

@marianophielipp
Author

But the complaint is about the observation (state) space, which seems completely fine.

@Miffyli
Copy link
Collaborator

Miffyli commented May 12, 2020

CartPole uses discrete actions; that's why it does not work. Your example does not work because your reset() function is wrong.

I am closing this issue as this is not a stable-baselines bug or enhancement suggestion, and the check for action spaces has already been noted.

@Miffyli Miffyli closed this as completed May 12, 2020
@marianophielipp
Copy link
Author

I know it uses discrete actions. My reset function seems fine (right data types, etc.), as shown in the results I posted above.

@araffin
Copy link
Collaborator

araffin commented May 12, 2020

Please fill in the issue template completely next time, notably by formatting your code in a markdown code block and providing a minimal working example, e.g.:

import gym
import numpy as np

class CustomEnv(gym.Env):
    def __init__(self):
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(6,))
        self.action_space = gym.spaces.Box(low=-1, high=1, shape=(6,))

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, {}

from stable_baselines.common.env_checker import check_env

check_env(CustomEnv())

@araffin
Copy link
Collaborator

araffin commented May 12, 2020

Related issues: #595 and #283

@marianophielipp
Copy link
Author

OK, I will do that. BTW, I looked at the check_env code and printed its arguments, and I still don't understand what's wrong, but I suppose it's my problem now:
CartPole:
print("Checking env", check_env(envg))
Box(4,) : obs [ 0.01070996 -0.04723248 -0.02073532 -0.027894 ]
<class 'gym.spaces.box.Box'> : t obs <class 'numpy.ndarray'>

My Custom Env:
print("Checking env", check_env(env))
<class 'numpy.ndarray'>
Box(6,) : obs [ 0.14489795 0.75911766 0.21703297 0.2 -0.25 0.1 ]
<class 'gym.spaces.box.Box'> : t obs <class 'numpy.ndarray'>

@marianophielipp
Copy link
Author

It does work in stable-baselines 2.8.0.
