Tasks:
Understand what the observations returned by the Lunar Lander env mean (position, velocity, angle, angular velocity, ground contact); see the doc linked below.
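For orientation, the Lunar Lander observation is an 8-dimensional vector. A minimal sketch of unpacking it (field order as documented in the Lunar Lander doc linked below; the env id may differ between gymnasium versions):

```python
import gymnasium as gym

env = gym.make("LunarLander-v3")  # use "LunarLander-v2" on older gymnasium versions
obs, info = env.reset(seed=0)

# Observation layout per the Lunar Lander documentation:
# x, y                         -- lander position
# vx, vy                       -- linear velocities
# angle, ang_vel               -- orientation and angular velocity
# left_contact, right_contact  -- 1.0 if the respective leg touches the ground
x, y, vx, vy, angle, ang_vel, left_contact, right_contact = obs
```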
Write a wrapper class ObsToNewRewardEnvWrapper, derived from gymnasium.Env, that wraps any existing gymnasium.Env object (such as an instance of the Lunar Lander env) into a new env object that replaces the original env's rewards with the values of some function obs_to_new_reward(last_obs, this_obs). To this end (a minimal sketch of the resulting class follows after these steps):
make the __init__ method of the wrapper class accept two parameters: original_env: gymnasium.Env and obs_to_new_reward: Callable.
For each relevant attribute of gymnasium.Env (see docs linked below), in particular for action_space and observation_space, give the wrapper class a read-only Python property of that name and make that property return the corresponding value from the original env at runtime.
Implement the methods reset() and step() by calling the original env's corresponding method, replacing the reward part of the return value with obs_to_new_reward(self._last_obs, obs), and then setting self._last_obs = obs (only step() returns a reward; reset() just needs to store the initial observation).
Also wrap the other methods (like render() and close()) without modification.
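A minimal sketch of such a wrapper under the assumptions above (the exact set of forwarded attributes and methods may need adjusting to the gymnasium version in use):

```python
from typing import Callable
import gymnasium


class ObsToNewRewardEnvWrapper(gymnasium.Env):
    """Wraps an existing env and replaces its rewards by obs_to_new_reward(last_obs, this_obs)."""

    def __init__(self, original_env: gymnasium.Env, obs_to_new_reward: Callable):
        self._env = original_env
        self._obs_to_new_reward = obs_to_new_reward
        self._last_obs = None

    # Read-only properties that forward to the wrapped env at runtime:
    @property
    def action_space(self):
        return self._env.action_space

    @property
    def observation_space(self):
        return self._env.observation_space

    @property
    def metadata(self):
        return self._env.metadata

    @property
    def render_mode(self):
        return self._env.render_mode

    def reset(self, **kwargs):
        obs, info = self._env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        obs, _original_reward, terminated, truncated, info = self._env.step(action)
        new_reward = self._obs_to_new_reward(self._last_obs, obs)
        self._last_obs = obs
        return obs, new_reward, terminated, truncated, info

    # Other methods are forwarded without modification:
    def render(self):
        return self._env.render()

    def close(self):
        return self._env.close()
```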
Implement an obs_to_new_reward function for each of the following tasks, so that maximizing the total sum of rewards given by that function leads to achieving the given task (a sketch of several of these functions follows after this list):
maximize the x coordinate of the lander
e.g., max_x_func = lambda last_obs, this_obs: (lunar_lander_obs(this_obs)["x"] - lunar_lander_obs(last_obs)["x"]) if last_obs is not None else 0, where lunar_lander_obs(obs) returns a dict keyed by "x", "y", "vx", "vy", etc.
minimize the x coordinate
maximize the y coordinate
minimize the y coordinate
maximize the angle
minimize the angle
maximize the absolute velocity
minimize the absolute velocity
maximize the angular velocity
minimize the angular velocity
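A hedged sketch of lunar_lander_obs and a few of these reward functions (the _delta factory and the names are illustrative; the remaining maximize/minimize variants follow the same pattern). Because each reward is the change of a quantity between consecutive observations, the episode's total reward telescopes to the final value minus the fixed initial value, so maximizing the sum maximizes that quantity:

```python
import math


def lunar_lander_obs(obs):
    """Turn the 8-dimensional Lunar Lander observation into a dict."""
    x, y, vx, vy, angle, angular_velocity, left_contact, right_contact = obs
    return {"x": x, "y": y, "vx": vx, "vy": vy, "angle": angle,
            "angular_velocity": angular_velocity,
            "left_contact": left_contact, "right_contact": right_contact}


def _delta(key, sign=1.0):
    """Build an obs_to_new_reward function that rewards the signed change of one field."""
    def obs_to_new_reward(last_obs, this_obs):
        if last_obs is None:
            return 0.0
        return sign * (lunar_lander_obs(this_obs)[key] - lunar_lander_obs(last_obs)[key])
    return obs_to_new_reward


max_x_func = _delta("x", +1.0)   # maximize the x coordinate
min_x_func = _delta("x", -1.0)   # minimize the x coordinate
max_y_func = _delta("y", +1.0)   # maximize the y coordinate
min_angular_velocity_func = _delta("angular_velocity", -1.0)
# ... and analogously for the other fields ...


def max_abs_velocity_func(last_obs, this_obs):
    """Absolute velocity is not a single observation field, so compute it from vx and vy."""
    if last_obs is None:
        return 0.0
    speed = lambda o: math.hypot(lunar_lander_obs(o)["vx"], lunar_lander_obs(o)["vy"])
    return speed(this_obs) - speed(last_obs)
```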
Change make_env() in test_dqn.py so that it replaces the original Lunar Lander env with env = ObsToNewRewardEnvWrapper(env, obs_to_new_reward), and make it use one of the implemented obs_to_new_reward functions depending on a command-line parameter.
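Since test_dqn.py itself is not shown here, the following is only a hypothetical sketch of that change; the --reward argument name, the REWARD_FUNCS mapping, and the env id are assumptions, and the wrapper and reward functions from the sketches above are taken as importable:

```python
import argparse
import gymnasium as gym

# Map command-line names to the implemented obs_to_new_reward variants (illustrative).
REWARD_FUNCS = {
    "max_x": max_x_func,
    "min_x": min_x_func,
    "max_y": max_y_func,
    # ... one entry per implemented variant ...
}


def make_env(reward_name: str):
    env = gym.make("LunarLander-v3")  # or "LunarLander-v2" on older gymnasium versions
    return ObsToNewRewardEnvWrapper(env, REWARD_FUNCS[reward_name])


parser = argparse.ArgumentParser()
parser.add_argument("--reward", choices=sorted(REWARD_FUNCS), default="max_x")
args = parser.parse_args()
env = make_env(args.reward)
```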
For each of these versions of obs_to_new_reward, run an experiment on the cluster:
Write a corresponding Slurm script that requests one GPU, based on the existing test_dqn.slurm (a rough sketch follows below this list).
Also write a fall-back script that uses CPU instead (should GPU jobs not get scheduled fast enough), based on test_dqn_cpu.slurm
Submit the scripts
After they have finished, document their resource usage into a table via
sacct -a -j <job_id> --format=user%10,jobname%10,node%10,start%10,end%10,elapsed%10,MaxRSS
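For the Slurm scripts mentioned above, this is only a rough single-GPU sketch; the partition name, module setup, resource limits, and file names are assumptions that should be copied from the existing test_dqn.slurm (the CPU fall-back would drop the --gres line and use the settings from test_dqn_cpu.slurm instead):

```bash
#!/bin/bash
#SBATCH --job-name=dqn_max_x
#SBATCH --partition=gpu              # assumed partition name; copy from test_dqn.slurm
#SBATCH --gres=gpu:1                 # request one GPU (drop this line in the CPU fall-back)
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=08:00:00
#SBATCH --output=%x_%j.out

# Environment setup (module/conda names are assumptions; take them from test_dqn.slurm).
module load anaconda

python test_dqn.py --reward max_x
```

One such script per reward variant can then be submitted with sbatch.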
Resources:
Lunar Lander env doc: https://gymnasium.farama.org/environments/box2d/lunar_lander/
Gymnasium.Env doc: https://gymnasium.farama.org/api/env/
Logging into the PIK cluster and fetching the latest upstream code version:
and interrupt it as soon as it says
test_dqn.slurm: