Tasks:
Understand what the observations returned by the Lunar Lander env mean (position, velocity, angle, angular velocity, ground contact); see the doc linked below.
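For orientation, the Lunar Lander observation is an 8-dimensional vector. A minimal sketch of unpacking it (field order as documented in the Lunar Lander doc linked below; the env id may differ between gymnasium versions):

```python
import gymnasium as gym

env = gym.make("LunarLander-v3")  # use "LunarLander-v2" on older gymnasium versions
obs, info = env.reset(seed=0)

# Observation layout per the Lunar Lander documentation:
# x, y                         -- lander position
# vx, vy                       -- linear velocities
# angle, ang_vel               -- orientation and angular velocity
# left_contact, right_contact  -- 1.0 if the respective leg touches the ground
x, y, vx, vy, angle, ang_vel, left_contact, right_contact = obs
```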
Write a wrapper class ObsToNewRewardEnvWrapper, derived from gymnasium.Env, that wraps any existing gymnasium.Env object (such as an instance of the Lunar Lander env) into a new env object that replaces the original env's rewards with the values of some function obs_to_new_reward(last_obs, this_obs). To this end (a minimal sketch of the resulting class follows after these steps):
make the __init__ method of the wrapper class accept two parameters: original_env: gymnasium.Env and obs_to_new_reward: Callable.
For each relevant attribute of gymnasium.Env (see docs linked below), in particular for action_space and observation_space, give the wrapper class a read-only Python property of that name and make that property return the corresponding value from the original env at runtime.
Implement the methods reset() and step() by calling the original env's corresponding method, replacing the reward part of the return value with obs_to_new_reward(self._last_obs, obs), and then setting self._last_obs = obs (only step() returns a reward; reset() just needs to store the initial observation).
Also wrap the other methods (like render() and close()) without modification.
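A minimal sketch of such a wrapper under the assumptions above (the exact set of forwarded attributes and methods may need adjusting to the gymnasium version in use):

```python
from typing import Callable
import gymnasium


class ObsToNewRewardEnvWrapper(gymnasium.Env):
    """Wraps an existing env and replaces its rewards by obs_to_new_reward(last_obs, this_obs)."""

    def __init__(self, original_env: gymnasium.Env, obs_to_new_reward: Callable):
        self._env = original_env
        self._obs_to_new_reward = obs_to_new_reward
        self._last_obs = None

    # Read-only properties that forward to the wrapped env at runtime:
    @property
    def action_space(self):
        return self._env.action_space

    @property
    def observation_space(self):
        return self._env.observation_space

    @property
    def metadata(self):
        return self._env.metadata

    @property
    def render_mode(self):
        return self._env.render_mode

    def reset(self, **kwargs):
        obs, info = self._env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        obs, _original_reward, terminated, truncated, info = self._env.step(action)
        new_reward = self._obs_to_new_reward(self._last_obs, obs)
        self._last_obs = obs
        return obs, new_reward, terminated, truncated, info

    # Other methods are forwarded without modification:
    def render(self):
        return self._env.render()

    def close(self):
        return self._env.close()
```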
Implement an obs_to_new_reward function for each of the following tasks, so that maximizing the total sum of rewards given by that function leads to achieving the given task (a sketch of several of these functions follows after this list):
maximize the x coordinate of the lander
e.g., max_x_func = lambda last_obs, this_obs: (lunar_lander_obs(this_obs)["x"] - lunar_lander_obs(last_obs)["x"]) if last_obs is not None else 0, where lunar_lander_obs(obs) returns a dict keyed by "x", "y", "vx", "vy", etc.
minimize the x coordinate
maximize the y coordinate
minimize the y coordinate
maximize the angle
minimize the angle
maximize the absolute velocity
minimize the absolute velocity
maximize the angular velocity
minimize the angular velocity
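A hedged sketch of lunar_lander_obs and a few of these reward functions (the _delta factory and the names are illustrative; the remaining maximize/minimize variants follow the same pattern). Because each reward is the change of a quantity between consecutive observations, the episode's total reward telescopes to the final value minus the fixed initial value, so maximizing the sum maximizes that quantity:

```python
import math


def lunar_lander_obs(obs):
    """Turn the 8-dimensional Lunar Lander observation into a dict."""
    x, y, vx, vy, angle, angular_velocity, left_contact, right_contact = obs
    return {"x": x, "y": y, "vx": vx, "vy": vy, "angle": angle,
            "angular_velocity": angular_velocity,
            "left_contact": left_contact, "right_contact": right_contact}


def _delta(key, sign=1.0):
    """Build an obs_to_new_reward function that rewards the signed change of one field."""
    def obs_to_new_reward(last_obs, this_obs):
        if last_obs is None:
            return 0.0
        return sign * (lunar_lander_obs(this_obs)[key] - lunar_lander_obs(last_obs)[key])
    return obs_to_new_reward


max_x_func = _delta("x", +1.0)   # maximize the x coordinate
min_x_func = _delta("x", -1.0)   # minimize the x coordinate
max_y_func = _delta("y", +1.0)   # maximize the y coordinate
min_angular_velocity_func = _delta("angular_velocity", -1.0)
# ... and analogously for the other fields ...


def max_abs_velocity_func(last_obs, this_obs):
    """Absolute velocity is not a single observation field, so compute it from vx and vy."""
    if last_obs is None:
        return 0.0
    speed = lambda o: math.hypot(lunar_lander_obs(o)["vx"], lunar_lander_obs(o)["vy"])
    return speed(this_obs) - speed(last_obs)
```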
Change make_env() in test_dqn.py so that it replaces the original Lunar Lander env with env = ObsToNewRewardEnvWrapper(env, obs_to_new_reward), and make it use one of the implemented obs_to_new_reward functions depending on a command-line parameter.
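Since test_dqn.py itself is not shown here, the following is only a hypothetical sketch of that change; the --reward argument name, the REWARD_FUNCS mapping, and the env id are assumptions, and the wrapper and reward functions from the sketches above are taken as importable:

```python
import argparse
import gymnasium as gym

# Map command-line names to the implemented obs_to_new_reward variants (illustrative).
REWARD_FUNCS = {
    "max_x": max_x_func,
    "min_x": min_x_func,
    "max_y": max_y_func,
    # ... one entry per implemented variant ...
}


def make_env(reward_name: str):
    env = gym.make("LunarLander-v3")  # or "LunarLander-v2" on older gymnasium versions
    return ObsToNewRewardEnvWrapper(env, REWARD_FUNCS[reward_name])


parser = argparse.ArgumentParser()
parser.add_argument("--reward", choices=sorted(REWARD_FUNCS), default="max_x")
args = parser.parse_args()
env = make_env(args.reward)
```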
For each of these versions of obs_to_new_reward, run an experiment on the cluster:
Write a corresponding Slurm script that requests one GPU, based on the existing test_dqn.slurm (a rough sketch follows below this list).
Also write a fall-back script that uses CPU instead (should GPU jobs not get scheduled fast enough), based on test_dqn_cpu.slurm
Submit the scripts
After they have finished, document their resource usage into a table via
sacct -a -j <job_id> --format=user%10,jobname%10,node%10,start%10,end%10,elapsed%10,MaxRSS
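For the Slurm scripts mentioned above, this is only a rough single-GPU sketch; the partition name, module setup, resource limits, and file names are assumptions that should be copied from the existing test_dqn.slurm (the CPU fall-back would drop the --gres line and use the settings from test_dqn_cpu.slurm instead):

```bash
#!/bin/bash
#SBATCH --job-name=dqn_max_x
#SBATCH --partition=gpu              # assumed partition name; copy from test_dqn.slurm
#SBATCH --gres=gpu:1                 # request one GPU (drop this line in the CPU fall-back)
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=08:00:00
#SBATCH --output=%x_%j.out

# Environment setup (module/conda names are assumptions; take them from test_dqn.slurm).
module load anaconda

python test_dqn.py --reward max_x
```

One such script per reward variant can then be submitted with sbatch.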
Resources:
Lunar Lander env doc: https://gymnasium.farama.org/environments/box2d/lunar_lander/
Gymnasium.Env doc: https://gymnasium.farama.org/api/env/
Logging into the PIK cluster and fetching the latest upstream code version:
and interrupt it as soon as it says
test_dqn.slurm: