Lunar Lander experiments on the cluster #23

Open
rqc1 opened this issue Sep 9, 2024 · 0 comments
rqc1 commented Sep 9, 2024

Tasks:

  • Understand what the observations returned by the Lunar Lander env. mean (position, velocity, angle, angular velocity, ground contact), see doc linked below
  • Write a wrapper class ObsToNewRewardEnvWrapper derived from gymnasium.Env that wraps any existing gymnasium.Env object (such as an instance of the Lunar Lander env) into a new env object that replaces the original env's rewards with the value of some function obs_to_new_reward(last_obs, this_obs). To this end:
    • Make the __init__ method of the wrapper class accept two parameters: original_env: gymnasium.Env and obs_to_new_reward: Callable.
    • For each relevant attribute of gymnasium.Env (see docs linked below), in particular for action_space and observation_space, give the wrapper class a read-only Python property of that name and make that property return the corresponding value from the original env at runtime.
    • Implement the methods reset() and step() by calling the original env's method. In reset(), which returns no reward, only store the returned observation via self._last_obs = obs. In step(), replace the reward part of the return value by obs_to_new_reward(self._last_obs, obs), then set self._last_obs = obs.
    • Also wrap the other methods (like render() and close()) without modification.
  • Implement an obs_to_new_reward function for each of the following maximization tasks, so that maximizing the total sum of rewards given by that function leads to maximizing the given maximization task:
    • maximize the x coordinate of the lander
      • e.g., max_x_func = lambda last_obs, this_obs: (lunar_lander_obs(this_obs)["x"] - lunar_lander_obs(last_obs)["x"]) if last_obs is not None else 0, where lunar_lander_obs(obs) returns a dict keyed by "x", "y", "vx", "vy", etc.
    • minimize the x coordinate
    • maximize the y coordinate
    • minimize the y coordinate
    • maximize the angle
    • minimize the angle
    • maximize the absolute velocity
    • minimize the absolute velocity
    • maximize the angular velocity
    • minimize the angular velocity
  • Change make_env() in test_dqn.py so that it replaces the original lunar lander env by env = ObsToNewRewardEnvWrapper(env, obs_to_new_reward) and make it use one of the implemented versions of the obs_to_new_reward functions, depending on some command-line parameter.
  • For each of these versions of obs_to_new_reward, run an experiment on the cluster:
    • Write a corresponding slurm script that requests one GPU, on the basis of the existing test_dqn.slurm
    • Also write a fall-back script that uses CPU instead (should GPU jobs not get scheduled fast enough), based on test_dqn_cpu.slurm
    • Submit the scripts
    • After they have finished, document their resource usage in a table using
sacct -a -j <job_id> --format=user%10,jobname%10,node%10,start%10,end%10,elapsed%10,MaxRS

Resources:

ssh foote
cd /p/projects/ou/labs/gane/satisfia/satisfia/
git pull
module load anaconda
source activate ./.conda-env
  • Checking that the code works:
python scripts/test_dqn.py

and interrupt it as soon as it says

agents:   0%|
  • test_dqn.slurm:
#!/bin/bash
#SBATCH --job-name=satisfia_dqn
#SBATCH --output=output_%j.log
#SBATCH --error=error_%j.log
#SBATCH --ntasks-per-node=1
#SBATCH --partition=gpu
#SBATCH --qos=gpushort
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --time=1:00:00

module load anaconda
source activate ./.conda-env
.conda-env/bin/python scripts/test_dqn.py
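
One possible way to produce the per-task GPU scripts from test_dqn.slurm is a small Python generator, sketched below. The task names, the dqn_<task> job names, the output file names, and the --reward flag are all assumptions: test_dqn.py would need to parse such a flag itself. The CPU fall-back is not covered, since it should follow test_dqn_cpu.slurm.

```python
from pathlib import Path
from typing import List

# Hypothetical short names for the ten obs_to_new_reward variants.
TASKS = [
    "max_x", "min_x", "max_y", "min_y", "max_angle", "min_angle",
    "max_speed", "min_speed", "max_angvel", "min_angvel",
]

# Same header as test_dqn.slurm; only the job name and the last line differ.
# The --reward flag is an assumption, not an existing option of test_dqn.py.
TEMPLATE = """\
#!/bin/bash
#SBATCH --job-name=dqn_{task}
#SBATCH --output=output_%j.log
#SBATCH --error=error_%j.log
#SBATCH --ntasks-per-node=1
#SBATCH --partition=gpu
#SBATCH --qos=gpushort
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --time=1:00:00

module load anaconda
source activate ./.conda-env
.conda-env/bin/python scripts/test_dqn.py --reward {task}
"""


def write_slurm_scripts(out_dir: Path) -> List[Path]:
    """Write one GPU Slurm script per task into out_dir and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for task in TASKS:
        path = out_dir / f"test_dqn_{task}.slurm"
        path.write_text(TEMPLATE.format(task=task))
        paths.append(path)
    return paths
```

Calling write_slurm_scripts(Path("slurm")) would write ten scripts into a slurm/ directory, each of which could then be submitted with sbatch.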
@rqc1 rqc1 self-assigned this Sep 9, 2024