Code for training and sim2real transfer of the reinforcement learning agents from our paper "Precision-Focused Reinforcement Learning Model for Robotic Object Pushing".
conda create -n precise_pushing python
conda activate precise_pushing
cd PATH_TO_THIS_REPO
pip install -e .
This repository contains two main Gymnasium environments:
- MujocoPandaPushSimpleEnv: A simple Gymnasium environment mainly used for debugging. An observation consists of the (x,y) end-effector position, the (x,y) position of the object (achieved goal), and the (x,y) position of the target (desired goal).
- MujocoPandaPushEnv: A Gymnasium environment based on the vision-proprioception model. However, it provides more functionality than described in the paper, for example objects with variable height and the full episode history of observations.
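The observation of MujocoPandaPushSimpleEnv described above can be pictured as three 2D vectors. The following is a minimal sketch, assuming the goal-conditioned dictionary layout common in Gymnasium-Robotics (keys "observation", "achieved_goal", "desired_goal"); the helpers make_simple_observation and goal_distance, as well as the key names, are hypothetical and may not match this repository's implementation.

```python
import math
from typing import Dict, Tuple

XY = Tuple[float, float]


def make_simple_observation(ee_xy: XY, object_xy: XY, target_xy: XY) -> Dict[str, Tuple[float, ...]]:
    """Hypothetical observation layout for MujocoPandaPushSimpleEnv.

    Assumes the Gymnasium-Robotics goal-env convention; the key names used
    by this repository may differ.
    """
    return {
        "observation": tuple(ee_xy),        # (x, y) end-effector position
        "achieved_goal": tuple(object_xy),  # (x, y) object position
        "desired_goal": tuple(target_xy),   # (x, y) target position
    }


def goal_distance(obs: Dict[str, Tuple[float, ...]]) -> float:
    """Planar Euclidean distance between the object and the target."""
    return math.dist(obs["achieved_goal"], obs["desired_goal"])


if __name__ == "__main__":
    obs = make_simple_observation((0.3, 0.0), (0.5, 0.1), (0.5, 0.4))
    print(goal_distance(obs))
```

Keeping the object and target positions under separate keys is what makes the "achieved goal" and "desired goal" roles explicit; a goal-conditioned learner can then relabel goals without touching the end-effector part.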
An agent can be trained with the following Python script (note that the autoencoder must be trained first):
python3 PATH_TO_PUSHING_REPO/panda_push_rl_sb3/train.py
For example, to train an agent that automatically adjusts the number of simulation steps, use:
python3 PATH_TO_PUSHING_REPO/panda_push_rl_sb3/train.py --numSimSteps -1
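When launching several training runs, it can help to build the train.py command programmatically. The helper below is a hypothetical sketch: build_train_command is not part of this repository, it simply turns keyword arguments into "--name value" pairs, and only numSimSteps is a flag confirmed by this README (check train.py --help for the rest).

```python
import shlex
from pathlib import Path
from typing import List, Union

Value = Union[str, int, float]


def build_train_command(repo_path: str, **params: Value) -> List[str]:
    """Hypothetical helper: build the argument list for train.py.

    Each keyword becomes a "--name value" pair, e.g. numSimSteps=-1 becomes
    "--numSimSteps -1". Names are passed through unchanged, so they must
    match the flags listed by "train.py --help".
    """
    script = Path(repo_path) / "panda_push_rl_sb3" / "train.py"
    cmd = ["python3", str(script)]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_train_command("PATH_TO_PUSHING_REPO", numSimSteps=-1)
    print(shlex.join(cmd))
    # To actually start training (the autoencoder must be trained first):
    # import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a shell string avoids quoting problems when the list is passed to subprocess.run.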
There are many environment and training parameters that can be adjusted. For a complete overview of all configurable parameters, use:
python3 PATH_TO_PUSHING_REPO/panda_push_rl_sb3/train.py --help
In general, evaluation parameters begin with "e" (for example, "eObjType"); they are ignored by the training script.
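The "e" prefix convention can be expressed as a simple split of a parameter dictionary into training and evaluation parameters. This is a minimal sketch, assuming that an evaluation parameter is a lowercase "e" followed by an uppercase letter, as in eObjType, eObjSize1, and eNumEvalEpisodes; split_params is a hypothetical name, not a function from this repository.

```python
import re
from typing import Any, Dict, Tuple

# Assumption: evaluation parameters look like "eObjType" (lowercase "e" then
# an uppercase letter). A bare startswith("e") check would also match any
# training parameter that merely begins with a lowercase "e".
EVAL_PARAM = re.compile(r"^e[A-Z]")


def split_params(params: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split parameters into (training, evaluation) dictionaries."""
    train: Dict[str, Any] = {}
    evaluation: Dict[str, Any] = {}
    for name, value in params.items():
        target = evaluation if EVAL_PARAM.match(name) else train
        target[name] = value
    return train, evaluation


if __name__ == "__main__":
    train, evaluation = split_params(
        {"numSimSteps": -1, "eObjType": "box", "eNumEvalEpisodes": 100}
    )
    print(train, evaluation)
```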
Log and evaluation files are saved in the directory PATH_TO_PUSHING_REPO/panda_push_data/rl/RUN_NAME.
RUN_NAME is determined by the parameters used to train an agent: every parameter that differs from its default value appears in RUN_NAME, except for the evaluation parameters (starting with "e") and some special parameters that do not influence the training results, such as the log path.
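The naming rule above can be sketched as follows. This is illustrative only: make_run_name, the "name_value" joining format, the "default" fallback, and the parameter names and default values in the example (including logPath) are all hypothetical; the exact RUN_NAME format produced by train.py may differ.

```python
import re
from typing import Any, Dict, Iterable

# Assumption: evaluation parameters are "e" followed by an uppercase letter.
EVAL_PARAM = re.compile(r"^e[A-Z]")


def make_run_name(params: Dict[str, Any], defaults: Dict[str, Any],
                  ignored: Iterable[str] = ()) -> str:
    """Hypothetical RUN_NAME: list every non-default training parameter.

    Evaluation parameters and parameters in `ignored` (e.g. a log path)
    are skipped because they do not influence the training results.
    """
    skip = set(ignored)
    parts = []
    for name in sorted(params):  # sorted for a deterministic name
        if EVAL_PARAM.match(name) or name in skip:
            continue
        if params[name] != defaults.get(name):
            parts.append(f"{name}_{params[name]}")
    return "_".join(parts) if parts else "default"


if __name__ == "__main__":
    defaults = {"numSimSteps": 10, "objType": "box", "logPath": "logs"}  # hypothetical
    params = {"numSimSteps": -1, "objType": "box", "logPath": "/tmp/run", "eObjType": "box"}
    print(make_run_name(params, defaults, ignored=["logPath"]))
```

Because only non-default training parameters enter the name, two runs with the same training configuration map to the same directory regardless of how they are later evaluated.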
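A trained agent can be evaluated with PATH_TO_PUSHING_REPO/panda_push_rl_sb3/evaluate_policy.py. Its adjustable parameters are similar to those used to train an agent, except that the evaluation parameters (starting with "e") are used instead of ignored. For a complete overview of all configurable parameters, use: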
python3 PATH_TO_PUSHING_REPO/panda_push_rl_sb3/evaluate_policy.py --help
In general, parameters not beginning with "e" determine which policy to load (i.e. the training configuration), whereas parameters beginning with "e" determine the test configuration. For example, to evaluate a policy over 100 episodes using only cuboids with a square base, use:
python3 PATH_TO_PUSHING_REPO/panda_push_rl_sb3/evaluate_policy.py --eObjType box --eObjSize1 -2 --eNumEvalEpisodes 100
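Evaluating over a fixed number of episodes (as with eNumEvalEpisodes) amounts to a standard Gymnasium rollout loop. The sketch below assumes the Gymnasium Env API (reset(seed=...) returns (obs, info); step returns (obs, reward, terminated, truncated, info)) and assumes success is reported through info["is_success"], as in Gymnasium-Robotics. Whether evaluate_policy.py works this way is not confirmed here, and success_rate is a hypothetical name.

```python
from typing import Any, Callable


def success_rate(env: Any, policy: Callable[[Any], Any],
                 num_episodes: int = 100, seed: int = 0) -> float:
    """Fraction of episodes that end with info["is_success"] set.

    Assumes a Gymnasium-style env; each episode gets its own seed so the
    evaluation is reproducible.
    """
    successes = 0
    for episode in range(num_episodes):
        obs, info = env.reset(seed=seed + episode)
        terminated = truncated = False
        while not (terminated or truncated):
            obs, _reward, terminated, truncated, info = env.step(policy(obs))
        # Assumption: the final info dict carries the success flag.
        successes += bool(info.get("is_success", False))
    return successes / num_episodes
```

Any trained policy can be plugged in as a callable, for example a Stable-Baselines3 model wrapped as lambda obs: model.predict(obs, deterministic=True)[0].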
This repository is currently maintained by Lara Bergmann (@lbergmann1).