Playing 2048 using Proximal Policy Optimization. Lots of TODO's before it really performs well. However, the agent already learns to beat the game after approx. 8 hours of training:
I use pytest
. Run the following commands:
pytest
pytest --doctest-modules
- Use model wrappers to convert observation space instead of what is done now.
- Add setup.py for installation of dependencies.
- One-hot encoding of observation space instead of numerical values. Probably will be a better representation?