Several agents that can play StarCraft II will be built in this repository!
First of all, you need to download and install the game. Then follow the instructions below to build awesome battle bots!
The packages in the requirements file have all been tested on Windows 10 with conda 4.9.2 and Python 3.7.3.
Create a new conda environment with
conda create -n SCII_Bots python=3.7.3
Install Git into the environment with
conda install git
Install the requirements with (note that not all of the listed packages are strictly required)
pip install -r requirements.txt
If everything goes well, the following output should appear in PowerShell or the command prompt.
PS: the newest version of the PySC2 package is 3.0.0, while most of the code published online is based on 2.x.x, so much of that code no longer runs.
PPS: PySC2 is very different from, and more complex than, python-sc2. This repository mainly focuses on PySC2.
State: obtained from env.observation, including the feature screen, feature minimap and player info.
Action: determines what to do and where to go to win the game. An action has two parts: one of several basic actions (currently 11) and a coordinate on a 64*64 grid.
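Here is a minimal sketch, assuming the standard PySC2 3.0 API, of how the state tensors and a two-part action look (the map name, races, screen/minimap sizes and the attack coordinate are illustrative, not necessarily what the runners in this repository use):

```python
from pysc2.env import sc2_env
from pysc2.lib import actions, features

# Illustrative environment setup; map, races and sizes are assumptions.
env = sc2_env.SC2Env(
    map_name="Simple64",
    players=[sc2_env.Agent(sc2_env.Race.terran),
             sc2_env.Bot(sc2_env.Race.random, sc2_env.Difficulty.very_easy)],
    agent_interface_format=features.AgentInterfaceFormat(
        feature_dimensions=features.Dimensions(screen=64, minimap=64)),
    step_mul=8)

obs = env.reset()[0]
screen = obs.observation.feature_screen    # stacked screen feature layers (C x 64 x 64)
minimap = obs.observation.feature_minimap  # stacked minimap feature layers (C x 64 x 64)
player = obs.observation.player            # player info vector (minerals, supply, army count, ...)

# An action = a basic function plus, for spatial functions, a point on the 64*64 grid.
if actions.FUNCTIONS.Attack_minimap.id in obs.observation.available_actions:
    obs = env.step([actions.FUNCTIONS.Attack_minimap("now", (32, 32))])[0]
else:
    obs = env.step([actions.FUNCTIONS.no_op()])[0]
env.close()
```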
Reward:
version 1: $$ reward = (score + total\_value\_units + total\_value\_structures + 10 \cdot killed\_value\_units + 10 \cdot killed\_value\_structures + collected\_minerals + collected\_rate\_minerals + 5 \cdot spent\_minerals - 8 \cdot idle\_work\_time) \times 10^{-6} $$ This still needs further adjustment.
version 2: use *spent_minerals* to reward the action, and use *killed_value_units + killed_value_structures* to reward the attack point.
In summary, $$ reward = [reward\_a, reward\_p] $$ where $$ reward\_a = spent\_minerals \times 10^{-2} $$ and $$ reward\_p $$ is derived from $$ killed\_value\_units + killed\_value\_structures $$.
In addition, the reward is adjusted further to simulate the returns from the environment more precisely (a code sketch of the full shaping follows this list):
- if the action is available, actual_action is the action; otherwise an UnboundLocalError is expected and actual_action falls back to actions.FUNCTIONS.no_op.
if actual_action == action:
    reward_a = reward_a * 10
- if the episode is won:
reward = list(np.array(reward) + 10000)
- if done is True (the forces of each side are even and the match ends without a winner):
reward = list(np.array(reward) - 5000)
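Here is a minimal sketch of this two-part shaping, assuming the statistics come from obs.observation.score_cumulative; the function name and signature are illustrative:

```python
import numpy as np

def shaped_reward(obs, action, actual_action, done, won):
    """Version-2 reward sketch: reward_a scores the chosen action, reward_p the attack point."""
    score = obs.observation.score_cumulative
    reward_a = score.spent_minerals * 1e-2                        # 10^{-2} scaling from above
    reward_p = float(score.killed_value_units + score.killed_value_structures)
    if actual_action == action:                                   # the chosen action was available
        reward_a *= 10
    reward = np.array([reward_a, reward_p], dtype=np.float64)
    if won:                                                       # episode won
        reward += 10000
    elif done:                                                    # match ended with forces even
        reward -= 5000
    return list(reward)
```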
Run the environment test script with
python runner_basic_test.py
I also use a supervised value network to validate that the gradient updates work; run it with
python runner_nn_test.py
Train a DQN agent to play the game with
python runner_dqn.py
Train an A2C agent to play the game with (Coming Soon)
python runner_a2c.py
The value neural agent is trained with the DQN algorithm. It takes three types of input tensors: 27-channel screen features, 11-channel mini-map features, and 11-channel player information features.
Two functional models, named the operation model and the warfare model, share several layers of the network as well as the inputs. The two models output the action values and the values of attack positions, respectively.
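A minimal PyTorch sketch of this shared-trunk layout, assuming illustrative layer widths and treating the player information as an 11-dimensional vector (the exact architecture in this repository may differ):

```python
import torch
import torch.nn as nn

class ValueNeuralAgent(nn.Module):
    """Shared trunk with two heads: operation (11 basic actions) and warfare (64*64 attack positions)."""
    def __init__(self, n_actions=11, map_size=64):
        super().__init__()
        self.screen_conv = nn.Sequential(                  # 27-channel screen branch
            nn.Conv2d(27, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.minimap_conv = nn.Sequential(                 # 11-channel minimap branch
            nn.Conv2d(11, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.player_fc = nn.Sequential(nn.Linear(11, 64), nn.ReLU())
        self.shared_fc = nn.Sequential(                    # layers shared by both models
            nn.Linear(32 * 16 * 16 * 2 + 64, 512), nn.ReLU())
        self.operation_head = nn.Linear(512, n_actions)            # action values
        self.warfare_head = nn.Linear(512, map_size * map_size)    # values of attack positions

    def forward(self, screen, minimap, player):
        s = self.screen_conv(screen).flatten(1)
        m = self.minimap_conv(minimap).flatten(1)
        p = self.player_fc(player)
        h = self.shared_fc(torch.cat([s, m, p], dim=1))
        return self.operation_head(h), self.warfare_head(h)
```

Sharing the convolutional trunk lets both models reuse the same spatial features, while each head can be trained against its own component of the two-part reward above.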
The DQN algorithm is expressed as follows.
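For reference, the standard one-step DQN update bootstraps a target from a (periodically frozen) copy of the network and minimizes the squared error to it:

$$ y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}) $$

$$ L(\theta) = \big( y - Q(s, a; \theta) \big)^2 $$

With the two-headed network above, each head is presumably updated against its own component of the two-part reward.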
The policy neural agent is constructed as follows with PyTorch 1.2.0. The A2C algorithm will be used to search for a better policy distribution (Coming Soon).
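For reference, the standard A2C objective it would optimize combines a policy-gradient term weighted by the advantage with a value-regression term (an entropy bonus is usually added for exploration):

$$ A(s_t, a_t) = r_t + \gamma V(s_{t+1}) - V(s_t) $$

$$ L(\theta) = -\log \pi_\theta(a_t \mid s_t) \, A(s_t, a_t) + c \, \big( r_t + \gamma V(s_{t+1}) - V(s_t) \big)^2 $$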
The value neural agent is built to learn high-value actions with the DQN algorithm; its current state of convergence is as follows:
- Most importantly, the agent has learned a valid action sequence for building an army and attacking (select_scv --> build_supply_depot --> build_barrack --> train marines multiple times --> select_all_troops --> move to attack_point and attack);
- it is able to train new battle units when the army is losing, building around 10 marines in each attack wave;
- it is able to attack a specific position.
Several replays are saved here. Best score in one episode: 8339872.5. Average learning losses are recorded after 1068 batch_pools.
And finally,
En Taro Adun !!!
En Taro Tassadar !!!
En Taro Zeratul !!!
En Taro Artanis !!!
TODO List:
- Standardize the game rules in the environment;
- Use keyframes rather than all frames as inputs;
- Use recurrent blocks to extract temporal features;
- Optimize the attack position to select from the positions of known enemies, rather than all positions on the mini-map;
- Add residual blocks to reinforce image feature extraction;
- Add attention modules to refine the functions of each module.
https://github.com/deepmind/pysc2
https://github.com/skjb/pysc2-tutorial
https://github.com/Dentosal/python-sc2