
SCII_Bots

Several agents able to play StarCraft II will be built in this repository!


Initializing

Build a basic development environment

First of all, you need to download and install the game. Then follow the instructions below to build awesome battle bots!

The packages listed in the requirements file have all been tested on Windows 10 with conda 4.9.2 and Python 3.7.3.

Create a new conda environment with

conda create -n SCII_Bots python=3.7.3

Install git with

conda install git

Install the requirements with (note that some of the packages may not be needed)

pip install -r requirements.txt
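For reference, a minimal sketch of the kind of pins the requirements file might contain, based on the versions mentioned elsewhere in this README (the actual requirements.txt in the repository is authoritative and likely lists more packages):

pysc2==3.0.0
torch==1.2.0
numpy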

If everything goes well, the following result should be shown in PowerShell or the command prompt.

PS: the newest version of the PySC2 package is 3.0.0, while most of the code published online is based on 2.x.x, so much of it no longer runs.

PPS: PySC2 is very different from, and more complex than, python-sc2. This repository mainly focuses on PySC2.

Learning

Introduction to the game environment

State: obtained from env.observation, including the feature screen, feature minimap and player info.

Action: determines what to do and where to go to win the game. An action has two parts: one of several basic actions (currently 11) and a coordinate position on a 64*64 grid.
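As a rough illustration of how these pieces map onto the PySC2 API (a minimal sketch, not the exact code in this repository; the map name Simple64 and the 64*64 resolution are assumptions for the example):

from pysc2.env import sc2_env
from pysc2.lib import actions, features

# Build an environment whose observations expose 64*64 feature layers.
env = sc2_env.SC2Env(
    map_name="Simple64",
    players=[sc2_env.Agent(sc2_env.Race.terran),
             sc2_env.Bot(sc2_env.Race.random, sc2_env.Difficulty.easy)],
    agent_interface_format=features.AgentInterfaceFormat(
        feature_dimensions=features.Dimensions(screen=64, minimap=64)),
)

obs = env.reset()[0]

# State: feature screen, feature minimap and player info.
screen = obs.observation["feature_screen"]    # screen feature layers, 64*64 each
minimap = obs.observation["feature_minimap"]  # minimap feature layers, 64*64 each
player = obs.observation["player"]            # player info vector

# Action: a basic function plus a coordinate on the 64*64 grid,
# falling back to no_op when the chosen function is unavailable.
if actions.FUNCTIONS.Attack_minimap.id in obs.observation["available_actions"]:
    action = actions.FUNCTIONS.Attack_minimap("now", (32, 32))
else:
    action = actions.FUNCTIONS.no_op()

obs = env.step([action])[0]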

Reward:

version 1:

$$ reward = (score + total\_value\_units + total\_value\_structures + 10 \cdot killed\_value\_units + 10 \cdot killed\_value\_structures + collected\_minerals + collected\_rate\_minerals + 5 \cdot spent\_minerals - 8 \cdot idle\_work\_time) \times 10^{-6} $$

This still needs further adjustment.

version 2:

Use spent_minerals to reward the chosen action; use killed_value_units + killed_value_structures to reward the attack position.

In summary, $$ reward = [reward_a, reward_p] $$ where

$$ reward_a = spent\_minerals \times 10^{-2} $$

$$ reward_p = killed\_value\_units + killed\_value\_structures $$

In addition, the reward is adjusted further to approximate the returns from the environment more precisely.

  • If the chosen action is available, actual_action is action; otherwise an UnboundLocalError is expected and actual_action falls back to actions.FUNCTIONS.no_op. Then, if actual_action == action, the action reward is boosted: reward_a = reward_a * 10.
  • If the episode is won: reward = list(np.array(reward) + 10000).
  • If done is True (the forces of both sides are equal at the end of the match): reward = list(np.array(reward) - 5000).
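A minimal sketch of these adjustments, using hypothetical helper names (the actual implementation lives in the runner scripts):

import numpy as np
from pysc2.lib import actions

def choose_actual_action(action, available_actions):
    # Fall back to no_op when the chosen function is not currently available.
    if action.function in available_actions:
        return action
    return actions.FUNCTIONS.no_op()

def shape_reward(reward_a, reward_p, action, actual_action, won, done):
    # Boost the action reward when the intended action was actually executed.
    if actual_action == action:
        reward_a = reward_a * 10
    reward = [reward_a, reward_p]
    if won:         # episode won
        reward = list(np.array(reward) + 10000)
    elif done:      # draw: the forces of both sides are equal
        reward = list(np.array(reward) - 5000)
    return reward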

Run the environment test script with

python runner_basic_test.py

A supervised value network is also used to validate that the gradient update works; run it with

python runner_nn_test.py

Train a DQN agent to play the game with

python runner_dqn.py

Train an A2C agent to play the game with (Coming Soon)

python runner_a2c.py

Details of neural agents and algorithms

The value neural agent is trained with the DQN algorithm. It takes three different types of input tensors: 27-channel screen features, 11-channel minimap features and an 11-channel player-information feature.

In addition, two functional models, named the operation model and the warfare model, share several layers of the network as well as the inputs. The two models output the action value and the value of the attack position, respectively.
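As a rough sketch of this shared-trunk, two-head layout (the 27/11/11 channel counts, 11 basic actions and 64*64 grid come from the descriptions above; layer sizes and names here are illustrative assumptions, not the exact architecture in this repository):

import torch
import torch.nn as nn

class ValueAgentSketch(nn.Module):
    # Shared trunk with two heads: operation model (action values) and
    # warfare model (attack-position values).
    def __init__(self, n_actions=11, screen_size=64):
        super().__init__()
        # Three input branches: 27-channel screen, 11-channel minimap, 11-dim player info.
        self.screen_conv = nn.Sequential(
            nn.Conv2d(27, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU())
        self.minimap_conv = nn.Sequential(
            nn.Conv2d(11, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU())
        self.player_fc = nn.Sequential(nn.Linear(11, 64), nn.ReLU())
        # Shared layers over the concatenated features.
        flat = 32 * screen_size * screen_size * 2 + 64
        self.shared = nn.Sequential(nn.Linear(flat, 256), nn.ReLU())
        # Operation head: one value per basic action.
        self.operation_head = nn.Linear(256, n_actions)
        # Warfare head: one value per attack position on the screen grid.
        self.warfare_head = nn.Linear(256, screen_size * screen_size)

    def forward(self, screen, minimap, player):
        s = self.screen_conv(screen).flatten(1)
        m = self.minimap_conv(minimap).flatten(1)
        p = self.player_fc(player)
        h = self.shared(torch.cat([s, m, p], dim=1))
        return self.operation_head(h), self.warfare_head(h)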

The DQN algorithm is expressed as follows.
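For reference, in the standard DQN formulation (the repository may use a slight variant), the network is trained to minimize the temporal-difference error between its prediction and a bootstrapped target computed with a periodically updated target network, over transitions $(s, a, r, s')$ sampled from a replay buffer:

$$ y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}) $$

$$ L(\theta) = \mathbb{E}\left[ \left( y - Q(s, a; \theta) \right)^2 \right] $$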

The policy neural agent is constructed with PyTorch 1.2.0. The A2C algorithm will be used to search for a better policy distribution (Coming Soon).

Evaluating

The value neural agent is built to learn high-value actions with the DQN algorithm. The current state of convergence is as follows:

  • Most importantly, the agent has learned a usable action sequence for selecting its army and attacking (select_scv --> build_supply_depot --> build_barrack --> train marines multiple times --> select_all_troops --> move to attack_point and attack);
  • it can train new battle units when the army is losing, building around 10 marines in each attack wave;
  • it can attack a specific position.

Several replays are saved here. Best score in one episode: 8339872.5. Average learning losses are reported after 1068 batch_pools.

And finally,

En Taro Adun !!!

En Taro Tassadar !!!

En Taro Zeratul !!!

En Taro Artanis !!!

TODO List:

  1. Standardize the game rules in the environment;
  2. Use keyframes rather than all frames as inputs;
  3. Use a recurrent block to extract temporal features;
  4. Optimize the attack position to select from the positions of known enemies, rather than all positions on the minimap;
  5. Add a residual block to reinforce image feature extraction;
  6. Add an attention module to refine the functions in each module.

References

https://github.com/deepmind/pysc2

https://github.com/skjb/pysc2-tutorial

https://github.com/Dentosal/python-sc2

https://github.com/ClausewitzCPU0/SC2AI

Dentosal/python-sc2 Wiki (GitHub)
