AlphaZero-based Proof Cost Network to Aid Game Solving

This repository is the official implementation of the ICLR 2022 paper [AlphaZero-based Proof Cost Network to Aid Game Solving].

Requirements

All experiments, including both training and evaluation, run in a container. The Dockerfile for building the container is "docker/Dockerfile". You can use either docker or podman commands to set up and attach to the environment.

These instructions use podman commands to demonstrate how to reproduce our experiment results. If you need to install podman, use the commands below (skip this step if podman is already installed):

sudo apt-get update 
sudo apt-get install -y curl

source /etc/os-release
echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/Release.key | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install podman

To set up the container, run the commands below to build the container image and start the container in the background.

cd docker
podman build . --tag minizero --no-cache
cd ..
podman run --name minizero --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host --ipc=host --rm -it -d -v .:/workspace minizero bash
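Since either docker or podman can be used, the choice of engine can be made once and reused. The following is a minimal sketch, not part of the repository: `pick_engine` and the `ENGINE` variable are hypothetical names introduced here for illustration.

```shell
# Print the first candidate command that exists on PATH; fail if none does.
pick_engine() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer podman (used throughout these instructions), fall back to docker.
if ENGINE=$(pick_engine podman docker); then
    echo "using container engine: $ENGINE"
else
    echo "neither podman nor docker found on PATH" >&2
fi
```

With `ENGINE` set, the build step above becomes `"$ENGINE" build . --tag minizero --no-cache`. If you use docker for the run step, some Docker versions may expect an absolute host path in `-v`, so `-v "$PWD":/workspace` is likely the safer form.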

Training

Before training, select a game to experiment with (either 9x9 Killall-Go or 15x15 Gomoku) and compile the corresponding executables. Note that compiling for one game removes the executables for the other game.

To compile the 15x15 Gomoku version:

podman exec -it minizero ./scripts/clean-up.sh
podman exec -it minizero ./scripts/setup-cmake.sh GOMOKU release
podman exec -it minizero make -j

To compile the 9x9 Killall-Go version:

podman exec -it minizero ./scripts/clean-up.sh
podman exec -it minizero ./scripts/setup-cmake.sh GO release
podman exec -it minizero make -j
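The two build sequences above differ only in the game name passed to setup-cmake.sh, so they can be wrapped in one helper. This is a sketch under that assumption: `build_game`, `run`, and the `DRY_RUN` switch are hypothetical names, not scripts shipped with the repository.

```shell
# Run a command, or only print it when DRY_RUN=1 (useful for checking first).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Clean up, configure, and compile for one game: GOMOKU or GO.
build_game() {
    case $1 in
        GOMOKU|GO) ;;
        *) echo "usage: build_game GOMOKU|GO" >&2; return 2 ;;
    esac
    run podman exec -it minizero ./scripts/clean-up.sh &&
        run podman exec -it minizero ./scripts/setup-cmake.sh "$1" release &&
        run podman exec -it minizero make -j
}
```

For example, `DRY_RUN=1; build_game GOMOKU` prints the three commands without running them, and `build_game GO` with `DRY_RUN` unset runs them in the container.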

To train the model(s) in the paper, run the following three processes at the same time: the Server, the Optimizer, and a Self-Play Worker. You may use three terminals, one per process.

Run Server

The server listens on port 9999, so make sure no other program is using that port. You can train only one model at a time. To train a model from the paper, run the corresponding command below (only the one for the model you are training). The second-to-last argument (e.g. "training/gomoku_AZ") is the directory where the training logs and models are placed.
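One way to check that port 9999 is free before starting the server is to inspect the listening TCP sockets on the host. The sketch below assumes the `ss` tool from iproute2 is available and that its `-ltn` output has the local address in the fourth column; `port_listening` is a hypothetical helper name.

```shell
# Reads `ss -ltn` output on stdin; succeeds if a socket listens on port $1.
port_listening() {
    awk -v suffix=":$1" '
        # Column 4 is the local address, e.g. 0.0.0.0:9999 or [::]:9999.
        $1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
        END { exit !found }
    '
}

# Warn if something already listens on the server port (skipped if ss is missing).
if command -v ss >/dev/null 2>&1 && ss -ltn | port_listening 9999; then
    echo "port 9999 is already in use; stop that program before starting the server" >&2
fi
```

Because the container is started with --network=host, checking on the host should be sufficient.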

15x15 Gomoku (NDK) α0:

podman exec -it minizero ./scripts/zero-server.sh training_cfg/gomoku_AZ.cfg training/gomoku_AZ 300
# press 'y'

15x15 Gomoku (NDK) PCN-b_max:

podman exec -it minizero ./scripts/zero-server.sh training_cfg/gomoku_max.cfg training/gomoku_max 300
# press 'y'

15x15 Gomoku (NDK) PCN-b_heur:

podman exec -it minizero ./scripts/zero-server.sh training_cfg/gomoku_heur.cfg training/gomoku_heur 300
# press 'y'

15x15 Gomoku (4T) α0:

podman exec -it minizero ./scripts/zero-server.sh training_cfg/gomoku_AZ-4T.cfg training/gomoku_AZ-4T 300
# press 'y'

15x15 Gomoku (4T) PCN-b_max:

podman exec -it minizero ./scripts/zero-server.sh training_cfg/gomoku_max-4T.cfg training/gomoku_max-4T 300
# press 'y'

15x15 Gomoku (4T) PCN-b_heur:

podman exec -it minizero ./scripts/zero-server.sh training_cfg/gomoku_heur-4T.cfg training/gomoku_heur-4T 300
# press 'y'

9x9 Killall-Go α0:

podman exec -it minizero ./scripts/zero-server.sh training_cfg/go_AZ.cfg training/go_AZ 300
# press 'y'

9x9 Killall-Go PCN-b_max (OR: White):

podman exec -it minizero ./scripts/zero-server.sh training_cfg/go_W-max.cfg training/go_W-max 300
# press 'y'

9x9 Killall-Go PCN-b_heur (OR: White):

podman exec -it minizero ./scripts/zero-server.sh training_cfg/go_W-heur.cfg training/go_W-heur 300
# press 'y'

9x9 Killall-Go PCN-b_max (OR: Black):

podman exec -it minizero ./scripts/zero-server.sh training_cfg/go_B-max.cfg training/go_B-max 300
# press 'y'

9x9 Killall-Go PCN-b_heur (OR: Black):

podman exec -it minizero ./scripts/zero-server.sh training_cfg/go_B-heur.cfg training/go_B-heur 300
# press 'y'
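All server commands above follow one pattern: the config file "training_cfg/NAME.cfg", the output directory "training/NAME", and the final argument 300. A small helper can derive the arguments from the model name and reject names not listed above. This is only a sketch; `server_args` is a hypothetical name and not part of the repository.

```shell
# Print the zero-server.sh arguments for one of the model names listed above.
server_args() {
    case $1 in
        gomoku_AZ|gomoku_max|gomoku_heur|\
        gomoku_AZ-4T|gomoku_max-4T|gomoku_heur-4T|\
        go_AZ|go_W-max|go_W-heur|go_B-max|go_B-heur)
            printf 'training_cfg/%s.cfg training/%s 300\n' "$1" "$1" ;;
        *)
            echo "unknown model name: $1" >&2
            return 1 ;;
    esac
}
```

It could then be used as `podman exec -it minizero ./scripts/zero-server.sh $(server_args gomoku_AZ)`, leaving the substitution unquoted so that it expands to three separate arguments.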

Run Optimizer

To run the Optimizer, run the command below.

podman exec -it minizero ./scripts/worker.sh localhost 9999 op

Run Self-Play Worker

You can run multiple Self-Play Workers on different machines to speed up training.

To run a Self-Play Worker, run the command below. If you are using multiple machines, replace "localhost" with the hostname of the machine where the server is running.

podman exec -it minizero ./scripts/worker.sh localhost 9999 sp
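The Optimizer and Self-Play Worker commands differ only in the last argument ("op" or "sp") and, on multiple machines, in the server hostname. The sketch below wraps both; `run_worker` and the `ENGINE` variable are hypothetical names introduced for illustration.

```shell
# Container engine to use; podman by default, as in the commands above.
ENGINE=${ENGINE:-podman}

# Start a worker: role is "op" (Optimizer) or "sp" (Self-Play Worker);
# the optional second argument is the server hostname (default: localhost).
run_worker() {
    role=$1
    host=${2:-localhost}
    case $role in
        op|sp) ;;
        *) echo "role must be 'op' or 'sp'" >&2; return 2 ;;
    esac
    "$ENGINE" exec -it minizero ./scripts/worker.sh "$host" 9999 "$role"
}
```

For example, `run_worker op` starts the Optimizer against a local server, and `run_worker sp server-host` starts a Self-Play Worker that connects to a machine named server-host (a placeholder hostname).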

The training results are placed in the directory "training/". For example, if you trained gomoku_AZ, the models are under "training/gomoku_AZ/model/". Training logs, including "Training.log" and the "sgf/" files, are also under "training/gomoku_AZ/".
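To find the most recent model in a training directory, you can pick the file with the highest iteration number. The sketch below assumes that models are named "weight_iter_N.pt" (as in "weight_iter_150000.pt") with N a plain integer; `latest_weight` is a hypothetical helper.

```shell
# Print the weight_iter_N.pt file with the largest N in the given directory.
latest_weight() {
    model_dir=$1
    best_iter=-1
    best_file=
    for f in "$model_dir"/weight_iter_*.pt; do
        [ -e "$f" ] || continue            # glob matched nothing
        iter=${f##*weight_iter_}           # strip everything up to the number
        iter=${iter%.pt}                   # strip the extension
        case $iter in
            ''|*[!0-9]*) continue ;;       # skip names that are not numeric
        esac
        if [ "$iter" -gt "$best_iter" ]; then
            best_iter=$iter
            best_file=$f
        fi
    done
    [ -n "$best_file" ] && printf '%s\n' "$best_file"
}
```

For example, `latest_weight training/gomoku_AZ/model` prints the path of the newest model, and returns a non-zero status if the directory contains none.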

Evaluation

The models we used in this paper are placed under the directory "models/". To reproduce the experiment results from these models, run the commands below.

Part A. 15x15 Gomoku

To evaluate each model on solving problems with MCTS and FDFPN solvers, run:

podman exec -it minizero ./scripts/run_gomoku.sh

To display MCTS solver results, run:

podman exec -it minizero ./scripts/display_solver_results_gomoku.sh /workspace/problems/Gomoku /workspace/result/Gomoku/MCTS f

To display FDFPN solver results, run:

podman exec -it minizero ./scripts/display_solver_results_gomoku.sh /workspace/problems/Gomoku /workspace/result/Gomoku/FDFPN f

To evaluate the playing strength of PCNs against AlphaZero, run:

podman exec -it minizero ./scripts/fight_against_AZ_gomoku.sh

To display the playing results of PCNs against AlphaZero, run:

podman exec -it minizero ./scripts/display_fight_results_gomoku.sh

Part B. 9x9 Killall-Go

To evaluate each model on solving problems with MCTS and FDFPN solvers, run:

podman exec -it minizero ./scripts/run_go.sh

To display MCTS solver results, run:

podman exec -it minizero ./scripts/display_solver_results_go.sh /workspace/problems/9x9Killall-Go /workspace/result/9x9Killall-Go/MCTS f

To display FDFPN solver results, run:

podman exec -it minizero ./scripts/display_solver_results_go.sh /workspace/problems/9x9Killall-Go /workspace/result/9x9Killall-Go/FDFPN f

To evaluate the playing strength of PCNs against AlphaZero, run:

podman exec -it minizero ./scripts/fight_against_AZ_go.sh

To display the playing results of PCNs against AlphaZero, run:

podman exec -it minizero ./scripts/display_fight_results_go.sh
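The evaluation steps for the two games follow the same order and differ only in the script suffix and the problem directory name, so they can be run in sequence with one helper. This is a sketch built from the commands above; `evaluate`, `in_container`, and `ENGINE` are hypothetical names.

```shell
# Container engine to use; podman by default, as in the commands above.
ENGINE=${ENGINE:-podman}
in_container() { "$ENGINE" exec -it minizero "$@"; }

# Run all evaluation steps for one game: gomoku or go.
evaluate() {
    case $1 in
        gomoku) dir=Gomoku ;;
        go)     dir=9x9Killall-Go ;;
        *) echo "usage: evaluate gomoku|go" >&2; return 2 ;;
    esac
    # Solve the problem sets, then display the results of each solver.
    in_container "./scripts/run_$1.sh" || return
    for solver in MCTS FDFPN; do
        in_container "./scripts/display_solver_results_$1.sh" \
            "/workspace/problems/$dir" "/workspace/result/$dir/$solver" f || return
    done
    # Play against AlphaZero, then display the playing results.
    in_container "./scripts/fight_against_AZ_$1.sh" || return
    in_container "./scripts/display_fight_results_$1.sh"
}
```

For example, `evaluate gomoku` runs Part A from start to finish and stops at the first step that fails.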

To evaluate a model you trained yourself, replace the model under "models/" with the one under "training/". For example, replace "models/gomoku_AZ/weight_iter_150000.pt" with "training/gomoku_AZ/model/weight_iter_150000.pt".
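The replacement can be scripted so that the provided model is kept as a backup. The sketch below assumes the path layout from the example above; `install_model` and the ".orig" backup suffix are hypothetical choices made here, not repository conventions.

```shell
# Copy training/NAME/model/weight_iter_ITER.pt over models/NAME/weight_iter_ITER.pt.
# Usage: install_model NAME ITER [REPO_ROOT]
install_model() {
    name=$1
    iter=$2
    root=${3:-.}
    src="$root/training/$name/model/weight_iter_$iter.pt"
    dst="$root/models/$name/weight_iter_$iter.pt"
    if [ ! -f "$src" ]; then
        echo "missing trained model: $src" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dst")"
    # Keep the provided model once; later runs do not overwrite the backup.
    if [ -f "$dst" ] && [ ! -e "$dst.orig" ]; then
        cp "$dst" "$dst.orig"
    fi
    cp "$src" "$dst"
}
```

For example, `install_model gomoku_AZ 150000` run from the repository root performs the replacement described above. To restore the provided model, copy the ".orig" file back.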

Results

For both 15x15 Gomoku and 9x9 Killall-Go, PCN outperforms AlphaZero on the problem sets under the folder "problems/". In terms of playing strength, PCN achieves win rates near or above 50% against AlphaZero. The following tables show the number of problems solved within 30 minutes with each model.

MCTS

Game & Setup \ Model	No Network	α0	PCN-b_max	PCN-b_heur
15x15 Gomoku (NDK)	1 / 77	23 / 77	43 / 77	38 / 77
15x15 Gomoku (4T)	22 / 77	64 / 77	77 / 77	73 / 77
9x9 Killall-Go (OR: White)	1 / 81	28 / 81	79 / 81	76 / 81
9x9 Killall-Go (OR: Black)	1 / 81	28 / 81	38 / 81	46 / 81

FDFPN

Game & Setup \ Model	No Network	α0	PCN-b_max	PCN-b_heur
15x15 Gomoku (NDK)	6 / 77	15 / 77	45 / 77	48 / 77
15x15 Gomoku (4T)	34 / 77	41 / 77	71 / 77	69 / 77
9x9 Killall-Go (OR: White)	31 / 81	76 / 81	79 / 81	77 / 81
9x9 Killall-Go (OR: Black)	31 / 81	76 / 81	66 / 81	68 / 81

Playing Strength

Game & Our Model \ Result against α0	Win Rate (WR)	Black WR	White WR
15x15 Gomoku (NDK) PCN-b_max	50.40% ± 6.21%	100.00%	0.80%
15x15 Gomoku (NDK) PCN-b_heur	50.80% ± 6.21%	100.00%	1.60%
15x15 Gomoku (4T) PCN-b_max	50.40% ± 6.21%	96.80%	4.00%
15x15 Gomoku (4T) PCN-b_heur	50.00% ± 6.21%	100.00%	0.80%
9x9 Killall-Go PCN-b_max (OR: White)	50.00% ± 6.21%	34.40%	65.60%
9x9 Killall-Go PCN-b_heur (OR: White)	60.80% ± 6.07%	66.40%	55.20%
9x9 Killall-Go PCN-b_max (OR: Black)	62.25% ± 6.03%	60.00%	64.52%
9x9 Killall-Go PCN-b_heur (OR: Black)	60.08% ± 6.06%	53.23%	66.94%
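The overall win rates in the table appear to be the mean of the Black and White win rates, which would hold if each model played the same number of games as each color. The source does not state this, so treat it as an assumption. A minimal sketch of the arithmetic, with `mean_wr` as a hypothetical helper:

```shell
# Average two win-rate percentages, assuming equal game counts per color.
mean_wr() {
    awk -v black="$1" -v white="$2" 'BEGIN { printf "%.2f\n", (black + white) / 2 }'
}

# First row of the table: Black WR 100.00%, White WR 0.80%.
mean_wr 100.00 0.80   # prints 50.40
```

Small differences in the last digit for some rows may come from rounding of the per-color rates.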
