An implementation of the Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning
This document describes how to run the simulation of the D3Q agent; please also check example.sh.
Main required packages:
- Python 2.7
- PyTorch 0.3.1
- seaborn
- matplotlib

If you are using conda as a package/environment management tool, you can create an environment from spec-file.txt:

$ conda create --name d3q --file spec-file.txt
All the data is under this folder: ./src/deep_dialog/data

- Movie Knowledge Bases
  movie_kb.1k.p --- 94% success rate (for user_goals_first_turn_template_subsets.v1.p)
  movie_kb.v2.p --- 36% success rate (for user_goals_first_turn_template_subsets.v1.p)
- User Goals
  user_goals_first_turn_template.v2.p --- user goals extracted from the first user turn
  user_goals_first_turn_template.part.movie.v1.p --- a subset of user goals [Please use this one; the upper bound success rate on movie_kb.1k.json is 0.9765.]
- NLG Rule Template
  dia_act_nl_pairs.v6.json --- some predefined NLG rule templates for both the user simulator and the agent.
- Dialog Act Intent
  dia_acts.txt
- Dialog Act Slot
  slot_set.txt
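The .p files above are Python pickle files. Below is a minimal sketch for peeking at them, assuming they were serialized with Python 2's pickle module (the repo targets Python 2.7); the script name is hypothetical and the paths are the ones listed above:

# inspect_data.py -- minimal sketch for inspecting the pickled data files.
# Assumption: the .p files were written with Python 2's pickle module.
import pickle

kb_path = './src/deep_dialog/data/movie_kb.1k.p'
goal_path = './src/deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p'

with open(kb_path, 'rb') as f:
    movie_kb = pickle.load(f)
with open(goal_path, 'rb') as f:
    user_goals = pickle.load(f)

# Print only container type and size; the internal structure is not documented here.
print('movie_kb: %s with %d entries' % (type(movie_kb).__name__, len(movie_kb)))
print('user_goals: %s with %d entries' % (type(user_goals).__name__, len(user_goals)))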
(Note: the following parameters are the key differences between the models DQN, DDQ, and D3Q.)

--boosted : boost the world model with examples generated by the rule agent [0, 1]
--train_world_model : whether to train the world model [0, 1]
--discriminator_nn_type : NN structure of the discriminator (default: RNN) [RNN, MLP]; see the sketch after this parameter list
--train_discriminator : whether to train the discriminator [0, 1]
--model_type : model type [DQN, DDQ, D3Q]
--agt : the agent id
--usr : the user (simulator) id
--max_turn : maximum number of turns
--episodes : how many dialogues to run
--slot_err_prob : slot-level error probability
--slot_err_mode : which kind of slot error mode to use
--intent_err_prob : intent-level error probability
--movie_kb_path : the movie KB path for the agent side
--goal_file_path : the user goal file path for the user simulator side
--dqn_hidden_size : hidden size of the RL agent
--batch_size : batch size for DDQ training
--simulation_epoch_size : how many dialogues to simulate in one epoch
--warm_start : use the rule policy to fill the experience replay buffer at the beginning
--warm_start_epochs : how many dialogues to run during the warm start
--run_mode : 0 for display mode (NL); 1 for debug mode (Dia_Act); 2 for debug mode (Dia_Act and NL); 3 for no display (i.e., training)
--act_level : 0 for a Dia_Act-level user simulator; 1 for an NL-level user simulator
--auto_suggest : 0 for no auto_suggest; 1 for auto_suggest
--cmd_input_mode : 0 for NL input; 1 for Dia_Act input (this parameter is for AgentCmd only)
--write_model_dir : the directory to write the models to
--trained_model_path : the path of the trained RL agent model; load the trained model for prediction
--learning_phase : train/test/all, default is all. You can split the user goal set into train and test sets, or not split it at all (all). We introduce some randomness in the first sampled user action, so even for the same user goal, the generated dialogue might differ.
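In D3Q, the discriminator is trained to tell experience generated by the world model apart from real user experience, and only simulated experience that looks real enough is used for planning. The repo's own implementation lives under ./src; the class below is only a minimal sketch of what an RNN-type discriminator (--discriminator_nn_type RNN) could look like, with made-up class name and feature sizes rather than the repo's actual API:

import torch
import torch.nn as nn

class RNNDiscriminator(nn.Module):
    # Sketch only: reads a sequence of per-turn feature vectors and outputs the
    # probability that the dialogue came from the real user simulator rather
    # than the world model. Input/hidden sizes here are placeholders.
    def __init__(self, input_size=80, hidden_size=64):
        super(RNNDiscriminator, self).__init__()
        self.rnn = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, turn_features):
        # turn_features: (batch, num_turns, input_size)
        _, (h_n, _) = self.rnn(turn_features)
        return torch.sigmoid(self.classifier(h_n[-1]))  # (batch, 1)

# During planning, a simulated experience would only enter the replay buffer
# when the discriminator scores it above some threshold, e.g. score > 0.5.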
DQN
Basic DQN (DQN(1)):
python run.py --agt 9 --usr 1 \
--max_turn 40 --movie_kb_path ./deep_dialog/data/movie_kb.1k.p --dqn_hidden_size 80 \
--experience_replay_pool_size 10000 --episodes 500 --simulation_epoch_size 1 --run_mode 3 \
--act_level 0 --slot_err_prob 0.00 --intent_err_prob 0.00 --batch_size 16 \
--goal_file_path ./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p \
--warm_start 1 --warm_start_epochs 50 \
--planning_steps 0 --boosted 1 --train_world_model 0 \
--model_type DQN --write_model_dir ./deep_dialog/checkpoints/dqn_1
DQN(5):
python run.py --agt 9 --usr 1 \
--max_turn 40 --movie_kb_path ./deep_dialog/data/movie_kb.1k.p --dqn_hidden_size 80 \
--experience_replay_pool_size 10000 --episodes 500 --simulation_epoch_size 1 --run_mode 3 \
--act_level 0 --slot_err_prob 0.00 --intent_err_prob 0.00 --batch_size 16 \
--goal_file_path ./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p \
--warm_start 1 --warm_start_epochs 50 \
--planning_steps 4 --boosted 1 --train_world_model 0 \
--model_type DQN --write_model_dir ./deep_dialog/checkpoints/dqn_5
Train the DQN agent with k planning steps (note that --planning_steps is set to k-1):
python run.py --agt 9 --usr 1 \
--max_turn 40 --movie_kb_path ./deep_dialog/data/movie_kb.1k.p --dqn_hidden_size 80 \
--experience_replay_pool_size 10000 --episodes 500 --simulation_epoch_size 1 --run_mode 3 \
--act_level 0 --slot_err_prob 0.00 --intent_err_prob 0.00 --batch_size 16 \
--goal_file_path ./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p \
--warm_start 1 --warm_start_epochs 50 \
--planning_steps k-1 --boosted 1 --train_world_model 0 \
--model_type DQN --write_model_dir ./deep_dialog/checkpoints/dqn_k
DDQ
DDQ(5):
python run.py --agt 9 --usr 1 \
--max_turn 40 --movie_kb_path ./deep_dialog/data/movie_kb.1k.p --dqn_hidden_size 80 \
--experience_replay_pool_size 10000 --episodes 500 --simulation_epoch_size 1 --run_mode 3 \
--act_level 0 --slot_err_prob 0.00 --intent_err_prob 0.00 --batch_size 16 \
--goal_file_path ./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p \
--warm_start 1 --warm_start_epochs 50 --planning_steps 4 --boosted 1 --train_world_model 1 \
--model_type DDQ --write_model_dir ./deep_dialog/checkpoints/ddq_5_1
DDQ(k):
python run.py --agt 9 --usr 1 \
--max_turn 40 --movie_kb_path ./deep_dialog/data/movie_kb.1k.p --dqn_hidden_size 80 \
--experience_replay_pool_size 10000 --episodes 500 --simulation_epoch_size 1 --run_mode 3 \
--act_level 0 --slot_err_prob 0.00 --intent_err_prob 0.00 --batch_size 16 \
--goal_file_path ./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p \
--warm_start 1 --warm_start_epochs 50 --planning_steps k-1 --boosted 1 --train_world_model 1 \
--model_type DDQ --write_model_dir ./deep_dialog/checkpoints/ddq_k_1
D3Q
D3Q(5):
python run.py --agt 9 --usr 1 \
--max_turn 40 --movie_kb_path ./deep_dialog/data/movie_kb.1k.p --dqn_hidden_size 80 \
--experience_replay_pool_size 10000 --episodes 500 --simulation_epoch_size 1 --run_mode 3 \
--act_level 0 --slot_err_prob 0.00 --intent_err_prob 0.00 --batch_size 16 \
--goal_file_path ./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p \
--warm_start 1 --warm_start_epochs 50 --planning_steps 4 --boosted 1 --train_world_model 1 \
--model_type D3Q --discriminator_nn_type RNN --write_model_dir ./deep_dialog/checkpoints/d3q_rnn_5_1
D3Q(k):
python run.py --agt 9 --usr 1 \
--max_turn 40 --movie_kb_path ./deep_dialog/data/movie_kb.1k.p --dqn_hidden_size 80 \
--experience_replay_pool_size 10000 --episodes 500 --simulation_epoch_size 1 --run_mode 3 \
--act_level 0 --slot_err_prob 0.00 --intent_err_prob 0.00 --batch_size 16 \
--goal_file_path ./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p \
--warm_start 1 --warm_start_epochs 50 --planning_steps k-1 --boosted 1 --train_world_model 1 \
--model_type D3Q --discriminator_nn_type RNN --write_model_dir ./deep_dialog/checkpoints/d3q_rnn_k_1
You can train the models with the example commands above, or check example.sh.

This work focuses on training efficiency; therefore, we evaluate performance with learning curves. Please check the example code in draw_figure.py:

$ python draw_figure.py
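draw_figure.py contains the actual plotting code; the snippet below is only a rough sketch of how a success-rate learning curve could be drawn with seaborn/matplotlib, assuming a hypothetical JSON record that maps epoch to success rate (the real record format produced by this repo may differ):

import json
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical performance record written during training; the real format
# used by draw_figure.py may differ.
with open('./deep_dialog/checkpoints/d3q_rnn_5_1/performance.json') as f:
    records = json.load(f)

epochs = sorted(int(k) for k in records)
success_rates = [records[str(e)] for e in epochs]

sns.set(style='whitegrid')
plt.plot(epochs, success_rates, label='D3Q(5)')
plt.xlabel('Simulation Epoch')
plt.ylabel('Success Rate')
plt.legend()
plt.savefig('learning_curve.png')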
Main papers to be cited
@inproceedings{Su2018D3Q,
title={Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning},
author={Su, Shang-Yu and Li, Xiujun and Gao, Jianfeng and Liu, Jingjing and Chen, Yun-Nung},
booktitle={EMNLP},
year={2018}
}
@inproceedings{Peng2018DeepDynaQ,
title={Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning},
author={Peng, Baolin and Li, Xiujun and Gao, Jianfeng and Liu, Jingjing and Wong, Kam-Fai and Su, Shang-Yu},
booktitle={ACL},
year={2018}
}
@article{li2016user,
title={A User Simulator for Task-Completion Dialogues},
author={Li, Xiujun and Lipton, Zachary C and Dhingra, Bhuwan and Li, Lihong and Gao, Jianfeng and Chen, Yun-Nung},
journal={arXiv preprint arXiv:1612.05688},
year={2016}
}