This project implements an RL agent for Dynamic Channel Allocation in a simulated mobile-caller environment.
The implementation is in Haskell and uses Accelerate for numerical work. It is a near-complete port of the best-performing agent (AA-VNet) from https://github.com/tsoernes/dca. The agent uses a linear neural network as a state-value function approximator, trained with a newly proposed average-reward variant of TDC gradients, originally defined for discounted returns in Sutton et al. 2009: "Fast gradient-descent methods for temporal-difference learning with linear function approximation."
For an introduction to the channel allocation problem and how RL is applied to it, see Torstein Sørnes 2018: "Contributions to centralized dynamic channel allocation reinforcement learning agents".
See also the version written in Rust and Python.
The following builds with -O2 and other optimizations:
stack build --stack-yaml stack-release.yaml
To build without optimizations but with profiling flags, drop the --stack-yaml .. option.
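To make the two build variants explicit, the commands would look like this (assuming the default stack.yaml carries the profiling flags, as described above):

```shell
# Optimized release build (-O2 and other optimizations):
stack build --stack-yaml stack-release.yaml

# Unoptimized build with profiling flags (uses the default stack.yaml):
stack build
```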
stack exec --stack-yaml stack-release.yaml dca-exe -- --backend cpu
This runs the project. On startup it generates a full computational graph containing both the call network simulator and the agent's neural network. The graph is compiled using Accelerate.LLVM.Native and executed on the CPU. To use Accelerate's built-in interpreter instead, omit the --backend cpu flag.
Support for compiling to the GPU can be obtained by adding the dependency accelerate-llvm-ptx and switching out the imports in AccUtils.hs.
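The import switch could look like the following sketch. The module names come from the accelerate-llvm-native and accelerate-llvm-ptx packages; which functions AccUtils.hs actually imports is an assumption here:

```haskell
-- AccUtils.hs (sketch): choose the Accelerate backend at compile time.

-- CPU backend, from accelerate-llvm-native:
import Data.Array.Accelerate.LLVM.Native (run, runN)

-- GPU backend, from accelerate-llvm-ptx (uncomment and comment out the
-- line above after adding the accelerate-llvm-ptx dependency):
-- import Data.Array.Accelerate.LLVM.PTX (run, runN)
```

Both packages expose the same run/runN interface, so the rest of the code should not need to change.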
To see available options, run:
stack exec --stack-yaml stack-release.yaml dca-exe -- --help
Available options:
--call_dur MINUTES Call duration for new calls. (default: 3.0)
--call_dur_hoff MINUTES Call duration for handed-off calls. (default: 1.0)
--call_rate PER_HOUR Call arrival rate (new calls). (default: 200.0)
--hoff_prob PROBABILITY Hand-off probability. Set to 0 to disable
hand-offs. (default: 0.0)
--n_events N Simulation duration, in number of processed
events. (default: 10000)
--log_iter N How often to show run time statistics such as call
blocking probability. (default: 1000)
--learning_rate F Learning rate for the neural net, i.e. the state-value
update. (default: 2.52e-6)
--learning_rate_avg F Learning rate for the average reward
estimate. (default: 6.0e-2)
--learning_rate_grad F Learning rate for gradient
correction. (default: 5.0e-6)
--backend ARG Accepted backends are 'interp' for 'Interpreter' and
'cpu' for 'LLVM.Native'. The interpreter yields better
error messages. (default: Interpreter)
--min_loss F Abort simulation if loss goes below given absolute
value. Set to 0 to disable. (default: 0.0)
--fixed_rng Use a fixed (at 0) seed for the RNG. If this switch
is not enabled, the seed is selected at random.
-h,--help Show this help text
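As an example, the options above can be combined in a single invocation (the flag values here are illustrative, not recommendations):

```shell
# Run 50 000 events with hand-offs enabled, on the native CPU backend,
# logging blocking-probability statistics every 5 000 events:
stack exec --stack-yaml stack-release.yaml dca-exe -- \
  --backend cpu \
  --n_events 50000 \
  --hoff_prob 0.15 \
  --log_iter 5000
```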
- Implement hand-off look-ahead