Commit

2nd to last patch.
ludgerpaehler committed Dec 15, 2023
1 parent 14adc2d commit cf453f4
Showing 18 changed files with 90 additions and 2 deletions.
Binary file added .DS_Store
Binary file added EnvironmentsFigure.png
Binary file added KSAC.png
Binary file added Koopman_RL_NeurIPS_WS.pdf
Binary file added Koopman_Value_Iteration.png
Binary file added Koopman_operator_for_nonlinear_systems.png
1 change: 1 addition & 0 deletions _layouts/default.html
@@ -23,6 +23,7 @@ <h2 class="project-tagline">{{ page.description | default: site.description | de
{% endif %}
{% if site.show_downloads %}
<a href="" class="btn">ArXiv Preprint (coming soon)</a>
<a href="./Koopman_RL_NeurIPS_WS.pdf" class="btn">NeurIPS WS Paper</a>
<a href="https://openreview.net/forum?id=IaUDEYN48p" class="btn">OpenReview</a>
<a href="https://github.com/Pdbz199/koopman-rl" class="btn">Code</a>
{% endif %}
Binary file added _site/EnvironmentsFigure.png
Binary file added _site/KSAC.png
Binary file added _site/Koopman_RL_NeurIPS_WS.pdf
Binary file added _site/Koopman_Value_Iteration.png
Binary file added _site/Koopman_operator_for_nonlinear_systems.png
48 changes: 47 additions & 1 deletion _site/index.html
@@ -37,6 +37,7 @@ <h2 class="project-tagline">Fusing Koopman operators with maximum entropy RL alg


<a href="" class="btn">ArXiv Preprint (coming soon)</a>
<a href="./Koopman_RL_NeurIPS_WS.pdf" class="btn">NeurIPS WS Paper</a>
<a href="https://openreview.net/forum?id=IaUDEYN48p" class="btn">OpenReview</a>
<a href="https://github.com/Pdbz199/koopman-rl" class="btn">Code</a>

@@ -47,13 +48,58 @@ <h2 id="abstract">Abstract</h2>

<p>The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory contexts due, in part, to their guaranteed convergence towards a system’s optimal value function. However, this approach has severe limitations. This paper explores the connection between the data-driven Koopman operator and Bellman Markov Decision Processes, resulting in the development of two new RL algorithms to address these limitations. In particular, we focus on Koopman operator methods that reformulate a nonlinear system by lifting into new coordinates where the dynamics become linear, and where HJB-based methods are more tractable. These transformations enable the estimation, prediction, and control of strongly nonlinear dynamics. Viewing the Bellman equation as a controlled dynamical system, the Koopman operator is able to capture the expectation of the time evolution of the value function in the given systems via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a new <em>Koopman tensor</em> that facilitates the estimation of the optimal value function. Then, a transformation of Bellman’s framework in terms of the Koopman tensor enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic (SAC). This highly flexible framework can be used for deterministic or stochastic systems as well as for discrete or continuous-time dynamics. Finally, we show that these algorithms attain state-of-the-art (SOTA) performance with respect to traditional neural network-based SAC and linear quadratic regulator (LQR) baselines on three controlled dynamical systems: the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing. They do all of this while maintaining an interpretability that shows how inputs tend to affect outputs, which we call <em>input-output</em> interpretability.</p>

<h2 id="koopman-reinforcement-learning">Koopman Reinforcement Learning</h2>

<h3 id="the-construction-of-the-koopman-tensor">The Construction of the Koopman Tensor</h3>

<p><img src="Koopman_operator_for_nonlinear_systems.png" alt="KoopmanOp" /></p>

<p><img src="koopman_tensor.jpeg" alt="KoopmanTensor" /></p>

<h3 id="deriving-koopman-variants-of-maximum-entropy-reinforcement-learning-algorithms">Deriving Koopman-Variants of Maximum Entropy Reinforcement Learning Algorithms</h3>

<p>Using this Koopman tensor machinery, we can then fuse the approach with existing maximum entropy reinforcement learning algorithms:</p>

<h4 id="soft-actor-koopman-critic">Soft Actor Koopman Critic</h4>

<p><img src="KSAC.png" alt="KSAC" /></p>

<h4 id="soft-koopman-value-iteration">Soft Koopman Value Iteration</h4>

<p><img src="Koopman_Value_Iteration.png" alt="KVI" /></p>

<h3 id="experimental-evaluation">Experimental Evaluation</h3>

<p>We evaluate the performance of our Koopman-infused reinforcement learning algorithms on the following set of environments:</p>

<p><img src="EnvironmentsFigure.png" alt="Experiments" /></p>

<p>On these environments, we compare our reinforcement learning algorithms against the <a href="https://github.com/vwxyzjn/cleanrl">CleanRL</a> implementations of</p>

<ul>
<li>The Q-function-based Soft Actor-Critic, referred to as <em>SAC (Q)</em> in our graphs</li>
<li>The Value-function-based Soft Actor-Critic, referred to as <em>SAC (V)</em> in our graphs</li>
</ul>

<p>and the classical control baseline of the linear quadratic regulator (LQR). This gives the following performance across environments:</p>

<p><img src="results.png" alt="Results" /></p>

<p>To briefly summarize these results:</p>

<ul>
<li>We reach SOTA performance on the linear system after only 5,000 environment steps, outpacing the Q-function-based SAC</li>
<li>The Soft Actor Koopman Critic (SAKC) consistently converges, showcasing adaptability and closely tracking existing SAC implementations</li>
<li>The pre-trained Soft Koopman Value Iteration (SKVI) consistently achieves optimal returns alongside the Soft Actor Koopman Critic and the Soft Actor-Critic baselines</li>
</ul>

<h2 id="authors">Authors</h2>

<center>
<div class="row1">
<div style="float:left;margin-right:20px;">
<img src="rozwood.png" height="200" width="200" alt="preston" />
<p style="text-align:center;"><a href="https://github.com/Pdbz199">Preston Rozwood</a></p>
<p style="text-align:center;"><a href="https://www.linkedin.com/in/preston-rozwood/">Preston Rozwood</a></p>
</div>
<div style="float:left;margin-right:20px;">
<img class="middle-img" src="mehrez.jpg" height="200" width="200" alt="ludger" />
Binary file added _site/koopman_tensor.jpeg
Binary file added _site/results.png
43 changes: 42 additions & 1 deletion index.md
@@ -8,13 +8,54 @@ title: Koopman-Assisted Reinforcement Learning

The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory contexts due, in part, to their guaranteed convergence towards a system’s optimal value function. However, this approach has severe limitations. This paper explores the connection between the data-driven Koopman operator and Bellman Markov Decision Processes, resulting in the development of two new RL algorithms to address these limitations. In particular, we focus on Koopman operator methods that reformulate a nonlinear system by lifting into new coordinates where the dynamics become linear, and where HJB-based methods are more tractable. These transformations enable the estimation, prediction, and control of strongly nonlinear dynamics. Viewing the Bellman equation as a controlled dynamical system, the Koopman operator is able to capture the expectation of the time evolution of the value function in the given systems via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a new _Koopman tensor_ that facilitates the estimation of the optimal value function. Then, a transformation of Bellman’s framework in terms of the Koopman tensor enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic (SAC). This highly flexible framework can be used for deterministic or stochastic systems as well as for discrete or continuous-time dynamics. Finally, we show that these algorithms attain state-of-the-art (SOTA) performance with respect to traditional neural network-based SAC and linear quadratic regulator (LQR) baselines on three controlled dynamical systems: the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing. They do all of this while maintaining an interpretability that shows how inputs tend to affect outputs, which we call _input-output_ interpretability.
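
As a compact restatement of the lifting described in the abstract, in our own notation (the dictionaries phi and psi and the tensor T are illustrative symbols for this sketch, not necessarily the paper's exact ones):

```latex
% Expected lifted dynamics: linear in the state dictionary phi, conditioned on u_t
\mathbb{E}\left[\phi(x_{t+1}) \mid x_t,\, u_t\right] = K^{(u_t)}\,\phi(x_t)

% Koopman tensor: K^(u) depends linearly on the action features psi(u)
K^{(u)}_{ij} = \sum_{k} T_{ikj}\,\psi_k(u)
```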

## Koopman Reinforcement Learning

### The Construction of the Koopman Tensor

![KoopmanOp](Koopman_operator_for_nonlinear_systems.png)

![KoopmanTensor](koopman_tensor.jpeg)
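
To make the construction in the figures above concrete, here is a minimal numpy sketch of one plausible way to fit such a tensor from transition snapshots via least squares. The dictionary choices, function names, and the estimator itself are assumptions of this sketch, not necessarily the paper's exact procedure.

```python
import numpy as np

# Hypothetical sketch: fit a Koopman tensor T from transition snapshots
# (x_t, u_t, x_{t+1}), with x and u given as 1-D numpy arrays.

def phi(x):
    """State dictionary: constant, linear, and quadratic monomials (illustrative)."""
    return np.concatenate([[1.0], x, x**2])

def psi(u):
    """Action dictionary: constant and linear terms (illustrative)."""
    return np.concatenate([[1.0], u])

def fit_koopman_tensor(X, U, Xp):
    """Solve min_T sum_t || phi(x_{t+1}) - T (psi(u_t) (x) phi(x_t)) ||^2."""
    Phi = np.stack([phi(x) for x in X])           # (N, d_phi)
    Psi = np.stack([psi(u) for u in U])           # (N, d_psi)
    PhiP = np.stack([phi(x) for x in Xp])         # (N, d_phi)
    # Row-wise outer products psi(u_t) (x) phi(x_t), flattened into features
    Z = np.einsum("nk,nj->nkj", Psi, Phi).reshape(len(X), -1)
    M, *_ = np.linalg.lstsq(Z, PhiP, rcond=None)  # (d_psi * d_phi, d_phi)
    return M.T.reshape(Phi.shape[1], Psi.shape[1], Phi.shape[1])  # T[i, k, j]

def koopman_matrix(T, u):
    """Action-conditioned operator K^(u)[i, j] = sum_k T[i, k, j] * psi_k(u)."""
    return np.einsum("ikj,k->ij", T, psi(u))
```

Under these assumptions, `koopman_matrix(fit_koopman_tensor(X, U, Xp), u) @ phi(x)` approximates the expected lifted next state.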

### Deriving Koopman-Variants of Maximum Entropy Reinforcement Learning Algorithms

Using this Koopman tensor machinery, we can then fuse the approach with existing maximum entropy reinforcement learning algorithms:

#### Soft Actor Koopman Critic

![KSAC](KSAC.png)
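
Our reading of the substitution at the heart of the Soft Actor Koopman Critic, sketched with the hypothetical helpers defined above (the linear critic parameterization is an assumption of this sketch, not a confirmed detail of the algorithm):

```python
# Hypothetical sketch: in place of SAC's neural Q-network, the critic is a
# linear functional of the Koopman-lifted state; w is a learned weight
# vector, and phi / koopman_matrix come from the fitting sketch above.
def koopman_critic(w, T, x, u):
    """Q(x, u) ~= w @ K^(u) @ phi(x)."""
    return w @ koopman_matrix(T, u) @ phi(x)
```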

#### Soft Koopman Value Iteration

![KVI](Koopman_Value_Iteration.png)
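
A hedged numpy sketch of soft value iteration in the lifted space, again reusing the helpers above; the finite action grid and the linear value function `V(x) ~= w @ phi(x)` are simplifying assumptions of this sketch:

```python
import numpy as np

# Hedged sketch of soft Koopman value iteration over a finite action grid,
# reusing phi() and koopman_matrix() from the tensor-fitting sketch above.
def soft_koopman_value_iteration(T, states, actions, reward,
                                 gamma=0.99, alpha=1.0, iters=100):
    Phi = np.stack([phi(x) for x in states])      # (N, d_phi)
    Ks = [koopman_matrix(T, u) for u in actions]  # per-action operators
    w = np.zeros(Phi.shape[1])                    # V(x) ~= w @ phi(x)
    for _ in range(iters):
        # Q(x, u) = r(x, u) + gamma * E[V(x')] = r(x, u) + gamma * w @ K^(u) @ phi(x)
        Q = np.stack(
            [np.array([reward(x, u) for x in states]) + gamma * (Phi @ K.T @ w)
             for u, K in zip(actions, Ks)],
            axis=1,
        )                                         # (N, n_actions)
        # Soft (max-entropy) backup: V(x) = alpha * log sum_u exp(Q(x, u) / alpha),
        # computed with a shifted log-sum-exp for numerical stability.
        Qmax = Q.max(axis=1)
        V = Qmax + alpha * np.log(np.exp((Q - Qmax[:, None]) / alpha).sum(axis=1))
        # Project the targets back onto the dictionary: refit w by least squares.
        w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
    return w
```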

### Experimental Evaluation

We evaluate the performance of our Koopman-infused reinforcement learning algorithms on the following set of environments:

![Experiments](EnvironmentsFigure.png)

On these environments, we compare our reinforcement learning algorithms against the [CleanRL](https://github.com/vwxyzjn/cleanrl) implementations of

* The Q-function-based Soft Actor-Critic, referred to as _SAC (Q)_ in our graphs
* The Value-function-based Soft Actor-Critic, referred to as _SAC (V)_ in our graphs

and the classical control baseline of the linear quadratic regulator (LQR). This gives the following performance across environments:

![Results](results.png)

To briefly summarize these results:

* We reach SOTA performance on the linear system after only 5,000 environment steps, outpacing the Q-function-based SAC
* The Soft Actor Koopman Critic (SAKC) consistently converges, showcasing adaptability and closely tracking existing SAC implementations
* The pre-trained Soft Koopman Value Iteration (SKVI) consistently achieves optimal returns alongside the Soft Actor Koopman Critic and the Soft Actor-Critic baselines

## Authors

<center>
<div class="row1">
<div style="float:left;margin-right:20px;">
<img src="rozwood.png" height="200" width="200" alt="preston" />
<p style="text-align:center;"><a href="https://github.com/Pdbz199">Preston Rozwood</a></p>
<p style="text-align:center;"><a href="https://www.linkedin.com/in/preston-rozwood/">Preston Rozwood</a></p>
</div>
<div style="float:left;margin-right:20px;">
<img class="middle-img" src="mehrez.jpg" height="200" width="200" alt="ludger" />
Binary file added koopman_tensor.jpeg
Binary file added results.png
