diff --git a/Readme.md b/Readme.md
index edf0896..acaf0ff 100644
--- a/Readme.md
+++ b/Readme.md
@@ -2,18 +2,18 @@
 
 ![Build Status](https://github.com/tfjgeorge/nngeometry/actions/workflows/nngeometry.yml/badge.svg) [![codecov](https://codecov.io/gh/tfjgeorge/nngeometry/branch/master/graph/badge.svg)](https://codecov.io/gh/tfjgeorge/nngeometry) [![DOI](https://zenodo.org/badge/208082966.svg)](https://zenodo.org/badge/latestdoi/208082966) [![PyPI version](https://badge.fury.io/py/nngeometry.svg)](https://badge.fury.io/py/nngeometry)
-
-
 NNGeometry allows you to:
- - compute **Fisher Information Matrices** (FIM) or derivates, using efficient approximations such as low-rank matrices, KFAC, diagonal and so on.
+ - compute Gauss-Newton or **Fisher Information Matrices** (FIM), as well as any matrix that is written as the covariance of gradients w.r.t. parameters, using efficient approximations such as low-rank matrices, KFAC, EKFAC, diagonal and so on.
  - compute finite-width **Neural Tangent Kernels** (Gram matrices), even for multiple output functions.
  - compute **per-examples jacobians** of the loss w.r.t network parameters, or of any function such as the network's output.
  - easily and efficiently compute linear algebra operations involving these matrices **regardless of their approximation**.
  - compute **implicit** operations on these matrices, that do not require explicitely storing large matrices that would not fit in memory.
 
+It offers a high level abstraction over the parameter and function spaces described by neural networks. As a simple example, a parameter space vector `PVector` actually contains weight matrices, bias vectors, or convolutions kernels of the whole neural network (a set of tensors). Using NNGeometry's API, performing a step in parameter space (e.g. an update of your favorite optimization algorithm) is abstracted as a python addition: `w_next = w_previous + epsilon * delta_w`.
+
 ## Example
 
-In the Elastic Weight Consolidation continual learning technique, you want to compute . It can be achieved with a diagonal approximation for the FIM using:
+In the Elastic Weight Consolidation continual learning technique, you want to compute $`\left(\mathbf{w}-\mathbf{w}_{A}\right)^{\top}F\left(\mathbf{w}-\mathbf{w}_{A}\right)`$. It can be achieved with a diagonal approximation for the FIM using:
 
 ```python
 F = FIM(model=model,
         loader=loader,
@@ -22,6 +22,8 @@ F = FIM(model=model,
 regularizer = F.vTMv(w - w_a)
 ```
 
+The first statement instantiates a diagonal matrix, and populates it with the diagonal coefficients of the FIM of the model `model` computed using the examples from the dataloader `loader`.
+
 If diagonal is not sufficiently accurate then you could instead choose a KFAC approximation, by just changing `PMatDiag` to `PMatKFAC` in the above. Note that it internally involves very different operations, depending on the chosen representation (e.g. KFAC, EKFAC, ...).
 
 ## Documentation
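
As a quick illustration of the parameter-space abstraction described in the added paragraph, the sketch below shows what a step like `w_next = w_previous + epsilon * delta_w` could look like in practice. It assumes the `PVector.from_model` and `clone` helpers exposed by `nngeometry.object`, as in the project documentation; the toy model and the choice of `delta_w` are purely illustrative.

```python
import torch.nn as nn

from nngeometry.object import PVector

# Any torch.nn.Module works; this tiny classifier is only for illustration.
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

# Wrap the model's current parameters (weight matrices, bias vectors, ...)
# into a single parameter-space vector.
w_previous = PVector.from_model(model)

# Any PVector with the same layout can serve as an update direction; here we
# simply clone the current parameters as a stand-in for delta_w.
delta_w = w_previous.clone()

epsilon = 0.01
# One python addition updates the whole set of tensors at once.
w_next = w_previous + epsilon * delta_w
```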
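
To make the `PMatDiag` → `PMatKFAC` swap mentioned at the end of the second hunk concrete, here is a self-contained sketch of the EWC-style penalty with a KFAC representation. It assumes the documented `FIM` helper from `nngeometry.metrics` and the `PMatKFAC`/`PVector` classes from `nngeometry.object`; the toy model, random data and choice of `w_a` are placeholders for a real continual-learning setup.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

from nngeometry.metrics import FIM
from nngeometry.object import PMatKFAC, PVector

# Toy 10-class classifier and random data standing in for a real task.
model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 10))
dataset = TensorDataset(torch.randn(128, 20), torch.randint(0, 10, (128,)))
loader = DataLoader(dataset, batch_size=32)

# Same call as in the README snippet, with the representation swapped
# from PMatDiag to PMatKFAC.
F = FIM(model=model,
        loader=loader,
        representation=PMatKFAC,
        n_output=10)

# w_a would be the parameters learned on the previous task; here we just
# snapshot the current parameters so the example runs end to end.
w_a = PVector.from_model(model).clone()
w = PVector.from_model(model)

# Quadratic EWC penalty (w - w_a)^T F (w - w_a); the call is identical
# whatever representation was chosen for F.
regularizer = F.vTMv(w - w_a)
```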