Intuitively, two domains are locally inconsistent around a minimizer if a small
perturbation of the minimizer strongly affects its optimality in one domain, but only
minimally affects its optimality in the other domain. Under certain assumptions, most importantly
the Hessians being positive definite, it is possible to measure the inconsistency between two domains
$A$ and $B$ with the following inconsistency score:

$$
\mathcal{I}^\epsilon (\theta^*) = \max_{(A,B)\in\mathcal{E}^2} \biggl( \mathcal{R}_B(\theta^*) - \mathcal{R}_A(\theta^*) + \max_{\frac{1}{2}\theta^T H_A \theta\leq\epsilon}\frac{1}{2}\theta^T H_B \theta \biggr)
$$

where $\theta^*$ denotes the minimizer, $\mathcal{E}$ denotes the set of training domains,
$H_e$ denotes the Hessian for $e\in\mathcal{E}$, $\theta$ denotes the network parameters,
and $\mathcal{R}_e$ for $e\in\mathcal{E}$ denotes the domain-level ERM objective.
The Fishr regularization method forces both terms on the right-hand side
of the inconsistency score to become small. The first term represents the difference
between the domain-level risks and is implicitly forced to be small by applying
the ERM objective; the second term is kept small by matching the diagonals of the
domain-level Hessians, matching the variances across domains.
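
To make the score concrete: for positive definite Hessians, the inner maximum
$\max_{\frac{1}{2}\theta^T H_A \theta\leq\epsilon}\frac{1}{2}\theta^T H_B \theta$
has the closed form $\epsilon\,\lambda_{\max}$, where $\lambda_{\max}$ is the largest
generalized eigenvalue of the pencil $(H_B, H_A)$. The following sketch evaluates the
score for two toy quadratic domains; all quantities in it (the risks, Hessians, and
$\epsilon$) are illustrative assumptions, not part of domainlab.

```python
# Toy evaluation of the inconsistency score for two domains A and B.
import numpy as np
from scipy.linalg import eigh


def inconsistency(risk_a, risk_b, hessian_a, hessian_b, epsilon):
    """Inconsistency of domain B relative to domain A around the minimizer.

    The inner maximum of (1/2) theta^T H_B theta subject to
    (1/2) theta^T H_A theta <= epsilon equals epsilon * lambda_max, with
    lambda_max the largest generalized eigenvalue of (H_B, H_A); this
    requires both Hessians to be positive definite.
    """
    lambda_max = eigh(hessian_b, hessian_a, eigvals_only=True)[-1]
    return risk_b - risk_a + epsilon * lambda_max


# Two toy quadratic domains with different curvature.
h_a = np.diag([1.0, 1.0])
h_b = np.diag([1.0, 10.0])  # domain B is much more curved along one axis
r_a, r_b, eps = 0.3, 0.5, 0.1

# Outer max over the ordered pairs (A, B) and (B, A).
score = max(inconsistency(r_a, r_b, h_a, h_b, eps),
            inconsistency(r_b, r_a, h_b, h_a, eps))
print(score)  # 1.2 for this toy example
```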


### Matching the Variances during training
Let $\mathcal{E}$ be the space of all training domains, and let $\mathcal{R}_e(\theta)$ be the ERM
objective. Fishr minimizes the following objective function during training:

$$
\mathcal{L}(\theta) = \frac{1}{|\mathcal{E}|}\sum_{e\in\mathcal{E}} \mathcal{R}_e(\theta) + \lambda \mathcal{L}_\textnormal{Fishr}(\theta)
$$

where $\lambda$ is a regularization hyperparameter and the Fishr regularization term is given by

$$
\mathcal{L}_\textnormal{Fishr}(\theta) = \frac{1}{|\mathcal{E}|}\sum_{e\in\mathcal{E}} \| v_e - v \|^2_2
$$

with $v_e$ denoting the variance of the gradients within domain $e\in\mathcal{E}$ and
$v$ denoting the average variance of the gradients across all domains, i.e.
$v = \frac{1}{|\mathcal{E}|}\sum_{e\in\mathcal{E}} v_e$.
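
Written out in code, the penalty takes only a few lines once the per-domain
variances are available as flat vectors. This is a minimal PyTorch sketch of the
formula above, not domainlab's actual implementation:

```python
import torch


def fishr_penalty(variances):
    """Fishr penalty from a list of per-domain gradient variances.

    variances: list of 1-D tensors, one tensor v_e per training domain.
    """
    v_mean = torch.stack(variances).mean(dim=0)  # v = average variance over domains
    return sum((v_e - v_mean).pow(2).sum() for v_e in variances) / len(variances)


# Two toy domains whose gradient variances disagree.
v_1 = torch.tensor([0.1, 0.2, 0.3])
v_2 = torch.tensor([0.3, 0.2, 0.1])
print(fishr_penalty([v_1, v_2]))  # tensor(0.0200)
```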



The variance of the gradients within each domain can be computed with the
BackPACK package (see: Dangel, Felix, Frederik Kunstner, and Philipp Hennig.
"BackPACK: Packing more into backprop." https://arxiv.org/abs/1912.10985).
Furthermore, we use the approximation $\textnormal{Var}(G) \approx \textnormal{diag}(H)$,
where $G$ denotes the per-sample gradients within a domain.
The Hessian is then approximated by the Fisher information matrix, which
in turn is approximated by an empirical estimator for computational efficiency.
For more details, see the reference below or the domainlab code.
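
As an illustration of this step, the sketch below computes the per-domain gradient
variance with BackPACK's `Variance` extension. The model, loss function, and batch
are toy stand-ins, not domainlab's actual setup:

```python
import torch
from backpack import backpack, extend
from backpack.extensions import Variance

# extend() registers the modules so BackPACK can track per-sample quantities.
model = extend(torch.nn.Linear(10, 2))
loss_fn = extend(torch.nn.CrossEntropyLoss())


def domain_gradient_variance(x, y):
    """Flattened variance of the individual (per-sample) gradients
    for one batch drawn from a single domain."""
    model.zero_grad()
    loss = loss_fn(model(x), y)
    with backpack(Variance()):
        loss.backward()  # populates p.variance for every parameter
    return torch.cat([p.variance.flatten() for p in model.parameters()])


x = torch.randn(32, 10)               # one batch from some domain e
y = torch.randint(0, 2, (32,))
v_e = domain_gradient_variance(x, y)  # the v_e entering the Fishr penalty
```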