Intuitively, two domains are locally inconsistent around a minimizer if a small
perturbation of the minimizer strongly affects its optimality in one domain, but only
minimally affects its optimality in the other domain. Under certain assumptions, most importantly
the Hessians being positive definite, it is possible to measure the inconsistency between two domains
$A$ and $B$ with the following inconsistency score:

$$
\mathcal{I}^\epsilon (\theta^*) = \max_{(A,B)\in\mathcal{E}^2} \biggl( \mathcal{R}_B(\theta^*) - \mathcal{R}_A(\theta^*) + \max_{\frac{1}{2}\theta^T H_A \theta\leq\epsilon}\frac{1}{2}\theta^T H_B \theta \biggr)
$$

where $\theta^*$ denotes the minimizer, $\mathcal{E}$ denotes the set of training domains,
$H_e$ denotes the Hessian for $e\in\mathcal{E}$, $\theta$ denotes the network parameters,
and $\mathcal{R}_e$ for $e\in\mathcal{E}$ denotes the domain-level ERM objective.
The Fishr regularization method forces both terms on the right-hand side
of the inconsistency score to become small. The first term represents the difference
between the domain-level risks and is implicitly forced to be small by applying
the ERM objective; the second term is kept small by matching the diagonals of the
domain-level Hessians, matching the variances across domains.
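
To make the score concrete: for positive definite Hessians, the inner maximum
$\max_{\frac{1}{2}\theta^T H_A \theta\leq\epsilon}\frac{1}{2}\theta^T H_B \theta$
has the closed form $\epsilon\,\lambda_{\max}$, where $\lambda_{\max}$ is the largest
generalized eigenvalue of the pencil $(H_B, H_A)$. The following sketch evaluates the
score for two toy quadratic domains; all quantities in it (the risks, Hessians, and
$\epsilon$) are illustrative assumptions, not part of domainlab.

```python
# Toy evaluation of the inconsistency score for two domains A and B.
import numpy as np
from scipy.linalg import eigh


def inconsistency(risk_a, risk_b, hessian_a, hessian_b, epsilon):
    """Inconsistency of domain B relative to domain A around the minimizer.

    The inner maximum of (1/2) theta^T H_B theta subject to
    (1/2) theta^T H_A theta <= epsilon equals epsilon * lambda_max, with
    lambda_max the largest generalized eigenvalue of (H_B, H_A); this
    requires both Hessians to be positive definite.
    """
    lambda_max = eigh(hessian_b, hessian_a, eigvals_only=True)[-1]
    return risk_b - risk_a + epsilon * lambda_max


# Two toy quadratic domains with different curvature.
h_a = np.diag([1.0, 1.0])
h_b = np.diag([1.0, 10.0])  # domain B is much more curved along one axis
r_a, r_b, eps = 0.3, 0.5, 0.1

# Outer max over the ordered pairs (A, B) and (B, A).
score = max(inconsistency(r_a, r_b, h_a, h_b, eps),
            inconsistency(r_b, r_a, h_b, h_a, eps))
print(score)  # 1.2 for this toy example
```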


### Matching the Variances during training
Let $\mathcal{E}$ be the space of all training domains, and let $\mathcal{R}_e(\theta)$ be the ERM
objective. Fishr minimizes the following objective function during training:

$$
\mathcal{L}(\theta) = \frac{1}{|\mathcal{E}|}\sum_{e\in\mathcal{E}} \mathcal{R}_e(\theta) + \lambda \mathcal{L}_\textnormal{Fishr}(\theta)
$$

where $\lambda$ is a regularization hyperparameter and the Fishr regularization term is given by

$$
\mathcal{L}_\textnormal{Fishr}(\theta) = \frac{1}{|\mathcal{E}|}\sum_{e\in\mathcal{E}} \| v_e - v \|^2_2
$$

with $v_e$ denoting the variance of the gradients within domain $e\in\mathcal{E}$ and
$v$ denoting the average variance of the gradients across all domains, i.e.
$v = \frac{1}{|\mathcal{E}|}\sum_{e\in\mathcal{E}} v_e$.
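
Written out in code, the penalty takes only a few lines once the per-domain
variances are available as flat vectors. This is a minimal PyTorch sketch of the
formula above, not domainlab's actual implementation:

```python
import torch


def fishr_penalty(variances):
    """Fishr penalty from a list of per-domain gradient variances.

    variances: list of 1-D tensors, one tensor v_e per training domain.
    """
    v_mean = torch.stack(variances).mean(dim=0)  # v = average variance over domains
    return sum((v_e - v_mean).pow(2).sum() for v_e in variances) / len(variances)


# Two toy domains whose gradient variances disagree.
v_1 = torch.tensor([0.1, 0.2, 0.3])
v_2 = torch.tensor([0.3, 0.2, 0.1])
print(fishr_penalty([v_1, v_2]))  # tensor(0.0200)
```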



The variance of the gradients within each domain can be computed with the
BackPACK package (see: Dangel, Felix, Frederik Kunstner, and Philipp Hennig.
"BackPACK: Packing more into backprop." https://arxiv.org/abs/1912.10985).
Furthermore, we use the approximation $\textnormal{Var}(G) \approx \textnormal{diag}(H)$,
where $G$ denotes the per-sample gradients within a domain.
The Hessian is then approximated by the Fisher information matrix, which
in turn is approximated by an empirical estimator for computational efficiency.
For more details, see the reference below or the domainlab code.
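
As an illustration of this step, the sketch below computes the per-domain gradient
variance with BackPACK's `Variance` extension. The model, loss function, and batch
are toy stand-ins, not domainlab's actual setup:

```python
import torch
from backpack import backpack, extend
from backpack.extensions import Variance

# extend() registers the modules so BackPACK can track per-sample quantities.
model = extend(torch.nn.Linear(10, 2))
loss_fn = extend(torch.nn.CrossEntropyLoss())


def domain_gradient_variance(x, y):
    """Flattened variance of the individual (per-sample) gradients
    for one batch drawn from a single domain."""
    model.zero_grad()
    loss = loss_fn(model(x), y)
    with backpack(Variance()):
        loss.backward()  # populates p.variance for every parameter
    return torch.cat([p.variance.flatten() for p in model.parameters()])


x = torch.randn(32, 10)               # one batch from some domain e
y = torch.randint(0, 2, (32,))
v_e = domain_gradient_variance(x, y)  # the v_e entering the Fishr penalty
```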