diff --git a/README.md b/README.md
index 367cdd6..e6f81bf 100644
--- a/README.md
+++ b/README.md
@@ -58,7 +58,7 @@ In the following derivations, I assume $x_t$, $y_t$, and $A_{t, :}$ are zeros fo
 
 $\mathcal{L}$ represents the loss evaluated with a chosen function.
 
-### Propagating gradients to the input $x_t$
+### Propagating gradients to the input $`x_t`$
 
 Firstly, let me introduce $`\hat{A}_{t,i} = -A_{t,i}`$ so we can get rid of the (a bit annoying) minus sign and write the filter as (equation 1):
 ```math
@@ -105,7 +105,7 @@ Moreover, we can get $B_{t + i, i}$ by setting $`A_{t,i} := A_{t+i,i}`$, implies
 
 In summary, getting the gradients for the time-varying IIR filter inputs is as easy as filtering the backpropagated gradients backwards with the coefficient matrix shifted column-wise.
 
-### Propagating gradients to the coefficients $\mathbf{A}$
+### Propagating gradients to the coefficients $`\mathbf{A}`$
 
 The explanation of this section is based on a high-level view of backpropagation.
 
@@ -145,7 +145,7 @@ y_{T-1} & y_{T - 2} & \dots & y_{T - N}
 .
 ```
 
-### Gradients for the initial condition $y_t|_{t \leq 0}$
+### Gradients for the initial condition $`y_t|_{t \leq 0}`$
 
 The algorithm could be extended for modelling initial conditions based on the same idea from the previous [section](#propagating-gradients-to-the-coefficients). The initial conditions are the inputs to the system when $t \leq 0$, so their gradients equal $`\frac{\partial \mathcal{L}}{\partial x_t}|_{-N < t \leq 0}`$.
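
The hunks above show only fragments of the derivation, so here is a minimal numerical sketch of the claim in the second hunk: the input gradients come from filtering the backpropagated gradients backwards in time with the column-shifted coefficients $`B_{t,i} = A_{t+i,i}`$. It assumes the all-pole recursion $`y_t = x_t - \sum_{i=1}^{N} A_{t,i} y_{t-i}`$ implied by the $`\hat{A}_{t,i} = -A_{t,i}`$ substitution; `allpole`, `T`, and `N` are illustrative names, not part of the repository's API.

```python
import torch

torch.manual_seed(0)
T, N = 32, 2

def allpole(x, A):
    # Direct form of y_t = x_t - sum_{i=1}^{N} A[t, i] * y[t - i],
    # with y_t = 0 for t <= 0 as assumed in the derivations.
    y = []
    for t in range(x.shape[0]):
        acc = x[t]
        for i in range(1, N + 1):
            if t - i >= 0:
                acc = acc - A[t, i - 1] * y[t - i]
        y.append(acc)
    return torch.stack(y)

x = torch.randn(T, requires_grad=True)
A = (0.1 * torch.randn(T, N)).requires_grad_()  # small coefficients keep it stable

y = allpole(x, A)
loss_grad = torch.randn(T)        # stand-in for dL/dy from later layers
(y * loss_grad).sum().backward()  # autograd reference for dL/dx and dL/dA

# Shift each column i of A up by i rows (entries running past T become zero),
# then filter dL/dy backwards: flip, run the same filter, flip back.
B = torch.zeros(T, N)
for i in range(1, N + 1):
    B[: T - i, i - 1] = A.detach()[i:, i - 1]
grad_x = allpole(loss_grad.flip(0), B.flip(0)).flip(0)

print(torch.allclose(x.grad, grad_x, atol=1e-5))  # True
```

Flipping the sequences turns the backward recursion into an ordinary forward pass, which is why a single filter implementation serves both directions.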
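
Continuing the sketch, the coefficient gradients follow from the high-level backpropagation view mentioned above: each $`A_{t,i}`$ enters only $`y_t`$, so $`\frac{\partial \mathcal{L}}{\partial A_{t,i}} = -\frac{\partial \mathcal{L}}{\partial x_t} y_{t-i}`$, an elementwise product of the input gradients with a matrix of delayed outputs whose last row $`y_{T-1}, y_{T-2}, \dots, y_{T-N}`$ is glimpsed in the last hunk header. Reusing `x`, `y`, and `A` from the previous sketch:

```python
# Column i of Y holds the output delayed by i samples (zeros for t < i).
Y = torch.zeros(T, N)
for i in range(1, N + 1):
    Y[i:, i - 1] = y.detach()[: T - i]
grad_A = -x.grad[:, None] * Y  # dL/dA[t, i] = -dL/dx[t] * y[t - i]

print(torch.allclose(A.grad, grad_A, atol=1e-5))  # True
```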
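
Finally, the last hunk treats the initial conditions $`y_t|_{t \leq 0}`$ as extra inputs, so their gradients should be the same backward recursion continued $`N`$ steps past the start of the sequence. `allpole_ic` below is a hypothetical extension of `allpole` that accepts initial conditions, written only to check that claim; it is not the package's interface.

```python
def allpole_ic(x, A, y_init):
    # y_init[j] = y_{-(j+1)}; buffer indices 0..N-1 hold y_{-N}..y_{-1}.
    y = list(y_init.flip(0))
    for t in range(x.shape[0]):
        acc = x[t]
        for i in range(1, N + 1):
            acc = acc - A[t, i - 1] * y[N + t - i]
        y.append(acc)
    return torch.stack(y[N:])

y_init = torch.randn(N, requires_grad=True)
y_ic = allpole_ic(x.detach(), A.detach(), y_init)
(y_ic * loss_grad).sum().backward()  # autograd reference for dL/dy_init

# The adjoint recursion depends only on A and dL/dy, so v = dL/dx from the
# first sketch still applies: dL/dy_{-j} = -sum_{i=j}^{N} A[i-j, i-1] * v[i-j].
grad_init = torch.zeros(N)
for j in range(1, N + 1):
    for i in range(j, N + 1):
        grad_init[j - 1] -= A.detach()[i - j, i - 1] * x.grad[i - j]

print(torch.allclose(y_init.grad, grad_init, atol=1e-5))  # True
```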