- Formal Algorithms for Transformers: link (gotta love DeepMind).
- Einstein notation: link (thx Dani!)
- Also, this gives a good idea of how to use the ellipsis: link
- Backpropagation: link
- The ReLU derivative at 0 is taken to be 0 by convention: link
- RMS Normalization paper: link
- Install folders as packages to share code between them (more info here).
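A quick reminder of Einstein notation in practice. This is a minimal sketch using NumPy's `np.einsum`, and it assumes the "ellipsis" note above refers to einsum's `...` syntax. Repeated indices are summed over, indices kept in the output are preserved, and `...` stands for any number of leading (batch) dimensions.

```python
import numpy as np

# Batched matrix multiply in Einstein notation:
# j appears in both inputs but not in the output, so it is summed over.
A = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # shape (b=2, i=3, j=4)
B = np.arange(2 * 4 * 5).reshape(2, 4, 5)  # shape (b=2, j=4, k=5)
C = np.einsum("bij,bjk->bik", A, B)        # shape (2, 3, 5)

# The ellipsis "..." absorbs any number of leading dimensions,
# so the same expression works regardless of how many batch axes there are.
X = np.ones((7, 2, 3, 4))
Y = np.ones((7, 2, 4, 5))
Z = np.einsum("...ij,...jk->...ik", X, Y)  # shape (7, 2, 3, 5)
```

Every entry of `Z` is 4.0, since each one sums four products of ones over the contracted `j` axis.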
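Backprop and the ReLU convention together: a minimal sketch of a hand-written backward pass through a single hypothetical neuron `y = relu(w * x)` with a squared-error loss. The neuron, the loss, and the function names are illustrative assumptions. The only point is the chain rule, with the derivative at exactly 0 set to 0.

```python
def relu(x: float) -> float:
    return x if x > 0 else 0.0

def relu_grad(x: float) -> float:
    # ReLU is not differentiable at x = 0; by convention the derivative there is 0.
    return 1.0 if x > 0 else 0.0

def forward_backward(w: float, x: float, target: float) -> tuple[float, float]:
    # Forward pass: z = w * x, y = relu(z), loss = (y - target)^2 / 2
    z = w * x
    y = relu(z)
    loss = 0.5 * (y - target) ** 2
    # Backward pass (chain rule): dL/dw = dL/dy * dy/dz * dz/dw
    dL_dy = y - target
    dy_dz = relu_grad(z)
    dz_dw = x
    return loss, dL_dy * dy_dz * dz_dw

loss, grad = forward_backward(2.0, 3.0, 1.0)    # z = 6 > 0, gradient flows
_, grad_at_zero = forward_backward(0.0, 3.0, 1.0)  # z = 0, convention blocks gradient
```

With `w = 2, x = 3, target = 1` the loss is 12.5 and `dL/dw = 15`. At `w = 0` the pre-activation is exactly 0, so the convention makes the gradient 0.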
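RMSNorm in a few lines: a minimal NumPy sketch. Unlike LayerNorm, it does not subtract the mean. It only rescales by the root mean square over the feature axis and multiplies by a learned gain. The small `eps` inside the square root is an assumption (a common choice for numerical stability), and the name `rms_norm` is illustrative.

```python
import numpy as np

def rms_norm(x: np.ndarray, gain: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Root mean square over the last (feature) axis; no mean subtraction.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    # Rescale each feature vector, then apply the per-feature gain.
    return x / rms * gain

x = np.array([[3.0, 4.0], [1.0, -1.0]])
y = rms_norm(x, gain=np.ones(2))
```

With a gain of ones, every output row has an RMS of (almost exactly) 1. The first row becomes roughly `[0.849, 1.131]`.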