From 632b28f76b3536644b1ef8b2ecbf8e251342f3e9 Mon Sep 17 00:00:00 2001
From: Sebastian Raschka
Date: Mon, 15 Apr 2024 08:45:53 -0500
Subject: [PATCH] Add Pre-training Small Base LMs with Fewer Tokens paper to community projects

---
 README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.md b/README.md
index ef39c4fd4f..d8ee06cecf 100644
--- a/README.md
+++ b/README.md
@@ -474,6 +474,11 @@ LitGPT powered the [TinyLlama project](https://github.com/jzhang38/TinyLlama) an
 
 [MicroLlama](https://github.com/keeeeenw/MicroLlama) is a 300M Llama model pretrained on 50B tokens powered by TinyLlama and LitGPT.
 
+&nbsp;
+
+**🔬 Pre-training Small Base LMs with Fewer Tokens**
+
+The research paper ["Pre-training Small Base LMs with Fewer Tokens"](https://arxiv.org/abs/2404.08634), which utilizes LitGPT, develops smaller base language models by inheriting a few transformer blocks from larger models and training on a tiny fraction of the data used by the larger models. It demonstrates that these smaller models can perform comparably to larger models despite using significantly less training data and resources.
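
For readers unfamiliar with the approach the new README entry describes, the following is a minimal, self-contained sketch of the general idea: initializing a smaller model by inheriting a subset of transformer blocks from a larger trained model before continuing pre-training on far fewer tokens. The `TinyTransformer` class, the layer counts, and the `inherit` indices are illustrative assumptions, not the paper's actual recipe or LitGPT's API.

```python
# Minimal sketch (assumptions noted above): derive a small model by copying a
# subset of transformer blocks from a larger trained model, then continue
# pre-training it on a small fraction of the original token budget.
import torch
import torch.nn as nn


class TinyTransformer(nn.Module):
    """Toy decoder-style block stack; stands in for a real GPT implementation."""

    def __init__(self, n_layer: int, d_model: int = 256, n_head: int = 4) -> None:
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_head, batch_first=True) for _ in range(n_layer)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return x


large = TinyTransformer(n_layer=24)  # stands in for the large "parent" model
small = TinyTransformer(n_layer=6)   # the smaller model derived from it

# "Inherit" an evenly spaced subset of the parent's blocks; the paper's exact
# block-selection strategy may differ from this illustrative choice.
inherit = [0, 4, 8, 12, 16, 20]
for dst, src in enumerate(inherit):
    small.blocks[dst].load_state_dict(large.blocks[src].state_dict())

# The small model would then be pre-trained on a tiny fraction of the data
# used for the parent model (training loop omitted).
x = torch.randn(1, 16, 256)          # (batch, sequence, d_model) smoke test
print(small(x).shape)
```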