From 1b9ed9cfb969ffdf9b926b27bda54d4f18e597ad Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Carlos=20Mochol=C3=AD?=
Date: Thu, 7 Mar 2024 19:23:18 +0100
Subject: [PATCH] Words

---
 tutorials/pretrain_tinyllama.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tutorials/pretrain_tinyllama.md b/tutorials/pretrain_tinyllama.md
index 00e1329cf6..27db2b0d20 100644
--- a/tutorials/pretrain_tinyllama.md
+++ b/tutorials/pretrain_tinyllama.md
@@ -44,7 +44,7 @@ Around 1.2 TB of disk space is required to store both datasets.
 
 ## Prepare the datasets for training
 
-In order to start pretraining litgpt on it, you need to read, tokenize, and write the data in binary chunks. This will leverage the `litdata` optimization pipeline and streaming dataset that comes with Lightning.
+In order to start pretraining litgpt on it, you need to read, tokenize, and write the data in binary chunks. This will leverage the `litdata` optimization pipeline and streaming dataset.
 
 First, install additional dependencies for preprocessing:
 