
Commit 1b9ed9c — "Words"

carmocca committed Mar 7, 2024 (1 parent: 8230b28)

Showing 1 changed file with 1 addition and 1 deletion: `tutorials/pretrain_tinyllama.md`
@@ -44,7 +44,7 @@ Around 1.2 TB of disk space is required to store both datasets.

 ## Prepare the datasets for training

-In order to start pretraining litgpt on it, you need to read, tokenize, and write the data in binary chunks. This will leverage the `litdata` optimization pipeline and streaming dataset that comes with Lightning.
+In order to start pretraining litgpt on it, you need to read, tokenize, and write the data in binary chunks. This will leverage the `litdata` optimization pipeline and streaming dataset.

 First, install additional dependencies for preprocessing:
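The step the changed line describes (read the text, tokenize it, and write the token ids out in binary chunks) can be sketched in plain Python, independent of `litdata`. The whitespace tokenizer, `uint32` encoding, and chunk length below are illustrative assumptions, not litgpt's actual pipeline:

```python
import struct

def tokenize(text, vocab):
    # Toy whitespace tokenizer; a real pipeline uses a trained tokenizer.
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]

def write_chunks(token_ids, chunk_len=4):
    # Pack token ids into fixed-size binary chunks (little-endian uint32).
    chunks = []
    for i in range(0, len(token_ids), chunk_len):
        part = token_ids[i:i + chunk_len]
        chunks.append(struct.pack(f"<{len(part)}I", *part))
    return chunks

vocab = {}
ids = tokenize("the quick brown fox jumps over the lazy dog", vocab)
chunks = write_chunks(ids)
print(len(ids), len(chunks))  # 9 token ids packed into 3 chunks
```

In the real pipeline each chunk would be written to disk so that a streaming dataset can later read chunks on demand instead of loading the full corpus into memory.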
