📚 The doc issue

What is the difference between the 0724 and 0424 models? I can't find documentation anywhere. The official config files appear to be identical. Looking at the intermediate checkpoints, 0724 seems to be a continuation of 0424, resuming from the pre-annealing checkpoint. If so, what is the LR schedule for the continuation, and what is the additional dataset?
Suggest a potential alternative/fix
No response
We trained OLMo 7B 0424 with a two-stage curriculum:
In the first stage, we trained the model from scratch on the Dolma 1.7 dataset. We used a cosine learning rate schedule with a 2,500-step warmup, a peak learning rate of 3e-4, and a decay to 3e-5 over 3T tokens; we cut this stage off after 2.7T tokens.
In the second stage, we trained on a higher-quality subset of Dolma 1.7 for another 50B tokens while linearly decaying the learning rate to 0 (see the sketch below).
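For concreteness, here is a minimal sketch of how such a two-stage schedule can be computed. The warmup length, peak and floor learning rates, and token horizons come from the description above; everything else (the step-to-token bookkeeping, and the assumption that stage 2 decays linearly from the LR reached at the 2.7T cutoff) is an assumption for illustration, not the official OLMo training code.

```python
import math

# Constants from the description above; the stage-2 starting LR is an
# assumption for illustration only.
WARMUP_STEPS = 2_500
PEAK_LR = 3e-4
COSINE_FLOOR_LR = 3e-5              # LR the cosine would reach at 3T tokens
COSINE_HORIZON = 3_000_000_000_000  # 3T tokens: full length of the cosine
STAGE1_CUTOFF = 2_700_000_000_000   # 2.7T tokens: where stage 1 is cut off
STAGE2_TOKENS = 50_000_000_000      # 50B tokens of the high-quality mix


def stage1_lr(tokens: int, step: int) -> float:
    """Linear warmup, then cosine decay over a 3T-token horizon (cut off at 2.7T)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min(tokens / COSINE_HORIZON, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return COSINE_FLOOR_LR + (PEAK_LR - COSINE_FLOOR_LR) * cosine


def stage2_lr(tokens_into_stage2: int, start_lr: float) -> float:
    """Linear decay from the stage-1 cutoff LR to 0 over the 50B-token stage."""
    remaining = max(1.0 - tokens_into_stage2 / STAGE2_TOKENS, 0.0)
    return start_lr * remaining


# LR at the 2.7T pre-annealing checkpoint, which stage 2 (by assumption here)
# decays linearly to 0.
cutoff_lr = stage1_lr(STAGE1_CUTOFF, step=100_000)  # any step past warmup
print(f"LR at 2.7T cutoff: {cutoff_lr:.2e}")        # ~3.7e-5
print(f"LR halfway through stage 2: {stage2_lr(25_000_000_000, cutoff_lr):.2e}")
```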
To build our high-quality Dolma 1.7 subset, we (1) use all available Wikipedia, OpenWebMath, and Flan data, (2) remove Dolma CC, CC News, and Megawika, and (3) rebalance the remaining sources to approximately equal proportions. See the exact token counts and relative proportions of this second-stage mix below. Both stages contribute equally to the final performance of the OLMo model.
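As a rough illustration of point (3), the snippet below shows one way to downsample the remaining sources toward roughly equal proportions. The source names and token counts are placeholders, not the real Dolma 1.7 second-stage numbers (those are in the table referenced above), and the capping rule is an assumed heuristic rather than the exact procedure used.

```python
# Placeholder sources and token counts, for illustration only.
kept_in_full = {"wikipedia": 4e9, "open_web_math": 5e9, "flan": 6e9}  # point (1)
removed = {"dolma_cc", "cc_news", "megawika"}                         # point (2)
remaining = {"source_a": 30e9, "source_b": 45e9}                      # point (3)

# Sanity check: removed sources should not appear in the mix.
assert removed.isdisjoint(kept_in_full) and removed.isdisjoint(remaining)

# Downsample each remaining source to roughly the average size of the fully
# kept sources, so every surviving source lands in a similar proportion.
target = sum(kept_in_full.values()) / len(kept_in_full)
mix = dict(kept_in_full)
mix.update({name: min(tokens, target) for name, tokens in remaining.items()})

total = sum(mix.values())
for name, tokens in sorted(mix.items()):
    print(f"{name}: {tokens / 1e9:.1f}B tokens ({tokens / total:.1%})")
```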