One more hint for what's going on.
dirkgr committed Nov 28, 2024
1 parent d74e835 commit b41634f
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -94,6 +94,8 @@ For the 7B model, we train three times with different data order on 50B high qua
| random seed 666 | [stage2-ingredient3-step11931-tokens50B](https://huggingface.co/allenai/OLMo-2-1124-7B/tree/stage2-ingredient3-step11931-tokens50B) | [OLMo2-7B-stage2-seed666.yaml](configs/official-1124/OLMo2-7B-stage2-seed666.yaml) | link to come |
| **final souped model** | [main](https://huggingface.co/allenai/OLMo-2-1124-7B/tree/main) | no config, we just averaged the weights in Python | |

The training configs linked here are set up to download the latest checkpoint after stage 1, and start training from there.

### Stage 2 for the 13B

For the 13B model, we train three times with different data order on 100B high quality tokens, and one more time
@@ -107,6 +109,8 @@ on 300B high quality tokens. Then we average ("soup") the models.
| random seed 2662, 300B | [stage2-ingredient4-step35773-tokens300B](https://huggingface.co/allenai/OLMo-2-1124-13B/tree/stage2-ingredient4-step35773-tokens300B) | [OLMo2-13B-stage2-seed2662-300B.yaml](configs/official-1124/OLMo2-13B-stage2-seed2662-300B.yaml) | link to come |
| **final souped model** | [main](https://huggingface.co/allenai/OLMo-2-1124-13B/tree/main) | no config, we just averaged the weights in Python | |

The training configs linked here are set up to download the latest checkpoint after stage 1, and start training from there.
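
The "final souped model" rows say only that the weights were averaged in Python. As a rough illustration of that idea, here is a minimal sketch that assumes a uniform average over parameters with matching names. The function name `soup` and the toy values are hypothetical, not the script that was actually used. It works on any values that support `+` and `/`, so plain floats stand in for tensors here.

```python
from typing import Any, Dict, Sequence


def soup(state_dicts: Sequence[Dict[str, Any]]) -> Dict[str, Any]:
    """Uniformly average same-named parameters across checkpoints.

    Assumption: every ingredient gets equal weight. The README does not
    say whether the real average was uniform or weighted.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint to average")
    keys = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != keys:
            raise ValueError("checkpoints have different parameter names")

    n = len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        # Sum the parameter over all ingredients, then divide by the count.
        total = state_dicts[0][name]
        for sd in state_dicts[1:]:
            total = total + sd[name]
        averaged[name] = total / n
    return averaged


# Toy stand-ins for three stage 2 runs (hypothetical values, not real weights).
ingredients = [
    {"w": 1.0, "b": 0.0},
    {"w": 2.0, "b": 3.0},
    {"w": 3.0, "b": 6.0},
]
souped = soup(ingredients)
print(souped)  # {'w': 2.0, 'b': 3.0}
```

With real checkpoints, each dictionary would be a loaded state dict whose values are tensors of identical shape. The key check is there because averaging only makes sense when all ingredients share one architecture.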

## Instruction tuned variants

For instruction tuned variants of these models, go to