The release links to "training code", but the link just leads to the GitHub repository, which does not document how one can reproduce the training.
Suggest a potential alternative/fix
Is it reasonable for someone to rerun your training? Is it even possible? If so, could you document it? (Even documenting that it is complicated and that one should prefer, e.g., Pythia would be good to know.)
Hey @borgr, at the moment the biggest obstacle is accessing the preprocessed training data. Without it, you would have to preprocess the data yourself using the tools in Dolma, which takes some time. To address this, we're copying the preprocessed data from a private S3 bucket to a public R2 bucket (which has no egress costs). Once that's done, we'll update the paths in the training configs and add an example to the README.