❓ The question
I would like to inspect training data in a particular batch as described here. I also noticed that there is another script, inspect_train_data.py, doing similar things. Unfortunately, both scripts require the global_indices.npy file, which is no longer available on the server, e.g., for the OLMo-7B model, https://olmo-checkpoints.org/ai2-llm/olmo-medium/wvc30anm/train_data/global_indices.npy is no longer accessible.

Is there a new URL where we could download these global_indices.npy files for the pre-trained models, or do we have to rebuild the global_indices.npy locally? Also, are there any tools to sample training data for a particular batch without downloading the entire training corpus?

Thanks in advance!
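
For context, here is a minimal sketch of the lookup that a global index file makes possible. It assumes the file stores one dataset index per training instance, in the order the instances were consumed, so that batch b with global batch size B occupies positions [b*B, (b+1)*B). The function names and the toy index list are hypothetical and are not taken from the OLMo scripts.

```python
import random
from typing import List, Sequence, Tuple


def batch_bounds(batch_idx: int, global_batch_size: int) -> Tuple[int, int]:
    """Return the [start, end) positions a batch occupies in training order."""
    if batch_idx < 0 or global_batch_size <= 0:
        raise ValueError("batch_idx must be >= 0 and global_batch_size > 0")
    start = batch_idx * global_batch_size
    return start, start + global_batch_size


def batch_instance_indices(
    global_indices: Sequence[int], batch_idx: int, global_batch_size: int
) -> List[int]:
    """Return the dataset indices of the instances that made up one batch."""
    start, end = batch_bounds(batch_idx, global_batch_size)
    if end > len(global_indices):
        raise IndexError("batch lies beyond the end of the index order")
    return [int(i) for i in global_indices[start:end]]


# Toy stand-in for the real index file: 12 instances in a shuffled order.
# (Assumption: the real file plays the same role at a much larger scale.)
toy_indices = list(range(12))
random.Random(0).shuffle(toy_indices)

first_batch = batch_instance_indices(toy_indices, batch_idx=0, global_batch_size=4)
second_batch = batch_instance_indices(toy_indices, batch_idx=1, global_batch_size=4)
print(first_batch, second_batch)
```

With the real file, global_indices would presumably be the array read from global_indices.npy, and the returned indices would then be looked up in the tokenized dataset. That is why both scripts depend on the file being available.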