Ladder 1xC #677

Open: wants to merge 266 commits into main
Commits
7843207
Jointly fit N and D; Switch to Huber loss and L-BFGS
liujch1998 Jul 30, 2024
e4d5953
add 530M
AkshitaB Aug 1, 2024
014a2f6
Fix 700m model size
liujch1998 Aug 2, 2024
474705b
Merge branch 'main' into ladder-1xC
AkshitaB Aug 5, 2024
0eb302d
make it amberish
AkshitaB Aug 5, 2024
bde12b6
update device batch size
AkshitaB Aug 5, 2024
ce126fc
smaller device batch
AkshitaB Aug 5, 2024
d6bbf23
add 3b
AkshitaB Aug 5, 2024
6d5abb4
fsdp
AkshitaB Aug 5, 2024
d344502
Merge branch 'scaling-laws' into ladder-1xC
AkshitaB Aug 6, 2024
25de998
isort scripts
AkshitaB Aug 6, 2024
58989b9
make batch size configurable
AkshitaB Aug 6, 2024
e2b4a15
notebook
AkshitaB Aug 6, 2024
85356d8
update priority to high
AkshitaB Aug 6, 2024
73cea01
device batch size arg
AkshitaB Aug 6, 2024
8b9673f
bug fix
AkshitaB Aug 7, 2024
d5f6e5d
expected flops
AkshitaB Aug 7, 2024
13640f2
somewhat hacky fix
AkshitaB Aug 7, 2024
18e062c
Add 300m curve and remove 20m curve; Improve data; Tweak loss and opt…
liujch1998 Aug 18, 2024
053b4e7
New correction term
liujch1998 Aug 18, 2024
bc69b11
Merge branch 'main' into ladder-1xC
liujch1998 Aug 23, 2024
86223e1
Add bpb evals (less csqa and siqa); Change eval_interval to 100
liujch1998 Aug 23, 2024
bf62545
Remove sciq_rc_0shot_bpb
liujch1998 Aug 23, 2024
ae394bc
Add power_correction; Add fitting to final loss
liujch1998 Aug 26, 2024
21a5180
add notebook
AkshitaB Aug 27, 2024
8701299
Update CHANGELOG.md
AkshitaB Aug 27, 2024
acd21d0
Merge branch 'scaling-laws' into ladder-1xC
AkshitaB Aug 28, 2024
e3e02ec
Loss curve fitting
liujch1998 Aug 28, 2024
514910d
Add tissue function fitting
liujch1998 Aug 28, 2024
74d61f1
Merge branch 'scaling-laws' of github.com:allenai/OLMo into scaling-laws
liujch1998 Aug 28, 2024
6d80e93
Merge branch 'scaling-laws' into ladder-1xC
liujch1998 Aug 28, 2024
9e156d2
move ladder to olmo so that functions can be imported
AkshitaB Aug 28, 2024
990a38b
add parse_run_name
AkshitaB Aug 28, 2024
e180bea
add flops based plots
AkshitaB Aug 29, 2024
1134317
Split loss curve figures
liujch1998 Aug 29, 2024
d63622d
simplify plots for final loss prediction
AkshitaB Aug 29, 2024
6a8bac9
Merge remote-tracking branch 'origin/main' into ladder-1xC
liujch1998 Aug 31, 2024
d10fc35
Add 5-shot evals to ladder
liujch1998 Aug 31, 2024
4b01f8a
Change eval interval to 200
liujch1998 Aug 31, 2024
de8721b
Fix ladder path
liujch1998 Aug 31, 2024
6d6eeb9
Fix ladder bug
liujch1998 Aug 31, 2024
1bff5f8
Move ladder.py back to scripts/
liujch1998 Aug 31, 2024
44a863a
Change save interval to 200
liujch1998 Aug 31, 2024
2b37b33
Add 5-shot evals
liujch1998 Sep 3, 2024
cc5316c
add params, add bpb to task score
AkshitaB Sep 3, 2024
f7fb4ea
fix lint
AkshitaB Sep 3, 2024
762be69
ruff check
AkshitaB Sep 3, 2024
d9aff15
Make alpha_f configurable
liujch1998 Sep 3, 2024
7ac4191
cleanup notebooks, add downloads for downstream
AkshitaB Sep 4, 2024
8ceb4c0
piqa and mmlu
AkshitaB Sep 4, 2024
7e4886b
add remaining downstream metrics
AkshitaB Sep 5, 2024
b74a021
Final loss: allow averaging over last N ckpts; Loss curve: plotting r…
liujch1998 Sep 5, 2024
1447fea
Make num_to_avg a CLI arg
liujch1998 Sep 5, 2024
ec7b875
fix names
AkshitaB Sep 5, 2024
29f7c04
Make save/eval_intervals CLI arguments in ladder
liujch1998 Sep 5, 2024
6e0b790
Update to amberish-rulebased
liujch1998 Sep 8, 2024
0c816eb
Import Vera evals
liujch1998 Sep 8, 2024
7f30694
Update amberish1 configs
liujch1998 Sep 8, 2024
6d690e4
Fix amberish1 launch script
liujch1998 Sep 8, 2024
249fe1e
Fix amberish1 config to match old run
liujch1998 Sep 8, 2024
8424f46
update downstream notebook
AkshitaB Sep 11, 2024
f6466f8
final plots
AkshitaB Sep 11, 2024
f15af87
Add newline format of MMLU
liujch1998 Sep 11, 2024
2ee876d
Debugging inf bpb
liujch1998 Sep 11, 2024
b3f3934
Fix off-by-one error in byte len computation
liujch1998 Sep 12, 2024
c3e4d43
Bug fix on MMLU newline
liujch1998 Sep 12, 2024
f2f3a30
Update plotting
liujch1998 Sep 15, 2024
ddf9dbd
Update amberish1 training and wandb download
liujch1998 Sep 15, 2024
48b17e2
Add newline format for oe-eval tasks
liujch1998 Sep 15, 2024
499ac18
Bug fix: Switch order of OEEvalTask and OEEvalTaskWithNewlin
liujch1998 Sep 15, 2024
df15f3c
Fix off-by-one error in byte len computation
liujch1998 Sep 15, 2024
7b5f1c3
Fix wandb download
liujch1998 Sep 15, 2024
2797ac8
fix arc c
AkshitaB Sep 16, 2024
ae2e3e6
temp notebook to debug step1 predictions
AkshitaB Sep 16, 2024
c2d7ff6
update
AkshitaB Sep 16, 2024
630378a
make print optional
AkshitaB Sep 17, 2024
6916410
downstream notebooks
AkshitaB Sep 17, 2024
898c148
updated notebooks
AkshitaB Sep 17, 2024
89e2f2a
Update notebooks: Made ideal points and p0 task-specific; Added avg u…
liujch1998 Sep 19, 2024
e5b3fb4
Update notebooks: Add back sciq; Remove moving average for score
liujch1998 Sep 20, 2024
7ce1424
Update plotting
liujch1998 Sep 25, 2024
31aa023
Add Peteish ladder code
liujch1998 Sep 25, 2024
790ae81
Fix weka path of data and checkpoints
liujch1998 Sep 25, 2024
a9fe456
Fix save folder
liujch1998 Sep 25, 2024
42e28a5
wip: stacked predictions
AkshitaB Sep 25, 2024
172bd2e
Add --batch_size_divisor option
liujch1998 Sep 26, 2024
9d67aae
Update param count
liujch1998 Sep 26, 2024
1f1fa6a
Update model names
liujch1998 Sep 26, 2024
eb627dd
clean stacked predictions notebook, token-based hard setting
AkshitaB Sep 27, 2024
3bef95f
Update peteish config
liujch1998 Sep 30, 2024
b54d802
deal with 4k seq len
AkshitaB Sep 30, 2024
bc9d770
add temp failsafe
AkshitaB Oct 1, 2024
1bfb456
debug
AkshitaB Oct 2, 2024
c1f2cd9
revert
AkshitaB Oct 2, 2024
2840367
debug statements
AkshitaB Oct 2, 2024
37dfe5e
fix
AkshitaB Oct 2, 2024
9ce96e6
fix again
AkshitaB Oct 2, 2024
0689fe8
additional sanity check
AkshitaB Oct 2, 2024
8670241
forward slash
AkshitaB Oct 2, 2024
c027e36
revert
AkshitaB Oct 2, 2024
d1b5838
weird slash issue
AkshitaB Oct 2, 2024
3d0c901
weird slash issue 2
AkshitaB Oct 2, 2024
ae55c7c
revert
AkshitaB Oct 2, 2024
3baee24
print save_folder
AkshitaB Oct 2, 2024
7881c9c
fix
AkshitaB Oct 2, 2024
2218d82
fix down the line instead
AkshitaB Oct 2, 2024
9d31a0c
more logging
AkshitaB Oct 2, 2024
9afc8f2
fix ladder code for s3
AkshitaB Oct 2, 2024
2fc3e05
save sequentially
AkshitaB Oct 2, 2024
5ead531
fix backwards compatibility
AkshitaB Oct 2, 2024
0445efd
increase retries to be safe
AkshitaB Oct 2, 2024
bae32b7
add a normal priority script
AkshitaB Oct 4, 2024
53d50ad
executable
AkshitaB Oct 4, 2024
51dc7dd
move to script
AkshitaB Oct 16, 2024
f453883
fix formatting, etc
AkshitaB Oct 16, 2024
538d330
more lint fixes
AkshitaB Oct 16, 2024
f0d6c92
Merge branch 'main' into ladder-1xC
AkshitaB Oct 16, 2024
16698a1
move file
AkshitaB Oct 16, 2024
f179641
remove old notebooks
AkshitaB Oct 16, 2024
ea63c8c
fix linting in notebooks
AkshitaB Oct 17, 2024
e7b59de
eval scripts
AkshitaB Oct 18, 2024
2f2ed09
simplify
AkshitaB Oct 18, 2024
bfb4c8b
change workspace
AkshitaB Oct 18, 2024
93083c2
add missing script
AkshitaB Oct 18, 2024
3929e20
update downstream scripts
AkshitaB Oct 22, 2024
654c60c
configure last n points
AkshitaB Oct 22, 2024
c6f4495
updates
AkshitaB Oct 22, 2024
5f4e26f
Fix PPL val data
liujch1998 Oct 25, 2024
a89c9d8
Add WSD schedule
liujch1998 Oct 25, 2024
9f389ed
Update
liujch1998 Oct 25, 2024
efe2bac
Fix ladder_peteish.sh
liujch1998 Oct 25, 2024
9b47dad
Debug NCCL timeout
liujch1998 Oct 27, 2024
93d2c2a
Changing to another port
liujch1998 Oct 28, 2024
283caaa
Try c10d backend
liujch1998 Oct 28, 2024
bd0065d
Remove printing peak_gpu_memory()
liujch1998 Oct 28, 2024
72b1173
Debug peak_gpu_memory()
liujch1998 Oct 28, 2024
0da84ef
Inspecting barrier
liujch1998 Oct 28, 2024
1b8192d
Inspecting barrier
liujch1998 Oct 28, 2024
d4a1a04
Inspecting barrier
liujch1998 Oct 28, 2024
d864fbc
Disable TORCH_DIST_INIT_BARRIER
liujch1998 Oct 28, 2024
3f6a552
Inspecting barrier
liujch1998 Oct 28, 2024
7b71d42
Cleanup debugging stuff
liujch1998 Oct 28, 2024
a40ed1e
Peteish curve fitting
liujch1998 Nov 5, 2024
4cd8bc1
simplify tasks
AkshitaB Nov 7, 2024
030ab17
fix lint
AkshitaB Nov 7, 2024
163d3be
flops baseline
AkshitaB Nov 11, 2024
caad5c3
flops script
AkshitaB Nov 11, 2024
446107d
results for multiple tasks
AkshitaB Nov 11, 2024
b6169f3
stacked script
AkshitaB Nov 13, 2024
1b8959a
refactor
AkshitaB Nov 13, 2024
dca97a7
add paper configs
AkshitaB Nov 13, 2024
a21992b
Massage for plotting
liujch1998 Nov 13, 2024
aa036e7
remove old code
AkshitaB Nov 13, 2024
6e68283
Remove unused vera eval data
liujch1998 Nov 14, 2024
f35a4b2
Move things around a bit
liujch1998 Nov 14, 2024
15d6cea
update
AkshitaB Nov 14, 2024
c10deab
Merge branch 'main' into ladder-1xC
AkshitaB Nov 14, 2024
ce15bae
variance analysis
AkshitaB Nov 14, 2024
7edf3ca
fix
AkshitaB Nov 14, 2024
662db73
bug fix
AkshitaB Nov 14, 2024
bad8674
Add a single-step prediction script
liujch1998 Nov 15, 2024
04eed46
Add a predict.py script that can predict for settings which we don't …
liujch1998 Nov 15, 2024
ab97e87
Move amberish and older stuff into a deeper folder
liujch1998 Nov 15, 2024
0b4800a
Support predicting MC
liujch1998 Nov 16, 2024
19ce054
Predict for Peteish13
liujch1998 Nov 17, 2024
26f733d
Merge remote-tracking branch 'origin/main' into ladder-1xC
liujch1998 Nov 19, 2024
d7e0476
Update eval script (sync from backfill branch)
liujch1998 Nov 19, 2024
b5327a0
Lint things
liujch1998 Nov 19, 2024
6e254d7
Merge remote-tracking branch 'origin/main' into ladder-1xC
liujch1998 Nov 19, 2024
643c1ed
Add new evals to ladder
liujch1998 Nov 19, 2024
cb3e938
Remove eval on train set
liujch1998 Nov 19, 2024
8951dbb
Make device_eval_batch_size an argument
liujch1998 Nov 19, 2024
272ef4a
Remove MC and var evals
liujch1998 Nov 19, 2024
9b80ad9
merge
AkshitaB Nov 15, 2024
e5ebb23
prediction interval bounds
AkshitaB Nov 19, 2024
6e3a1c2
separate functions
AkshitaB Nov 19, 2024
4cd12ef
flops analysis
AkshitaB Nov 19, 2024
ea01e28
add fitting error
AkshitaB Nov 20, 2024
36539f7
variance analysis update
AkshitaB Nov 20, 2024
74e44bd
mark target and pred clearly
AkshitaB Nov 20, 2024
e88233a
add pred intervals, but commented
AkshitaB Nov 20, 2024
1233662
Update single-step prediction
liujch1998 Nov 21, 2024
6c78b4d
Add flags to debug NCCL error
liujch1998 Nov 21, 2024
b79c6c0
Bake bpb and soft_score computation into len_norm, saves compute
liujch1998 Nov 21, 2024
2a851bf
Remove old task evals
liujch1998 Nov 21, 2024
2c51d2e
Increase NCCL timeout to 30min
liujch1998 Nov 21, 2024
5b91f0f
more bounds
AkshitaB Nov 21, 2024
a508eb9
Add ladder eval stuff
liujch1998 Nov 21, 2024
09b5a8e
Upgrade to urgent
liujch1998 Nov 21, 2024
5b7a894
Eval peteish7 and peteish13 with new eval
liujch1998 Nov 22, 2024
6b1a7bc
Add new peteish ladder data (up to 5xC)
liujch1998 Nov 22, 2024
13a0fc4
Support fitting and prediction on new evals
liujch1998 Nov 22, 2024
72e8e7b
Eval all ckpts of peteish7
liujch1998 Nov 24, 2024
694de79
eval.py bug fix
liujch1998 Nov 24, 2024
3e9608f
Remove old peteish7 eval tasks
liujch1998 Nov 24, 2024
4250a8d
Change eval to every 5000 steps
liujch1998 Nov 24, 2024
d07617f
Add eval_hf.py that evaluates external models
liujch1998 Nov 24, 2024
f77945e
Add 10xC wandb logs
liujch1998 Nov 24, 2024
8cc954a
Misc improvements
liujch1998 Nov 25, 2024
c2ab4da
Fix lint
liujch1998 Nov 25, 2024
f2df10b
Chinchilla: improve numeric stability
CodeCreator Nov 25, 2024
1b825ae
Log sigmoid fitting functions
CodeCreator Nov 25, 2024
e643734
Revert changing eval to every 5000 steps; Eval final 10 ckpts of pete…
liujch1998 Nov 25, 2024
f65c752
Add task display name information
CodeCreator Nov 25, 2024
9dfc15e
step1: improve plot layout
CodeCreator Nov 25, 2024
bdaccec
step1: improve plot layout
CodeCreator Nov 25, 2024
c04ae4a
log sigmoid fitting
CodeCreator Nov 25, 2024
9bd336b
Merge branch 'ladder-1xC-task_ce' into ladder-1xC
CodeCreator Nov 25, 2024
678519d
Change .png to .pdf and adjust padding between subplots
CodeCreator Nov 25, 2024
af1c54f
step1 & step2: Increase font size
CodeCreator Nov 25, 2024
dbafaf3
Fix the step 2 minor difference
liujch1998 Nov 25, 2024
8a9185e
Step1&2 predictions with task cross entropy
CodeCreator Nov 26, 2024
be5a22b
Step2: also take moving average of target model
liujch1998 Nov 26, 2024
e06ffc4
Figure aesthetics
liujch1998 Nov 26, 2024
cba738c
Chained figure
liujch1998 Nov 26, 2024
5f49408
combined pred var analysis
AkshitaB Nov 26, 2024
67cc7c4
figure 1
AkshitaB Nov 26, 2024
eae0365
save predictions
AkshitaB Nov 26, 2024
05c9127
Eval peteish13-highlr intermediate
liujch1998 Nov 26, 2024
86c5dc8
Add new evals to peteish13-google.yaml
liujch1998 Nov 26, 2024
eff892e
Continue evaluating
liujch1998 Nov 27, 2024
fefa639
Eval peteish13 ckpts on WEKA
liujch1998 Nov 27, 2024
0d98dc9
Eval peteish13 ckpts on WEKA
liujch1998 Nov 27, 2024
0c972d7
Eval peteish13 ckpts on WEKA
liujch1998 Nov 27, 2024
06bdbce
Eval peteish13 ckpts on S3
liujch1998 Nov 27, 2024
a28e283
Revert ad-hoc patches
liujch1998 Nov 27, 2024
f726a5e
Improve step2 mc
liujch1998 Nov 27, 2024
996c09f
Eval peteish7-medlr
liujch1998 Nov 27, 2024
7602724
Eval peteish7-medlr
liujch1998 Nov 27, 2024
821e56f
add variance analysis
davidheineman Nov 27, 2024
8c9f995
small typo fix
davidheineman Nov 27, 2024
6a9018f
Eval peteish7-medlr
liujch1998 Nov 28, 2024
1c746ff
Eval peteish7-medlr
liujch1998 Nov 29, 2024
52ad827
compute vs error analysis script
AkshitaB Dec 2, 2024
1db9034
add calls
AkshitaB Dec 2, 2024
57c1fbd
script for flops computation
AkshitaB Dec 3, 2024
4dc93f7
add output
AkshitaB Dec 3, 2024
3579833
Support 0.5xC
liujch1998 Dec 3, 2024
4d53f72
bug fix
AkshitaB Dec 3, 2024
d930b18
isort/black, etc
AkshitaB Dec 3, 2024
dc115bc
Update single_step, plotting
liujch1998 Dec 3, 2024
b7ab4a2
update variance analysis fig
davidheineman Dec 3, 2024
b1b8f82
reorganize variance analysis
davidheineman Dec 3, 2024
354078c
improve plot
AkshitaB Dec 4, 2024
41229be
full chained preds with flops
AkshitaB Dec 4, 2024
ec5d22b
bug fix
AkshitaB Dec 4, 2024
7e15ec6
predict.py: fix log sigmoid fitting
CodeCreator Dec 4, 2024
e030fc5
add option to not run prediction
davidheineman Dec 4, 2024
7cb9721
Revert ad-hoc patches
liujch1998 Dec 9, 2024
4c37f5b
Update 3B and 0.5xC data; Misc changes
liujch1998 Dec 9, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -56,3 +56,4 @@ site/
/wandb/
/scratch/
core
/figure/
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -45,6 +45,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added downstream eval task for requests dumped from oe-eval tasks
- Added `CosLinearEnvelope` scheduler, which is a pointwise product of a cosine schedule and a linear decay.
- Added ability to save outputs of submodules for debugging purposes.
- Added scripts and notebooks for predicting loss from power laws.
- Added a number of tasks from oe-eval to the downstream eval tasks.
- Version dolma flan change in named_data_mix.py
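The `CosLinearEnvelope` entry above describes the scheduler only in words ("a pointwise product of a cosine schedule and a linear decay"). The following is a minimal Python sketch of that literal description, assuming no warmup and a schedule that decays to exactly zero at `max_steps`. The function name `cos_linear_envelope_lr` is hypothetical and is not OLMo's API; the actual scheduler may also handle warmup and a nonzero minimum learning rate.

```python
import math


def cos_linear_envelope_lr(initial_lr: float, step: int, max_steps: int) -> float:
    """Hypothetical sketch: a cosine decay multiplied pointwise by a linear decay.

    Assumptions (not taken from OLMo): no warmup phase, and the learning
    rate reaches exactly 0 at max_steps and stays there afterwards.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / max_steps, 0.0), 1.0)
    # Standard half-cosine decay factor: 1 at the start, 0 at the end.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Linear envelope: also 1 at the start, 0 at the end.
    linear = 1.0 - progress
    # Pointwise product of the two factors, scaled by the peak learning rate.
    return initial_lr * cosine * linear


if __name__ == "__main__":
    for s in (0, 250, 500, 750, 1000):
        print(s, cos_linear_envelope_lr(3e-4, s, 1000))
```

Because both factors fall from 1 to 0, their product decays faster than either alone; at the midpoint the multiplier is 0.5 × 0.5 = 0.25 of the peak rate.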

@@ -80,6 +81,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added ability to load tokenizers from `olmo_data` package data.
- Added a script that can run a series of models with predictable scaling properties.


### Changed

- Added original legacy unsharding implementation back, as the default. The new