Rewrite bash scripts into Python interfaces #2
base: main
Conversation
parser = argparse.ArgumentParser()
parser.add_argument("--task", type=str, required=True, help="Task name (e.g. tydiqa, mmlu), will be used to store the gradients")
parser.add_argument("--data_dir", type=str, required=True, help="Path to data directory, can also be a full path or a HF repo name")
parser.add_argument("--val_task_load_method", default=None, type=str, required=False, help="The method to load the validation data, can be 'hf', 'local_hf', 'local_json'")
Shouldn't this one be required? Looks like with None the script will fail at the dataset loading stage
You're right, with None it would fail (fixed). I originally made it optional because the hardcoded initial datasets like tydiqa don't need it, but realistically we will only run our own datasets, so it is required now.
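For reference, a minimal sketch of how the argument could look after the fix (restricting the values via choices is an assumption based on the help text, not necessarily what the updated code does):

parser.add_argument(
    "--val_task_load_method",
    type=str,
    required=True,
    choices=["hf", "local_hf", "local_json"],  # assumed from the help text
    help="The method to load the validation data, can be 'hf', 'local_hf', 'local_json'")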
"--model_path", model_checkpoint_path, | ||
"--output_path", output_path, | ||
"--gradient_projection_dimension", str(args.dims), | ||
"--gradient_type", "adam" |
Sanity check question: Adam and AdamW provide the same type of gradient data, am I right?
In this repo specifically, AdamW is used across all computations (even though the variable says just "adam"): it is used both for warming up the model and for computing the gradients of the train data. For the gradients of the eval data, gradient_type is set to "sgd".
Here is the difference in computation:
- SGD
def obtain_gradients(model, batch):
    """ obtain gradients. """
    loss = model(**batch).loss
    loss.backward()
    vectorized_grads = torch.cat(
        [p.grad.view(-1) for p in model.parameters() if p.grad is not None])
    return vectorized_grads
- Adam(W)
def prepare_optimizer_state(model, optimizer_state, device):
    names = [n for n, p in model.named_parameters() if p.requires_grad]
    avg = torch.cat([optimizer_state[n]["exp_avg"].view(-1) for n in names])
    avg_sq = torch.cat([optimizer_state[n]["exp_avg_sq"].view(-1)
                        for n in names])
    avg = avg.to(device)
    avg_sq = avg_sq.to(device)
    return avg, avg_sq

def obtain_gradients_with_adam(model, batch, avg, avg_sq):
    """ obtain gradients with adam optimizer states. """
    beta1 = 0.9
    beta2 = 0.999
    eps = 1e-08

    loss = model(**batch).loss
    loss.backward()

    vectorized_grads = torch.cat(
        [p.grad.view(-1) for n, p in model.named_parameters() if p.grad is not None])

    updated_avg = beta1 * avg + (1 - beta1) * vectorized_grads
    updated_avg_sq = beta2 * avg_sq + (1 - beta2) * vectorized_grads ** 2
    vectorized_grads = updated_avg / torch.sqrt(updated_avg_sq + eps)

    return vectorized_grads
Looking at this, I think in principle this code would work with plain Adam as well, but the gradient values produced by Adam and AdamW would be different.
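For context, here is a rough sketch of how the two paths might be wired together. Only the two obtain_gradients* functions above come from the repo; the driver loop, its name, and its arguments are illustrative assumptions, not the repo's actual code:

import torch

def collect_gradients(model, dataloader, gradient_type, optimizer_state=None, device="cuda"):
    # Hypothetical driver loop around the repo's obtain_gradients* functions.
    all_grads = []
    if gradient_type == "adam":
        # The Adam/AdamW path needs the optimizer's exp_avg / exp_avg_sq states
        avg, avg_sq = prepare_optimizer_state(model, optimizer_state, device)
    for batch in dataloader:
        model.zero_grad()
        if gradient_type == "adam":
            grads = obtain_gradients_with_adam(model, batch, avg, avg_sq)
        else:  # "sgd": raw loss gradients
            grads = obtain_gradients(model, batch)
        all_grads.append(grads.detach().cpu())
    return torch.stack(all_grads)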
]

# Add FSDP config for large models
if model_name_or_path == "meta-llama/Llama-2-13b-hf":
This is the original code from their .sh script, right?
Yes, it seems like models larger than llama-7B would not fit on their GPUs (relatable), so they needed to enable FSDP for the larger experiments.
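For illustration, a sketch of how the FSDP options might be appended to the launch command in the Python interface. The flag names follow the HF Trainer convention (--fsdp / --fsdp_config), and both the cmd variable and the config name are assumptions, not necessarily what this script uses:

# Illustrative only: extend the subprocess argument list with FSDP options
# for the 13B model; flag names and config value are assumptions.
if model_name_or_path == "meta-llama/Llama-2-13b-hf":
    cmd += [
        "--fsdp", "full_shard auto_wrap",
        "--fsdp_config", "llama2_13b_finetune",  # hypothetical FSDP config name
    ]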
Changelog:
Questions for discussion: