Aborting training should delete output folder #12

Open
l0rinc opened this issue Apr 19, 2023 · 3 comments

Comments

@l0rinc
Contributor

l0rinc commented Apr 19, 2023

Screenshot 2023-04-19 at PM 4 55 11

@zetavg
Owner

zetavg commented Apr 19, 2023

Actually, this is somewhat intentional: it forces each training run, once started, to have a unique name, which solves a few things:

  • It ensures a unique run name on Wandb (the model name is used as the run name).
  • It avoids issues with model caching, since the model name is used as the cache key (?).

Not deleting the output folder of an aborted run also has some benefits:

  • Fine-tuning parameters stored in the output folder are preserved and can be loaded back to conveniently start the next run.
  • Checkpoints are preserved, making it possible to resume the training.

@l0rinc
Contributor Author

l0rinc commented Apr 19, 2023

Personally, I just abort when I see something behaving differently, then manually run rm -rfd on the folder from Colab's terminal.
Maybe a "would you like to overwrite" or "rename old one" prompt could make sense. If you don't think that's a good idea, please close the issue :)
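
For illustration only, here is a minimal sketch of how an "overwrite or rename old one" choice might be handled before a run starts. The `prepare_output_dir` helper, its `on_conflict` values, and the timestamped backup name are all hypothetical and are not taken from this project's code; the sketch also assumes the existing path is a directory.

```python
import shutil
import time
from pathlib import Path


def prepare_output_dir(output_dir, on_conflict="ask", ask=input):
    """Return a fresh, empty output directory, handling an existing folder.

    Hypothetical helper (not part of the project). `on_conflict` is one of
    "ask", "overwrite", "rename", or "abort"; `ask` is the prompt function,
    injectable so the behavior can be tested without a terminal.
    """
    path = Path(output_dir)
    if path.exists():
        if on_conflict == "ask":
            answer = ask(
                f"{path} already exists. [o]verwrite / [r]ename old one / [a]bort? "
            )
            # Anything other than "o" or "r" is treated as abort.
            choices = {"o": "overwrite", "r": "rename"}
            on_conflict = choices.get(answer.strip().lower()[:1], "abort")

        if on_conflict == "overwrite":
            # Discards old parameters and checkpoints for this run name.
            shutil.rmtree(path)
        elif on_conflict == "rename":
            # Keeps the old folder (and its checkpoints) under a timestamped name.
            stamp = time.strftime("%Y%m%d-%H%M%S")
            path.rename(path.with_name(f"{path.name}-{stamp}"))
        else:
            raise FileExistsError(f"Output folder already exists: {path}")

    path.mkdir(parents=True)
    return path
```

With "rename", the aborted run's parameters and checkpoints stay available under the backup name, so the benefits listed above are kept while the original name is freed for a new run.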

@zetavg
Owner

zetavg commented Apr 25, 2023

This makes sense, I'll add this along with the CLI interface!
