Update output command-line description #276

Closed
wants to merge 3 commits into from

Conversation

bittremieux
Collaborator

Fixes #241.

@bittremieux bittremieux requested a review from wsnoble December 25, 2023 11:48
@bittremieux bittremieux changed the base branch from main to dev December 25, 2023 11:49

codecov bot commented Dec 25, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.54%. Comparing base (3c2d3f5) to head (ac60313).
Report is 109 commits behind head on dev.

Additional details and impacted files
@@           Coverage Diff           @@
##              dev     #276   +/-   ##
=======================================
  Coverage   89.54%   89.54%           
=======================================
  Files          12       12           
  Lines         918      918           
=======================================
  Hits          822      822           
  Misses         96       96           


@melihyilmaz melihyilmaz marked this pull request as draft January 9, 2024 17:23
@wsnoble
Contributor

wsnoble commented Feb 22, 2024

How does Casanovo handle model file naming when fine-tuning a model? In this case, it seems like --model will be used to specify the name of the input file AND the output file. Problem! Do we need separate --model-in and --model-out options? Alternatively, is there some scheme whereby we name the trained model with the epoch number in the filename? If so, this should be documented here.

@melihyilmaz
Collaborator

How does Casanovo handle model file naming when fine-tuning a model?

By default, checkpoint files are named in the epoch=40-step=2550000.ckpt format regardless of the input ckpt filename. This is the case for both fine-tuning and training from scratch.

@wsnoble
Contributor

wsnoble commented Feb 22, 2024

OK, so this is obviously bad, because if I train two models from the same working directory, the files will overwrite one another. We should change this to be <model-name>.epoch=40.step=2500000.ckpt, where <model-name> comes from the --model parameter. This behavior should also be documented.
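
For illustration, the proposed naming scheme could be sketched as below. This is only a sketch: `checkpoint_name` is a hypothetical helper, not part of Casanovo, and it assumes the value passed to --model is a bare name without a .ckpt extension.

```python
def checkpoint_name(model_name: str, epoch: int, step: int) -> str:
    """Build '<model-name>.epoch=E.step=S.ckpt' (hypothetical helper)."""
    return f"{model_name}.epoch={epoch}.step={step}.ckpt"


# Two training runs started from the same working directory no longer
# collide, because the model name is part of each file name.
print(checkpoint_name("run_a", 40, 2500000))
print(checkpoint_name("run_b", 40, 2500000))
```

With the model name as a prefix, checkpoints from different runs stay distinct even when the epoch and step counts match.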

@wsnoble
Contributor

wsnoble commented Feb 28, 2024

I just ran into another problem which needs to be fixed, probably on this PR. It seems like Casanovo is trying to do something smart if I specify an --output option with a period in it. Anyway, the result is that if I do something like --output foo.bar, then the output is foo.log. Worse still, it seems like we have some logic in there that prevents it from overwriting existing files, but it does this silently. So I ran a big sequencing run using a name like "foo.filtered" but because "foo.mztab" and "foo.log" already existed, Casanovo simply ran to completion but did not produce any output files.
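
One way this behavior could arise is suffix replacement on the --output value. That is an assumption for illustration, not something confirmed from Casanovo's source, and `derived_name` is a hypothetical helper. Replacing the suffix discards everything after the last period, so distinct prefixes map onto the same file names:

```python
from pathlib import Path


def derived_name(output: str, suffix: str) -> str:
    """Replace whatever follows the last period with `suffix` (hypothetical)."""
    return str(Path(output).with_suffix(suffix))


# Suffix replacement drops ".bar" and ".filtered".
print(derived_name("foo.bar", ".log"))         # foo.log
print(derived_name("foo.filtered", ".mztab"))  # foo.mztab

# Treating the argument as an opaque prefix keeps it intact.
print("foo.filtered" + ".mztab")               # foo.filtered.mztab
```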

As we discussed, we should add an --overwrite flag that allows users to overwrite existing files. Otherwise, Casanovo should halt with an error if it tries to overwrite an existing file. Ideally, when sequencing it should check for file existence before starting the run, rather than at the end.

The argument to --output should be used as the prefix for all output files, with no attempt to parse it. The argument may contain directory names, but we will make no attempt to verify that they exist.
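
The requested behavior could look roughly like the sketch below. All names here (`SUFFIXES`, `output_paths`, `check_outputs`) are hypothetical, and the suffix list is assumed from the .mztab and .log files mentioned above. The prefix is used verbatim, directories are not verified, and the existence check is meant to run before sequencing starts.

```python
import os

# Assumed output suffixes, taken from the files mentioned in this thread.
SUFFIXES = (".mztab", ".log")


def output_paths(prefix, suffixes=SUFFIXES):
    """Append each suffix to the prefix verbatim, with no parsing of periods."""
    return [prefix + suffix for suffix in suffixes]


def check_outputs(prefix, overwrite=False, suffixes=SUFFIXES):
    """Raise before the run starts if an output exists and overwrite is off."""
    paths = output_paths(prefix, suffixes)
    existing = [path for path in paths if os.path.exists(path)]
    if existing and not overwrite:
        raise FileExistsError(
            "Refusing to overwrite: " + ", ".join(existing)
            + " (pass --overwrite to replace them)"
        )
    return paths
```

Calling `check_outputs("foo.filtered")` at startup would either return the paths to write or fail immediately, instead of finishing a long run without producing output.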

@bittremieux
Collaborator Author

Superseded by #372.

@bittremieux bittremieux deleted the output branch September 17, 2024 19:16

Successfully merging this pull request may close these issues.

fix documentation of --output
3 participants