
Eliminate evaluate Command #359

Merged: 45 commits merged into dev on Aug 21, 2024

Conversation

Lilferrit (Contributor)

Eliminated the evaluate command in favor of an --evaluate command line option for the sequence command. Evaluation metrics will still be logged to the console, as before, if the --evaluate option is set. The model (Spec2Pep) will also log predictions to its out_writer in validation mode, similar to how it does in prediction mode.

@Lilferrit requested a review from bittremieux on July 31, 2024
@bittremieux (Collaborator) left a comment
This works, but I find the changes in model.py a bit messy and unsatisfying.

If the goal of the evaluation is to get the peptide and amino acid metrics, can this not be simplified by:

  • First, just do standard predictions.
  • After the whole inference run has finished, you have all of the predicted peptides for each spectrum. Then (at a higher level than inside of the model) you can read the peptide sequences from the annotated MGF separately and compare the two.
  • This removes all of the validation-related complexity from the model and should simplify the flow of the data considerably.
  • The evaluation part is also more maintainable, and it would, for example, be much easier to add another data source for evaluation (e.g. an mzTab or CSV file with "ground truth" rather than an annotated MGF).

Note that it's slightly different from the current validation approach, because that also gives the loss, which you wouldn't have in this approach. However, it seems to me that the loss is not that informative anyway, and not something a user would expect to get when specifying evaluate during prediction.

The best approach to tackle this should probably be discussed.

Review threads (resolved): casanovo/casanovo.py, casanovo/denovo/dataloaders.py, casanovo/denovo/model.py, casanovo/denovo/model_runner.py
@bittremieux linked an issue on Aug 2, 2024 that may be closed by this pull request
@Lilferrit (Contributor, Author)

This works, but I find the changes in model.py a bit messy and unsatisfying.

If the goal of the evaluation is to get the peptide and amino acid metrics, can this not be simplified by:

  • First, just do standard predictions.
  • After the whole inference run has finished, you have all of the predicted peptides for each spectrum. Then (at a higher level than inside of the model) you can read the peptide sequences from the annotated MGF separately and compare the two.
  • This removes all of the validation-related complexity from the model and should simplify the flow of the data considerably.
  • The evaluation part is also more maintainable, and it would, for example, be much easier to add another data source for evaluation (e.g. an mzTab or CSV file with "ground truth" rather than an annotated MGF).

Note that it's slightly different from the current validation approach, because that also gives the loss, which you wouldn't have in this approach. However, it seems to me that the loss is not that informative anyway, and not something a user would expect to get when specifying evaluate during prediction.

The best approach to tackle this should probably be discussed.

I agree that this approach makes more sense. Tentatively, what I'm thinking is that ModelRunner should handle the evaluation metrics calculation from the model output and the spectrum annotations, using the functionality in evaluate.py. This would probably leave the model itself unchanged.
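
Roughly the flow I have in mind, as a sketch only (the helper names from evaluate.py, such as aa_match_batch and aa_match_metrics, the residue_masses argument, and the shape of the prediction results are assumptions here; pyteomics is used for the MGF parsing purely for illustration):

```python
# Illustrative sketch only: post-inference evaluation in ModelRunner, assuming
# evaluate.py exposes aa_match_batch / aa_match_metrics (names may differ) and
# that each prediction carries a .sequence attribute.
from pyteomics import mgf

from casanovo.denovo import evaluate


def evaluate_predictions(predictions, annotated_mgf_path, residue_masses):
    """Compare predicted peptides against the annotated MGF "ground truth"."""
    # Ground-truth peptides from the SEQ entries of the annotated MGF.
    # (Assumes predictions are in the same order as the spectra in the MGF.)
    true_peptides = [
        spectrum["params"]["seq"] for spectrum in mgf.read(annotated_mgf_path)
    ]
    # Predicted peptides, however the prediction results end up being represented.
    predicted_peptides = [pred.sequence for pred in predictions]

    # Reuse the existing metric calculations from evaluate.py.
    aa_matches, n_true_aa, n_pred_aa = evaluate.aa_match_batch(
        true_peptides, predicted_peptides, residue_masses
    )
    aa_precision, aa_recall, pep_precision = evaluate.aa_match_metrics(
        aa_matches, n_true_aa, n_pred_aa
    )
    return aa_precision, aa_recall, pep_precision
```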

bittremieux and others added 4 commits August 6, 2024 08:47
* bug report template

* punctuation, hardware description item

* Restrict NumPy to pre-2.0 (#344)

* Restrict NumPy to pre-2.0

* Update changelog

* Update paper reference (#361)

---------

Co-authored-by: Lilferrit <[email protected]>
@Lilferrit (Contributor, Author) commented Aug 6, 2024

I reimplemented evaluate mode by having ModelRunner handle the evaluation instead of using the validation functionality of the Spec2Pep model itself. My only concern at this point is how to go about testing the evaluation metric calculations. The simplest solution I can think of is to have log_metrics also return the aa_precision and pep_precision in addition to the logging operation. From there I could introduce some unit tests.
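
To illustrate that last point, I mean something as small as this (hypothetical signature; the real log_metrics takes different arguments):

```python
import logging

logger = logging.getLogger("casanovo")


def log_metrics(aa_precision: float, pep_precision: float):
    """Log the evaluation metrics and also return them, so unit tests can
    assert on the values instead of parsing log output."""
    logger.info("Amino acid precision: %.4f", aa_precision)
    logger.info("Peptide precision:    %.4f", pep_precision)
    return aa_precision, pep_precision
```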

codecov bot commented Aug 6, 2024

Codecov Report

Attention: Patch coverage is 97.22222% with 1 line in your changes missing coverage. Please review.

Project coverage is 94.26%. Comparing base (a46f995) to head (2d882b7).
Report is 2 commits behind head on dev.

Files                              Patch %    Missing lines
casanovo/denovo/model_runner.py    96.15%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #359      +/-   ##
==========================================
+ Coverage   94.03%   94.26%   +0.23%     
==========================================
  Files          12       12              
  Lines        1022     1029       +7     
==========================================
+ Hits          961      970       +9     
+ Misses         61       59       -2     


@bittremieux (Collaborator) left a comment

My only concern at this point is how to go about testing the evaluation metric calculations. The simplest solution I can think of is to have log_metrics also return the aa_precision and pep_precision in addition to the logging operation. From there I could introduce some unit tests.

I don't think that we need to check the actual metric values. We have dedicated unit tests that already do that.

What should be added are some tests that verify correct behavior with and without evaluation specified, based on different types of input files (annotated vs. plain MGF, mzML).

Review threads (resolved): casanovo/casanovo.py, casanovo/denovo/model_runner.py
@Lilferrit (Contributor, Author)

What should be added are some tests that verify correct behavior with and without evaluation specified, based on different types of input files (annotated vs. plain MGF, mzML).

Sounds good, I'll look into getting some tests implemented for these cases.
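
Tentatively something along these lines (a rough pytest sketch; the fixture names, e.g. mgf_small and a hypothetical mgf_unannotated, and the exact CLI options are placeholders modeled on the existing tests in tests/test_integration.py):

```python
# Rough pytest sketch; fixtures and CLI options are placeholders.
from click.testing import CliRunner

from casanovo import casanovo


def test_sequence_with_and_without_evaluation(tmp_path, mgf_small, mgf_unannotated):
    runner = CliRunner()
    output = tmp_path / "results.mztab"

    # Plain prediction on an unannotated file should work without --evaluate.
    result = runner.invoke(
        casanovo.main, ["sequence", str(mgf_unannotated), "-o", str(output)]
    )
    assert result.exit_code == 0

    # Prediction with --evaluate on an annotated MGF should also work.
    result = runner.invoke(
        casanovo.main, ["sequence", "--evaluate", str(mgf_small), "-o", str(output)]
    )
    assert result.exit_code == 0

    # Requesting evaluation with an unannotated file should fail with a clear error.
    result = runner.invoke(
        casanovo.main, ["sequence", "--evaluate", str(mgf_unannotated), "-o", str(output)]
    )
    assert result.exit_code != 0
```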

@Lilferrit (Contributor, Author)

I did some experimenting while trying to set up the test cases, and it looks like, in the current implementation of ModelRunner._get_index, unannotated peak files are (for the most part) silently ignored when the annotated argument is set to True. From a quick glance, this ultimately stems from the behavior of _get_peak_filenames. However, for unannotated MGF files there is an uncaught exception from the Depth Charge library that propagates up to ModelRunner when attempting to construct an AnnotatedSpectrumIndex. I've added the error itself at the end of this comment.

IMO, silently ignoring unannotated files is not desirable when running model evaluation post-sequencing (it also means that the unannotated files simply don't get sequenced). The best workaround that comes to mind is to check that all of the peak files are annotated before sequencing begins and throw an exception if any of them aren't. However, I'm not sure there's a quick and easy way to do this.

I put the in-progress test cases on the elim-eval-tests branch if anyone wants to take a look.

casanovo\denovo\model_runner.py:183: in predict
    test_index = self._get_index(peak_path, evaluate, "")
casanovo\denovo\model_runner.py:420: in _get_index
    return Index(index_fname, filenames, valid_charge=valid_charge)
D:\anaconda3\envs\casanovo_env\lib\site-packages\depthcharge\data\hdf5.py:446: in __init__
    super().__init__(
D:\anaconda3\envs\casanovo_env\lib\site-packages\depthcharge\data\hdf5.py:104: in __init__
    self.add_file(ms_file)
D:\anaconda3\envs\casanovo_env\lib\site-packages\depthcharge\data\hdf5.py:231: in add_file
    group.create_dataset(
D:\anaconda3\envs\casanovo_env\lib\site-packages\h5py\_hl\group.py:183: in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
D:\anaconda3\envs\casanovo_env\lib\site-packages\h5py\_hl\dataset.py:166: in make_new_dset
    dset_id.write(h5s.ALL, h5s.ALL, data)
h5py\_objects.pyx:54: in h5py._objects.with_phil.wrapper
    ???
h5py\_objects.pyx:55: in h5py._objects.with_phil.wrapper
    ???
h5py\h5d.pyx:282: in h5py.h5d.DatasetID.write
    ???
h5py\_proxy.pyx:147: in h5py._proxy.dset_rw
    ???
h5py\_conv.pyx:442: in h5py._conv.str2vlen
    ???
h5py\_conv.pyx:96: in h5py._conv.generic_converter
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: Can't implicitly convert non-string objects to strings

h5py\_conv.pyx:247: TypeError

@bittremieux (Collaborator)

IMO, silently ignoring unannotated files is not desirable when running model evaluation post-sequencing (it also means that the unannotated files simply don't get sequenced).

I agree.

The best workaround that comes to mind is to check that all of the peak files are annotated before sequencing begins and throw an exception if any of them aren't. However, I'm not sure there's a quick and easy way to do this.

That's tricky, because there's indeed no elegant way to do this, so I'm not really in favor of trying to hack this in.

Instead, giving better error messages is a good starting point. Then at least users will know what the problem is and how to fix it if they want to run evaluation.

Lilferrit and others added 22 commits August 9, 2024 13:30
* save best model

* save best model

* updated unit tests

* remove save top k config item

* added save_top_k to deprecated config options

* changelog entry

* test case, formatting

* requested changes
* bug report template

* punctuation, hardware description item

* Restrict NumPy to pre-2.0 (#344)

* Restrict NumPy to pre-2.0

* Update changelog

* Update paper reference (#361)

---------

Co-authored-by: Lilferrit <[email protected]>
@Lilferrit (Contributor, Author)

I added some light error handling: if the AnnotatedSpectrumIndex constructor throws a TypeError, an additional error message is raised pointing the user toward the likely cause, namely an unannotated MGF file in the evaluation peak file path list. I think this strikes a nice balance between diagnosing the reason for the error and not relying too heavily on the underlying implementation of Depth Charge.
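
For context, the handling is roughly of this shape (an illustrative sketch rather than the exact code; the constructor call and variable names mirror the traceback above):

```python
# Illustrative sketch of the added error handling around index construction;
# index_fname, filenames, and valid_charge follow the traceback above.
from depthcharge.data import AnnotatedSpectrumIndex


def _build_annotated_index(index_fname, filenames, valid_charge):
    """Surface a clearer error when an unannotated MGF breaks index creation."""
    try:
        return AnnotatedSpectrumIndex(
            index_fname, filenames, valid_charge=valid_charge
        )
    except TypeError as exc:
        # Depth Charge ends up raising a TypeError deep inside h5py when a SEQ
        # annotation is missing, so point the user at the likely cause.
        raise TypeError(
            "Error creating the annotated spectrum index. This is likely caused "
            "by an unannotated MGF file in the evaluation peak file path list."
        ) from exc
```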

@Lilferrit requested a review from bittremieux on August 15, 2024
@bittremieux (Collaborator) left a comment

A few more final tweaks.

Review threads (resolved): casanovo/denovo/model_runner.py, tests/test_integration.py, CHANGELOG.md
@Lilferrit requested a review from bittremieux on August 21, 2024
@Lilferrit merged commit 67939b8 into dev on Aug 21, 2024
7 checks passed
@Lilferrit deleted the elim-eval branch on August 21, 2024
Linked issue that may be closed by this pull request: Eliminate the eval command