Don't crash when multiple beams have identical peptide scores #306

Merged
merged 4 commits into from
Feb 27, 2024

bittremieux
Collaborator

Fixes #271.

Beam caching fails when different beams have different predicted amino acid sequences (i.e. tokens at this phase in the code) but an identical peptide score. This likely happens when multiple predictions can't be distinguished, so it only occurs when using multiple beams on ambiguous spectra.

The failure occurred because the information cached for each beam is a tuple of (peptide score, array of amino acid scores, array of amino acid tokens). Tuples are compared element by element: the first elements are compared, ties fall through to the second elements, and so on. Here the second element is an array, and comparing two arrays doesn't yield a single, unambiguous truth value, which leads to the observed error. However, we don't actually want to compare those arrays at all; beams should be compared based on the peptide score only.
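
For reference, the failure mode can be reproduced outside Casanovo with a minimal sketch. It assumes the cache is a heap maintained with `heapq` (as the `heappush` branch name suggests) and that the arrays behave like NumPy arrays; the beam values below are made up for illustration.

```python
import heapq

import numpy as np

# Hypothetical cached beams: (peptide score, amino acid scores, tokens).
beam_a = (0.9, np.array([0.8, 0.95]), np.array([3, 7]))
beam_b = (0.9, np.array([0.85, 0.9]), np.array([4, 7]))

heap = []
heapq.heappush(heap, beam_a)  # A single item needs no comparison.

try:
    # The peptide scores tie, so the tuple comparison falls through to
    # the arrays, whose element-wise result has no single truth value.
    heapq.heappush(heap, beam_b)
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__, error)
```

With distinct peptide scores the first tuple element already decides the ordering, so the arrays are never touched and the push succeeds.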

This is now addressed by adding a completely random float as the second element in those tuples, before the array of amino acid scores. It's vanishingly unlikely that those random numbers would ever be equal, so in effect this arbitrarily breaks ties in case of equal peptide scores.
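
A minimal sketch of the tie-breaking idea, not the actual Casanovo code: `cache_beam` and `Uncomparable` are hypothetical names, and `Uncomparable` is a stand-in for the array payload that raises if Python ever tries to compare it, so the example would fail loudly if the random float did not settle the tie first.

```python
import heapq
import random


class Uncomparable:
    """Stand-in for an array: raises if Python ever tries to compare it."""

    def __init__(self, values):
        self.values = values

    def __eq__(self, other):
        raise ValueError("payload was compared")

    def __lt__(self, other):
        raise ValueError("payload was compared")


def cache_beam(heap, peptide_score, aa_scores, tokens, rng=random):
    """Push a beam; a random float breaks ties between equal peptide scores."""
    heapq.heappush(heap, (peptide_score, rng.random(), aa_scores, tokens))


heap = []
for tokens in ([3, 7], [4, 7], [5, 7]):
    # All three hypothetical beams share the same peptide score.
    cache_beam(heap, 0.9, Uncomparable([0.8, 0.9]), Uncomparable(tokens))

# Selecting the best beam also only looks at the first two elements.
best = heapq.nlargest(1, heap)[0]
print(len(heap), best[0])
```

Because the random float sits in second position, the ordering among beams with equal peptide scores is arbitrary, while beams with different peptide scores are still ordered by score alone.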

@bittremieux bittremieux linked an issue Feb 26, 2024 that may be closed by this pull request

codecov bot commented Feb 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.77%. Comparing base (cd29e4b) to head (666132d).

Additional details and impacted files
@@           Coverage Diff           @@
##              dev     #306   +/-   ##
=======================================
  Coverage   89.77%   89.77%           
=======================================
  Files          12       12           
  Lines         929      929           
=======================================
  Hits          834      834           
  Misses         95       95           


Collaborator

@melihyilmaz melihyilmaz left a comment

Other than the removal of an existing unit test, looks good to me.

Review thread on tests/unit_tests/test_unit.py (resolved)
@melihyilmaz melihyilmaz merged commit 6eabd6e into dev Feb 27, 2024
6 checks passed
@melihyilmaz melihyilmaz deleted the heappush branch February 27, 2024 16:50
bittremieux added a commit that referenced this pull request May 14, 2024
* Remove `train_from_scratch` config option (#275)

Instead of having to specify `train_from_scratch` in the config file, training will proceed from an existing model weights file if this is given as an argument to `casanovo train`.

Fixes #263.

* Stabilize torch.topk() behavior (#290)

* Add epsilon to index zero

* Fix typo

* Use base PyTorch for repeating along the vocabulary size

* Combine masking steps

* Lint with updated black version

* Lint test files

* Add topk unit test

* Fix lint

* Add fixme comment for future

* Update changelog

* Generate new screengrabs with rich-codex

---------

Co-authored-by: Wout Bittremieux <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Rename max_iters to cosine_schedule_period_iters (#300)

* Rename max_iters to cosine_schedule_period_iters

* Add deprecated config option unit test

* Fix missed rename

* Proper linting

* Remove unnecessary logging

* Test that checkpoints with deprecated config options can be loaded

* Minor change

* Add test for fine-tuning with deprecated config options

* Remove deprecated hyperparameters during model loading

* Include deprecated hyperparameter warning

* Test whether the warning is issued

* Verify that the deprecated option is removed

* Fix comments

* Avoid defining deprecated options twice

* Remap previous renamed config option `every_n_train_steps`

* Update changelog

---------

Co-authored-by: melihyilmaz <[email protected]>

* Add FAQ entry about antibody sequencing

* Don't crash when multiple beams have identical peptide scores (#306)

* Test different beams with identical scores

* Randomly break ties for beams with identical peptide score

* Update changelog

* Don't remove unit test

* Allow csv to handle all newlines (#316)

* Add 9-species model weights link to FAQ (#303)

* Add model weights link

* Generate new screengrabs with rich-codex

* Clarify that these weights should only be used for benchmarking

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wout Bittremieux <[email protected]>

* Add FAQ entry about antibody sequencing (#304)

* Add FAQ entry about antibody sequencing

* Generate new screengrabs with rich-codex

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Melih Yilmaz <[email protected]>

* Allow csv to handle all newlines

The `csv` module tries to handle newlines itself. On Windows, this leads to line endings of `\r\r\n` instead of `\r\n`.

Setting `newline=''` produces the intended output on both platforms.

* Update CHANGELOG.md

* Fix linting issue

* Delete docs/images/help.svg

---------

Co-authored-by: Melih Yilmaz <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wout Bittremieux <[email protected]>
Co-authored-by: William Stafford Noble <[email protected]>
Co-authored-by: Wout Bittremieux <[email protected]>

* Don't test on macOS versions with MPS (#327)

* Prepare for release v4.2.0

* Update CHANGELOG.md (#332)

---------

Co-authored-by: Melih Yilmaz <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: melihyilmaz <[email protected]>
Co-authored-by: wsnoble <[email protected]>
Co-authored-by: Joshua Klein <[email protected]>
Development

Successfully merging this pull request may close these issues.

Casanovo's 'sequence' mode crashes when n_beams=10
2 participants