Don't crash when multiple beams have identical peptide scores #306

bittremieux · 2024-02-26T08:24:43Z

Fixes #271.

The problem was that beam caching fails when different beams have different predicted amino acid sequences (i.e. tokens at this stage of the code) but an identical peptide score. This likely happens when multiple predictions can't be distinguished from each other, and therefore only occurs when using multiple beams on ambiguous spectra.

The exact failure occurred because the information cached for each beam is a tuple of (peptide score, array of amino acid scores, array of amino acid tokens). Python compares tuples element by element: the first elements decide the order, and only if they are equal are the second elements compared, and so on. Here the second element is an array, and comparing two arrays gives an element-wise result rather than a single boolean, so it has no unambiguous "truthy" value, which leads to the observed error. However, we don't actually want to compare those arrays at all; beams should be compared based on the peptide score only.
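A minimal reproduction of the failure, assuming the cache is a `heapq` heap (the PR branch is named `heappush`) and that the amino acid scores and tokens are NumPy arrays. The beam values are made up for illustration, and `push_raises` is a hypothetical helper, not Casanovo code.

```python
import heapq

import numpy as np

# Two hypothetical finished beams with the SAME peptide score but different
# amino acid scores and tokens: (peptide score, aa scores, aa tokens).
beam_a = (0.9, np.array([0.80, 0.95, 0.90]), np.array([5, 12, 7]))
beam_b = (0.9, np.array([0.90, 0.85, 0.95]), np.array([5, 7, 12]))


def push_raises(heap, item):
    """Push `item` onto `heap`; return True if the tuple comparison raised."""
    try:
        heapq.heappush(heap, item)
    except ValueError:
        # The tie on the peptide score made the comparison fall through to
        # the arrays: `==` on NumPy arrays is element-wise, and the truth
        # value of the resulting array is ambiguous.
        return True
    return False


cache = []
print(push_raises(cache, beam_a))  # False: empty heap, nothing to compare
print(push_raises(cache, beam_b))  # True: the tie on 0.9 reaches the arrays
```

The first push succeeds only because an empty heap needs no comparisons; as soon as a second beam with the same score arrives, the heap has to order the two tuples and the error surfaces.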

This is now addressed by inserting a random float as the second element of each tuple, before the array of amino acid scores. It's vanishingly unlikely that two of these random numbers would ever be equal, so in effect this arbitrarily breaks ties between equal peptide scores.
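A minimal sketch of the tie-breaking idea under the same assumptions. `cache_beam` and `max_beams` are hypothetical names, Python's `random.random()` stands in for whatever random source the actual code uses, and keeping the best beams with `heapq.heappushpop` on a min-heap is an illustrative choice rather than the exact Casanovo implementation.

```python
import heapq
import random

import numpy as np


def cache_beam(heap, peptide_score, aa_scores, tokens, max_beams):
    """Keep at most `max_beams` highest-scoring beams in a min-heap.

    A random float sits right after the peptide score, so two beams with
    equal scores are ordered by that float instead of by their arrays.
    """
    item = (peptide_score, random.random(), aa_scores, tokens)
    if len(heap) < max_beams:
        heapq.heappush(heap, item)
    else:
        # Push the new beam and pop the lowest-scoring one (possibly itself).
        heapq.heappushpop(heap, item)


cache = []
for tokens in ([5, 12, 7], [5, 7, 12], [12, 5, 7]):
    # Three distinct beams that all tie on peptide score 0.9.
    cache_beam(cache, 0.9, np.array([0.9, 0.9, 0.9]), np.array(tokens), 2)
cache_beam(cache, 0.95, np.array([0.95, 0.95, 0.95]), np.array([7, 7, 7]), 2)

print(len(cache))  # 2
print(max(item[0] for item in cache))  # 0.95
```

Because the random float is compared before the arrays, ties on the peptide score never reach the arrays (barring two identical random draws), and the order among tied beams is simply arbitrary, which matches the intent of comparing beams by peptide score only.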

codecov · 2024-02-26T08:28:09Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.77%. Comparing base (cd29e4b) to head (666132d).


@@           Coverage Diff           @@
##              dev     #306   +/-   ##
=======================================
  Coverage   89.77%   89.77%           
=======================================
  Files          12       12           
  Lines         929      929           
=======================================
  Hits          834      834           
  Misses         95       95


melihyilmaz

Other than the removal of an existing unit test, looks good to me.

tests/unit_tests/test_unit.py

* Remove `train_from_scratch` config option (#275)
  Instead of having to specify `train_from_scratch` in the config file, training will proceed from an existing model weights file if this is given as an argument to `casanovo train`. Fixes #263.
* Stabilize torch.topk() behavior (#290)
* Add epsilon to index zero
* Fix typo
* Use base PyTorch for repeating along the vocabulary size
* Combine masking steps
* Lint with updated black version
* Lint test files
* Add topk unit test
* Fix lint
* Add fixme comment for future
* Update changelog
* Generate new screengrabs with rich-codex
---------
Co-authored-by: Wout Bittremieux
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Rename max_iters to cosine_schedule_period_iters (#300)
* Rename max_iters to cosine_schedule_period_iters
* Add deprecated config option unit test
* Fix missed rename
* Proper linting
* Remove unnecessary logging
* Test that checkpoints with deprecated config options can be loaded
* Minor change
* Add test for fine-tuning with deprecated config options
* Remove deprecated hyperparameters during model loading
* Include deprecated hyperparameter warning
* Test whether the warning is issued
* Verify that the deprecated option is removed
* Fix comments
* Avoid defining deprecated options twice
* Remap previous renamed config option `every_n_train_steps`
* Update changelog
---------
Co-authored-by: melihyilmaz
* Add FAQ entry about antibody sequencing
* Don't crash when multiple beams have identical peptide scores (#306)
* Test different beams with identical scores
* Randomly break ties for beams with identical peptide score
* Update changelog
* Don't remove unit test
* Allow csv to handle all newlines (#316)
* Add 9-species model weights link to FAQ (#303)
* Add model weights link
* Generate new screengrabs with rich-codex
* Clarify that these weights should only be used for benchmarking
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wout Bittremieux
* Add FAQ entry about antibody sequencing (#304)
* Add FAQ entry about antibody sequencing
* Generate new screengrabs with rich-codex
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Melih Yilmaz
* Allow csv to handle all newlines
  The `csv` module tries to handle newlines itself. On Windows, this leads to line endings of `\r\r\n` instead of `\r\n`. Setting `newline=''` produces the intended output on both platforms.
* Update CHANGELOG.md
* Fix linting issue
* Delete docs/images/help.svg
---------
Co-authored-by: Melih Yilmaz
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wout Bittremieux
Co-authored-by: William Stafford Noble
* Don't test on macOS versions with MPS (#327)
* Prepare for release v4.2.0
* Update CHANGELOG.md (#332)
---------
Co-authored-by: Melih Yilmaz
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: melihyilmaz
Co-authored-by: wsnoble
Co-authored-by: Joshua Klein
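The "Allow csv to handle all newlines" commit in this log explains why `newline=''` matters when writing CSV files. A minimal sketch with the standard library `csv` module; `write_rows`, the file name, and the rows are made up for illustration. With `newline=''`, Python performs no newline translation on write, so the `\r\n` terminator that `csv.writer` emits by default reaches the file unchanged on every platform.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    # newline="" disables Python's own newline translation, so the writer's
    # default "\r\n" terminator is not turned into "\r\r\n" on Windows.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.csv")  # hypothetical output file
    write_rows(path, [["sequence", "score"], ["PEPTIDE", "0.9"]])
    with open(path, "rb") as f:
        print(f.read())  # b'sequence,score\r\nPEPTIDE,0.9\r\n'
```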

bittremieux added 2 commits February 26, 2024 09:08

Test different beams with identical scores

d04a39d

Randomly break ties for beams with identical peptide score

a2ff04a

bittremieux requested a review from melihyilmaz February 26, 2024 08:24

bittremieux linked an issue Feb 26, 2024 that may be closed by this pull request

Casanovo's 'sequence' mode crashes when n_beams=10 #271

Closed

Update changelog

6d9adf0

melihyilmaz reviewed Feb 26, 2024


tests/unit_tests/test_unit.py

Don't remove unit test

666132d

bittremieux requested a review from melihyilmaz February 27, 2024 06:59

melihyilmaz approved these changes Feb 27, 2024


melihyilmaz merged commit 6eabd6e into dev Feb 27, 2024
6 checks passed

melihyilmaz deleted the heappush branch February 27, 2024 16:50
