Add Casanovo-DB Functionality (#325)
* begin adding tests for annotate mode

* add basic test for annotate mode

* added test case for annotate mode and modified method

* very rough sketch of db upgrade (untested)

* small upgrades to documentation

* better output formatting

* all tests added

* remove minor debugging print statement

* Generate new screengrabs with rich-codex

* remove excess info logs, add monkeypatch to tests

* mp fix

* fix line lengths and modify test

* Generate new screengrabs with rich-codex

* Justin's requested fixes

* added minor changes as requested by Wout

* partial fixes requested by Wout. Lots of subclassing removed

* documentation fixes and starting to cleanup batching code

* cleaned up on_predict_batch_end, TODOs for calc_mz

* add proper calc_mz calculation with depthcharge

* rough implementation

* tested implementation of db search

* fix for issue with 0 candidates

* minor fixes added

* reordered and renamed variables for consistency

* casanovo-db full working version with code simplification

* Generate new screengrabs with rich-codex

* fix batching issues

* small fixes regarding documentation, import syntax, etc.

* add proteindatabase

* Generate new screengrabs with rich-codex

* finish proteindatabase

* all comments addressed

* new comments addressed

* final adjustments added

* minor changes regarding formatting and small efficiency boosts

* changes before reformatting config

* replace all occurrences of "max_length" with "max_peptide_len"

* added nonspecific digestion

* minor comments

* full branch comments addressed

* Generate new screengrabs with rich-codex

* updated and fixed failed tests

* add mztab validation to dbsearch test

* lint fix

* fix integration test

* fix unit tests

* force fix test

* clean up test_digest_fasta_enzyme

* adjust test_digest_fasta_mods

* allows top_match filtering for casanovo-db

* change default value for protein value in PepSpecMatch

* reverse issues with decoder

* update test and remove logging statement

* db_utils fixes

* updates to dataloaders, model_runner, and model.py

* near final changes for all but db_utils

* line length fixes

* Minor refactoring and type hint fixes

* Use mask for more efficient candidate filtering

* Reorder methods in logical order

* Fix unit tests

* Directly generate DB peptides as DataFrame

* Fix type hints and line lengths

* Generate new screengrabs with rich-codex

* Refactor batching to avoid code repetition

* More minor refactoring

* Reformat with black

* Minor fix

* Fix output name crash

* Fix AA score masking

* Fix PSM export

* Less verbose logging of skipped peptides

* Appropriate end-of-run reporting

* Fix PSM export from de novo

* Generalize end-of-run reporting

* Log additional information on spectra with no matching candidates

* Fix linting issue

* Fix some testing warnings

* Log digestion settings

* Reduce logging level for spectra without candidates

* Round peptide masses for consistent sorting

* Fix linting

* Remove superfluous PSM export

* Update changelog

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wout Bittremieux <[email protected]>
3 people authored Nov 18, 2024
1 parent 0d1df14 commit 17bb3f2
Showing 17 changed files with 2,419 additions and 712 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

### Added

- Casanovo-DB mode (`casanovo db_search`) to use Casanovo as a learned score function for sequence database searching (given a FASTA protein database).
- During training, model checkpoints will be saved at the end of each training epoch in addition to the checkpoints saved at the end of every validation run.
- Besides as a local file, model weights can be specified from a URL. Upon initial download, the weights file is cached for future re-use.
- Training and optimizer metrics can now be logged to a CSV file by setting the `log_metrics` config file option to true - the CSV file will be written to under a sub-directory of the output directory named `csv_logs`.
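
For readers unfamiliar with sequence database searching, the candidate-generation step behind the `casanovo db_search` entry can be illustrated with a minimal sketch. This is not Casanovo's implementation (the commit messages indicate it uses depthcharge for mass calculation and builds the peptides as a DataFrame). The function names, the trypsin rule, and the defaults below are hypothetical, and the learned scoring of each candidate against the spectrum is omitted.

```python
import re

# Monoisotopic residue masses in Da (standard amino acids, no modifications).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def digest_tryptic(protein, missed_cleavages=0, min_len=6, max_len=50):
    """Hypothetical helper: cleave after K or R unless followed by P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = set()
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            pep = "".join(pieces[i : j + 1])
            if min_len <= len(pep) <= max_len:
                peptides.add(pep)
    return sorted(peptides)


def neutral_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def select_candidates(peptides, precursor_mz, charge, tol_ppm=20.0):
    """Keep peptides whose mass is within tol_ppm of the observed precursor."""
    observed = (precursor_mz - PROTON) * charge
    # Boolean mask over all peptides, then a single filtering pass.
    mask = [
        abs(neutral_mass(p) - observed) / neutral_mass(p) * 1e6 <= tol_ppm
        for p in peptides
    ]
    return [p for p, keep in zip(peptides, mask) if keep]


if __name__ == "__main__":
    peptides = digest_tryptic("AAAAAAKCCCCCCRPDDDDDDKEEEEEE")
    mz = (neutral_mass("EEEEEE") + 2 * PROTON) / 2
    print(select_candidates(peptides, mz, charge=2))
```

In a real search, each surviving candidate would then be scored against the spectrum by the model, and the best-scoring peptide reported as the match.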