
Casanovo's 'sequence' mode crashes when n_beams=10 #271

Closed
cfmelend opened this issue Dec 19, 2023 · 4 comments · Fixed by #306
Labels
bug Something isn't working

@cfmelend
Contributor

I'm experiencing the following error when running Casanovo's dev branch in sequence mode with n_beams=10 (this occurs with predict_batch_size >= 4):

File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/bin/casanovo", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/proj/cas_revisions/casanovo/casanovo.py", line 142, in sequence
    runner.predict(peak_path, output)
  File "/net/noble/vol1/home/cfmelend/proj/cas_revisions/casanovo/denovo/model_runner.py", line 163, in predict
    self.trainer.predict(self.model, self.loaders.test_dataloader())
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 864, in predict
    return call._call_and_handle_interrupt(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 903, in _predict_impl
    results = self._run(model, ckpt_path=ckpt_path)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
              ^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1030, in _run_stage
    return self.predict_loop.run()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/lightning/pytorch/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/lightning/pytorch/loops/prediction_loop.py", line 122, in run
    self._predict_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/lightning/pytorch/loops/prediction_loop.py", line 250, in _predict_step
    predictions = call._call_strategy_hook(trainer, "predict_step", *step_args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/miniconda3/envs/cas_rev/lib/python3.11/site-packages/lightning/pytorch/strategies/strategy.py", line 429, in predict_step
    return self.lightning_module.predict_step(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/proj/cas_revisions/casanovo/denovo/model.py", line 823, in predict_step
    self.forward(batch[0], batch[1]),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/proj/cas_revisions/casanovo/denovo/model.py", line 199, in forward
    return self.beam_search_decode(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/noble/vol1/home/cfmelend/proj/cas_revisions/casanovo/denovo/model.py", line 271, in beam_search_decode
    self._cache_finished_beams(
  File "/net/noble/vol1/home/cfmelend/proj/cas_revisions/casanovo/denovo/model.py", line 538, in _cache_finished_beams
    heapadd(
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Predicting DataLoader 0:  58%|█████▊    | 1725/2952 [3:15:59<2:19:24,  0.15it/s]

This error doesn't show up for n_beams < 10 or for n_beams > 10. I've attached the batch (32 spectra) that consistently triggers this error:
single_problem_batch.txt
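For context, this ValueError is NumPy's standard complaint when an array ends up in a boolean context. A minimal sketch of how it can surface from a heap of (score, per-amino-acid scores) tuples — the names here are illustrative, not Casanovo's actual internals:

```python
import heapq

import numpy as np

# Hypothetical cache of finished beams: (peptide_score, aa_scores) tuples.
heap = []
heapq.heappush(heap, (0.0, np.array([0.1, 0.2, 0.3])))

try:
    # Tuple comparison only falls through to the second element when the
    # first elements are equal. Comparing two NumPy arrays then produces a
    # boolean array, and calling bool() on it raises the ValueError.
    heapq.heappush(heap, (0.0, np.array([0.4, 0.5, 0.6])))
except ValueError as err:
    print(err)
```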

@cfmelend cfmelend added the bug Something isn't working label Dec 19, 2023
@bittremieux bittremieux self-assigned this Dec 21, 2023
@bittremieux
Collaborator

I can't reproduce it with Casanovo v4.0.1. Can you try again with the latest release and check whether there are any differences between my log/output and the results you obtain?

out.log
out.mztab.txt

@cfmelend
Contributor Author

cfmelend commented Jan 8, 2024

Running inference on the single problematic batch with the default settings from Casanovo v4.0.1 and the new default ckpt file, I'm unable to reproduce the bug.

@Peggyying

I'm experiencing the same error when running Casanovo's evaluate mode with n_beams >= 3:

Seed set to 454
INFO: Casanovo version 4.1.0
INFO: Sequencing and evaluating peptides from:
INFO:   /my/data/cross.cat.test.mgf
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
INFO: Reading 1 files...
/my/data/cross.cat.test.mgf: 27142spectra [00:07, 3869.78spectra/s]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
Validation DataLoader 0:   0%|                                                                                     | 0/27 [00:00<?, ?it/s]WARNING: /anaconda3/envs/casanovo_env/lib/python3.10/site-packages/lightning/pytorch/utilities/data.py:77: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 1024. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.

WARNING: /anaconda3/envs/casanovo_env/lib/python3.10/site-packages/lightning/pytorch/utilities/data.py:77: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 1024. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.

Validation DataLoader 0:  30%|██████████████████████▊                                                      | 8/27 [06:41<15:53,  0.02it/s]Traceback (most recent call last):
  File "/anaconda3/envs/casanovo_env/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/anaconda3/envs/casanovo_env/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/my/casanovo-4.1.0/casanovo/casanovo.py", line 492, in <module>
    main()
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/my/casanovo-4.1.0/casanovo/casanovo.py", line 174, in evaluate
    runner.evaluate(annotated_peak_path)
  File "/my/casanovo-4.1.0/casanovo/denovo/model_runner.py", line 133, in evaluate
    self.trainer.validate(self.model, self.loaders.test_dataloader())
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 645, in validate
    return call._call_and_handle_interrupt(
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 685, in _validate_impl
    results = self._run(model, ckpt_path=ckpt_path)
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1028, in _run_stage
    return self._evaluation_loop.run()
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/lightning/pytorch/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 134, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 391, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_args)
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/anaconda3/envs/casanovo_env/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 403, in validation_step
    return self.lightning_module.validation_step(*args, **kwargs)
  File "/my/casanovo-4.1.0/casanovo/denovo/model.py", line 767, in validation_step
    for spectrum_preds in self.forward(batch[0], batch[1]):
  File "/my/casanovo-4.1.0/casanovo/denovo/model.py", line 200, in forward
    return self.beam_search_decode(
  File "/my/casanovo-4.1.0/casanovo/denovo/model.py", line 272, in beam_search_decode
    self._cache_finished_beams(
  File "/my/casanovo-4.1.0/casanovo/denovo/model.py", line 539, in _cache_finished_beams
    heapadd(
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Validation DataLoader 0:  30%|██▉       | 8/27 [07:08<16:57,  0.02it/s]

The log file is casanovo-4.1.0.log.
The error doesn't show up for n_beams=1 or 2.

@bittremieux bittremieux reopened this Feb 25, 2024
@bittremieux
Collaborator

bittremieux commented Feb 25, 2024

Ok, I think I figured out what the issue is. Here we cache finished beams based on their scores. The problem arises in the edge case where a cached beam and a beam to be added have identical scores. In that case, heappush falls back to the second (and subsequent) tuple elements to determine the order of entries in the heap. Here, the second element is an array (of amino acid scores), and ordering two NumPy arrays leads to the "ambiguous truth value" error. This occurs when using a higher number of beams because then there will presumably be beams cached with score 0, whereas real scores are extremely unlikely to be identical.

Tricky bug; I don't have a good solution off the top of my head. We should discuss how to address this.
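I don't know what fix #306 ultimately settled on, but one common remedy from the heapq documentation is to insert a monotonically increasing counter between the score and the array, so that score ties are always broken before the arrays are ever compared. A sketch with illustrative names:

```python
import heapq
import itertools

import numpy as np

# Hypothetical fix sketch: a unique, ever-increasing tie-breaker as the
# second tuple element. Equal scores are resolved by the counter, so the
# NumPy arrays never participate in the comparison.
counter = itertools.count()
heap = []
heapq.heappush(heap, (0.0, next(counter), np.array([0.1, 0.2, 0.3])))
heapq.heappush(heap, (0.0, next(counter), np.array([0.4, 0.5, 0.6])))  # no ValueError

score, _, aa_scores = heapq.heappop(heap)
```

As a side effect, the counter also makes ordering among equal-score beams deterministic (insertion order), which avoids any dependence on array contents.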
