medaka predict: concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
#536
Describe the bug
When executing hundreds of medaka smolecule runs in parallel, I attempted to allocate CPUs based on the expected complexity of each smolecule.fa file, as follows:
High complexity: Files with more than 600 alignments and over 150 regions receive 3 threads.
Medium complexity: Files with 50-600 alignments and 12-150 regions receive 2 threads.
Low complexity: All other files receive 1 thread.
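For illustration, the allocation rule above can be sketched as a small Python function. The function name and argument names are hypothetical and not part of medaka; the boundaries follow the thresholds listed above, and any file that does not meet both conditions of a tier falls through to 1 thread.

```python
def threads_for(n_alignments: int, n_regions: int) -> int:
    """Return the thread count for one smolecule.fa file (hypothetical helper)."""
    # High complexity: more than 600 alignments AND more than 150 regions.
    if n_alignments > 600 and n_regions > 150:
        return 3
    # Medium complexity: 50-600 alignments AND 12-150 regions.
    if 50 <= n_alignments <= 600 and 12 <= n_regions <= 150:
        return 2
    # Low complexity: everything else, including mixed cases.
    return 1
```

Note that a file with many alignments but few regions (for example 700 alignments and 100 regions) matches neither tier and therefore gets 1 thread under this reading of the rule.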
Alternatively, I tried setting a uniform allocation of 2 threads for all smolecule.fa files, regardless of complexity.
I allocate 1 GB of memory per CPU for a run (i.e., 100 GB of memory for a run with 100 CPUs in total), so each thread gets roughly 1 GB of memory. However, because this is not a hard requirement for a medaka smolecule task to start, some smolecule.fa tasks will have less memory available during the run.
I use the following medaka smolecule command (e.g., for 3 threads):
In most cases, consensus sequence generation completes successfully for smolecule.fa files. However, in certain cases:
When threads are allocated based on file complexity, approximately 20% of the smolecule.fa files produce the error message below during the sampling or prediction process, or just stall at the same position in the log without giving the error message.
When a uniform allocation of 2 threads is used, approximately 1% of the smolecule.fa files produce the same error message, or just stall at the same position in the log without giving the error message.
Error message:
(during sampling process)
[05:15:03 - Sampler] Initializing sampler for consensus of region 1906:0-1546.
[05:15:03 - Feature] Processed 1800:0.0-1546.0 (median depth 3.0)
[05:15:03 - Sampler] Took 0.02s to make features.
[05:15:03 - Sampler] Initializing sampler for consensus of region 1967:0-1542.
[05:15:03 - Feature] Processed 1967:0.0-1541.0 (median depth 3.0)
[05:15:03 - Sampler] Took 0.01s to make features.
[05:15:03 - Sampler] Initializing sampler for consensus of region 1984:0-1542.
[05:15:03 - Feature] Processed 1906:0.0-1545.0 (median depth 3.0)
[05:15:03 - Sampler] Took 0.03s to make features.
[05:15:03 - Sampler] Initializing sampler for consensus of region 1988:0-1544.
[05:15:03 - Feature] Processed 1984:0.0-1541.0 (median depth 3.0)
[05:15:03 - Sampler] Took 0.03s to make features.
[05:15:03 - Feature] Processed 1988:0.0-1543.0 (median depth 3.0)
[05:15:03 - Sampler] Took 0.06s to make features.
Traceback (most recent call last):
File "/home/m.messemaker/miniconda3/envs/py310_nanopore_tcr_consensus_v3/bin/medaka", line 11, in <module>
sys.exit(main())
File "/home/m.messemaker/miniconda3/envs/py310_nanopore_tcr_consensus_v3/lib/python3.10/site-packages/medaka/medaka.py", line 836, in main
args.func(args)
File "/home/m.messemaker/miniconda3/envs/py310_nanopore_tcr_consensus_v3/lib/python3.10/site-packages/medaka/smolecule.py", line 498, in main
_ = fut.result()
File "/home/m.messemaker/miniconda3/envs/py310_nanopore_tcr_consensus_v3/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/home/m.messemaker/miniconda3/envs/py310_nanopore_tcr_consensus_v3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
(during prediction process)
[07:01:51 - PWorker] Batches in cache: 8.
[07:01:51 - PWorker] 22.8% Done (0.2/0.9 Mbases) in 377.1s
[07:05:04 - PWorker] Batches in cache: 8.
[07:05:04 - PWorker] 28.6% Done (0.2/0.9 Mbases) in 569.6s
[07:05:38 - PWorker] Batches in cache: 8.
[07:05:38 - PWorker] 34.3% Done (0.3/0.9 Mbases) in 604.3s
[07:05:51 - PWorker] Batches in cache: 8.
[07:05:51 - PWorker] 40.1% Done (0.3/0.9 Mbases) in 616.9s
[07:06:41 - PWorker] Batches in cache: 8.
[07:06:41 - PWorker] 45.8% Done (0.4/0.9 Mbases) in 666.8s
[07:06:47 - PWorker] Batches in cache: 8.
[07:08:53 - PWorker] Batches in cache: 8.
[07:08:53 - PWorker] 57.3% Done (0.5/0.9 Mbases) in 799.3s
Traceback (most recent call last):
File "/home/m.messemaker/miniconda3/envs/py310_nanopore_tcr_consensus_v3/bin/medaka", line 11, in <module>
sys.exit(main())
File "/home/m.messemaker/miniconda3/envs/py310_nanopore_tcr_consensus_v3/lib/python3.10/site-packages/medaka/medaka.py", line 836, in main
args.func(args)
File "/home/m.messemaker/miniconda3/envs/py310_nanopore_tcr_consensus_v3/lib/python3.10/site-packages/medaka/smolecule.py", line 498, in main
_ = fut.result()
File "/home/m.messemaker/miniconda3/envs/py310_nanopore_tcr_consensus_v3/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/home/m.messemaker/miniconda3/envs/py310_nanopore_tcr_consensus_v3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
Or medaka smolecule just stalls without giving the error message at the same position in the .log.
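For context, here is a minimal sketch of how this exception arises in Python in general. It is not medaka code: it only shows that `concurrent.futures` raises `BrokenProcessPool` when a worker process exits abruptly, which is also what happens if a worker is killed by a signal (for example by the kernel out-of-memory killer). Whether that is the cause here is an assumption, not something the logs above confirm. The function name `demo_broken_pool` is hypothetical.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def demo_broken_pool() -> str:
    """Return "broken" if an abruptly terminated worker breaks the pool."""
    # The "fork" start method is POSIX only; it avoids re-importing this
    # module in the child process.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        # os._exit ends the worker immediately without cleanup, which the
        # pool sees the same way as a worker killed from outside.
        fut = pool.submit(os._exit, 1)
        try:
            fut.result()
        except BrokenProcessPool:
            return "broken"
    return "ok"
```

If the workers are being killed externally, the system log (for example `dmesg` on Linux) or the scheduler's job accounting would normally show it.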
Lastly, the error or stalling does not occur when I run with 25 threads per smolecule.fa, but this approach prevents me from completing all iterations over my smolecule.fa files, as too many threads remain idle for each file.
Is this error due to a memory allocation issue?
Should I allocate more memory per thread?
Would decreasing the batch size in medaka predict improve stability?
Thank you so much for your help again!
Marius
Logging example of the number of alignments and regions:
Update: I tried making memory a hard requirement for a medaka smolecule task to start. Specifically, I tried multiple runs in which I increased the memory required for a task to start from 1 GB to 4.25 GB in steps of 0.25 GB. In addition to the memory requirement, I uniformly require at least 2 CPUs for a medaka smolecule task to start. Interestingly, increasing the memory to 4.25 GB does not get rid of the 1% of smolecule.fa files that stall at this position in the log (the error from the comment above is no longer returned):
In addition, medaka smolecule does not always stall on the same smolecule.fa files across different runs. So this does not seem to be a memory issue, nor an issue specific to particular smolecule.fa files.
Again, thank you for your help with resolving this issue!
Environment:
Additional information