Python errors when running lifton #21

Open
zgb963 opened this issue Aug 5, 2024 · 3 comments

Comments

zgb963 commented Aug 5, 2024

Hello,

I'm having an issue running LiftOn. I'm running it in an HPC environment with 100 GB of memory on a compute node that has 2000 cores. The bash script below contains the command I'm using to run lifton. The target genome is the rheMac10 FASTA (rheMac10.fa), and I've also supplied the human genome hg38 FASTA (hg38.fa) and the human genome annotation in GTF format from NCBI (hg38.ncbiRefSeq.gtf). I want to output a LiftOn rheMac10 annotation (hg38_lifton_rhemac10.gff3).

#!/bin/bash

SECONDS=0

cd ~/macaque_snRNAseq

#make sure that the conda env is sourced
. /home/genevieve.baddoo1-umw/miniconda3/etc/profile.d/conda.sh

#activate lifton_pip conda env 
conda activate lifton_pip

lifton -g liftoff/hg38.ncbiRefSeq.gtf -o lifton/hg38_lifton_rhemac10.gff3 -copies -infer-genes liftoff/rheMac10.fa liftoff/hg38.fa

duration=$SECONDS
echo "$(($duration / 3600)) hours and $((($duration % 3600) / 60)) minutes and $(($duration % 60)) seconds elapsed."

Here is the bsub command I'm using to run lifton:

bsub -q long -R rusage[mem=25G] -R span[hosts=1] -W 96:00 -n 4 -o ~/macaque_snRNAseq/lifton/my_out.%J -e ~/macaque_snRNAseq/lifton/my_err.%J ~/macaque_snRNAseq/scripts/lifton.sh

I installed lifton in a conda environment using pip. It runs for over an hour, but then I get the following Python errors:

252893 of 414578 (61%)
257039 of 414578 (62%)
261185 of 414578 (63%)
265330 of 414578 (64%)
2024-07-31 17:48:31,795 - INFO - Committing changes
2024-07-31 17:48:32,102 - INFO - Creating relations(parent) index
2024-07-31 17:48:36,819 - INFO - Creating relations(child) index
2024-07-31 17:48:41,709 - INFO - Creating features(featuretype) index
2024-07-31 17:48:44,560 - INFO - Creating features (seqid, start, end) index
2024-07-31 17:48:48,643 - INFO - Creating features (seqid, start, end, strand) index
2024-07-31 17:48:53,080 - INFO - Running ANALYZE features
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/site-packages/lifton/liftoff/align_features.py", line 61, in align_single_chroms
    minimap2_index = build_minimap2_index(target_file, args, threads_arg, minimap2_path)
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/site-packages/lifton/liftoff/align_features.py", line 109, in build_minimap2_index
    subprocess.run(
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/subprocess.py", line 493, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/subprocess.py", line 1720, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'minimap2'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/bin/lifton", line 8, in <module>
    sys.exit(main())
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/site-packages/lifton/lifton.py", line 352, in main
    run_all_lifton_steps(args)
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/site-packages/lifton/lifton.py", line 267, in run_all_lifton_steps
    liftoff_annotation = lifton_utils.exec_liftoff(lifton_outdir, args)
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/site-packages/lifton/lifton_utils.py", line 113, in exec_liftoff
    liftoff_annotation = run_liftoff.run_liftoff(outdir, args)
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/site-packages/lifton/run_liftoff.py", line 25, in run_liftoff
    liftoff_main.run_all_liftoff_steps(liftoff_args)
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/site-packages/lifton/liftoff/liftoff_main.py", line 19, in run_all_liftoff_steps
    feature_db, feature_hierarchy, ref_parent_order = liftover_types.lift_original_annotation(ref_chroms, target_chroms,
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/site-packages/lifton/liftoff/liftover_types.py", line 15, in lift_original_annotation
    align_and_lift_features(ref_chroms, target_chroms, args, feature_hierarchy, liftover_type, unmapped_features,
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/site-packages/lifton/liftoff/liftover_types.py", line 23, in align_and_lift_features
    aligned_segments= align_features.align_features_to_target(ref_chroms, target_chroms, args,
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/site-packages/lifton/liftoff/align_features.py", line 24, in align_features_to_target
    for result in pool.imap_unordered(func, np.arange(0, len(target_chroms))):
  File "/home/genevieve.baddoo1-umw/miniconda3/envs/lifton_pip/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
FileNotFoundError: [Errno 2] No such file or directory: 'minimap2'

Am I using enough memory and cores to run the software? Does the input genome annotation have to be in GFF3 format instead of GTF? Also, can lifton output an annotation in GTF format or does it only output in GFF3 format?

Glfrey commented Aug 9, 2024

Hi @zgb963 ,

I'm not an author of this software, but judging from your error, this is a minimap2 issue:

FileNotFoundError: [Errno 2] No such file or directory: 'minimap2'

A previous issue already noted that minimap2 is a dependency but is not installed alongside LiftOn when you install via pip. To fix this, install minimap2 in the same environment. Example commands are below:

conda activate lifton_pip

conda install bioconda::minimap2
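
The traceback above comes from Python's subprocess module failing to find an executable named minimap2, so it may help to confirm the binary is visible on PATH from inside the activated environment before submitting a long job. The following is a minimal sketch of such a check using the standard library's shutil.which; the function names are hypothetical, and the assumption that minimap2 is the only external binary needed is mine, not something confirmed in this thread.

```python
import shutil
from typing import Dict, Iterable, List, Optional


def locate_tools(tools: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the executable names that cannot be found on the current PATH."""
    return [name for name, path in locate_tools(tools).items() if path is None]


def require_tools(tools: Iterable[str]) -> None:
    """Raise RuntimeError listing every executable that is missing from PATH."""
    missing = missing_tools(tools)
    if missing:
        raise RuntimeError("Not found on PATH: " + ", ".join(missing))


# Assumption: "minimap2" is the only external binary this run needs.
# Example call (not executed here): require_tools(["minimap2"])
```

Running require_tools(["minimap2"]) with the same interpreter and environment that the job script activates should fail fast, in seconds, if the install did not take effect on the compute node.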

zgb963 commented Aug 21, 2024

Hi @Glfrey, thanks for your suggestion. I tried that but I'm still running into issues.

@Kuanhao-Chao I'm trying different genome annotation files (GTF/GFF) and different genome assembly files (FASTA/FNA) to try to get lifton to run:

lifton -g liftoff/hg38.ncbiRefSeq.gtf -o lifton/hg38_ncbi_lifton_rhemac10.gff3 -copies -infer-genes liftoff/rheMac10.fa liftoff/hg38.fa

lifton -g liftoff/GRCh38_latest_genomic.gff.gz -o lifton/hg38_latest_genomic_rhemac10.gff3 -copies -infer-genes liftoff/rheMac10.fa liftoff/GRCh38_latest_genomic.fna.gz

lifton -g liftoff/gencode.v46.chr_patch_hapl_scaff.annotation.gff3.gz -o lifton/hg38_gencode_lifton_rhemac10.gff3 -copies -infer-genes liftoff/rheMac10.fa liftoff/GRCh38.p14.genome.fa.gz

However, the run now continues for days or weeks. It seems to get stuck, and after a certain amount of time the job is disconnected from the HPC cluster I'm running it on. The output below is from the first lifton command.

2024-08-09 23:50:32,915 - INFO - Committing changes
2024-08-09 23:50:47,379 - INFO - Populating features table and first-order relations: 4886701 features
2024-08-09 23:50:47,380 - INFO - Creating relations(parent) index
2024-08-09 23:50:51,905 - INFO - Creating relations(child) index
2024-08-09 23:50:56,358 - INFO - Inferring gene extents and writing to tempfile
2024-08-09 23:51:27,612 - INFO - Importing inferred features into db
0 of 414578 (0%)
4146 of 414578 (1%)
8292 of 414578 (2%)
12438 of 414578 (3%)
16584 of 414578 (4%)
20729 of 414578 (5%)
24875 of 414578 (6%)
29021 of 414578 (7%)
33167 of 414578 (8%)
37313 of 414578 (9%)
41458 of 414578 (10%)
45604 of 414578 (11%)
49750 of 414578 (12%)
53896 of 414578 (13%)
58041 of 414578 (14%)
62187 of 414578 (15%)
66333 of 414578 (16%)
70479 of 414578 (17%)
74625 of 414578 (18%)
78770 of 414578 (19%)
82916 of 414578 (20%)
87062 of 414578 (21%)
91208 of 414578 (22%)
95353 of 414578 (23%)
99499 of 414578 (24%)
103645 of 414578 (25%)
107791 of 414578 (26%)
111937 of 414578 (27%)
116082 of 414578 (28%)
120228 of 414578 (29%)
124374 of 414578 (30%)
128520 of 414578 (31%)
132665 of 414578 (32%)
136811 of 414578 (33%)
140957 of 414578 (34%)
145103 of 414578 (35%)
149249 of 414578 (36%)
153394 of 414578 (37%)
157540 of 414578 (38%)
161686 of 414578 (39%)
165832 of 414578 (40%)
169977 of 414578 (41%)
174123 of 414578 (42%)
178269 of 414578 (43%)
182415 of 414578 (44%)
186561 of 414578 (45%)
190706 of 414578 (46%)
194852 of 414578 (47%)
198998 of 414578 (48%)
203144 of 414578 (49%)
207289 of 414578 (50%)
211435 of 414578 (51%)
215581 of 414578 (52%)
219727 of 414578 (53%)
223873 of 414578 (54%)
228018 of 414578 (55%)
232164 of 414578 (56%)
236310 of 414578 (57%)
240456 of 414578 (58%)
244602 of 414578 (59%)
248747 of 414578 (60%)
252893 of 414578 (61%)
257039 of 414578 (62%)
261185 of 414578 (63%)
265330 of 414578 (64%)
2024-08-10 00:21:18,822 - INFO - Committing changes
2024-08-10 00:21:19,141 - INFO - Creating relations(parent) index
2024-08-10 00:21:23,443 - INFO - Creating relations(child) index
2024-08-10 00:21:27,910 - INFO - Creating features(featuretype) index
2024-08-10 00:21:30,595 - INFO - Creating features (seqid, start, end) index
2024-08-10 00:21:34,485 - INFO - Creating features (seqid, start, end, strand) index
2024-08-10 00:21:38,633 - INFO - Running ANALYZE features
[M::mm_idx_gen::75.398*0.97] collected minimizers
[M::mm_idx_gen::108.555*0.98] sorted minimizers
[M::main::117.121*0.97] loaded/built the index for 2939 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2939
[M::mm_idx_stat::117.963*0.97] distinct minimizers: 101324913 (39.04% are singletons); average occurrences: 5.469; average spacing: 5.362; total length: 2971331530
[M::main] Version: 2.28-r1209
[M::main] CMD: minimap2 -d liftoff/rheMac10.fa.mmi -a --end-bonus 5 --eqx -N 50 -p 0.5 -t 1 liftoff/rheMac10.fa
[M::main] Real time: 118.148 sec; CPU: 114.212 sec; Peak RSS: 18.134 GB
[M::main::7.523*1.02] loaded/built the index for 2939 target sequence(s)
[M::mm_mapopt_update::9.039*1.01] mid_occ = 596
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2939
[M::mm_idx_stat::9.880*1.01] distinct minimizers: 101324913 (39.04% are singletons); average occurrences: 5.469; average spacing: 5.362; total length: 2971331530
[M::worker_pipeline::13992.728*1.00] mapped 82 sequences
[M::worker_pipeline::24214.322*1.00] mapped 100 sequences
[M::worker_pipeline::36576.297*1.00] mapped 47 sequences
[M::worker_pipeline::50749.561*1.00] mapped 172 sequences
[M::worker_pipeline::64914.746*1.00] mapped 33 sequences
[M::worker_pipeline::73109.560*1.00] mapped 219 sequences
[M::worker_pipeline::83463.119*1.00] mapped 127 sequences
[M::worker_pipeline::94445.968*1.00] mapped 32 sequences
[M::worker_pipeline::104860.589*1.00] mapped 99 sequences
[M::worker_pipeline::115211.614*1.00] mapped 43 sequences
[M::worker_pipeline::128461.516*1.00] mapped 64 sequences
[M::worker_pipeline::141707.142*1.00] mapped 18 sequences
[M::worker_pipeline::151138.239*1.00] mapped 147 sequences
[M::worker_pipeline::163815.684*1.00] mapped 67 sequences
[M::worker_pipeline::174043.778*1.00] mapped 172 sequences
[M::worker_pipeline::182177.212*1.00] mapped 114 sequences
[M::worker_pipeline::191677.295*1.00] mapped 110 sequences
[M::worker_pipeline::199522.721*1.00] mapped 116 sequences
[M::worker_pipeline::207399.311*1.00] mapped 151 sequences
[M::worker_pipeline::220907.263*1.00] mapped 62 sequences
[M::worker_pipeline::240168.177*1.00] mapped 39 sequences
[M::worker_pipeline::250907.620*1.00] mapped 288 sequences
[M::worker_pipeline::259718.763*1.00] mapped 219 sequences
[M::worker_pipeline::270653.477*1.00] mapped 323 sequences
[M::worker_pipeline::280281.698*1.00] mapped 46 sequences
[M::worker_pipeline::287356.027*1.00] mapped 96 sequences
[M::worker_pipeline::293751.409*1.00] mapped 211 sequences
[M::worker_pipeline::309397.448*1.00] mapped 37 sequences
[M::worker_pipeline::320090.547*1.00] mapped 138 sequences
[M::worker_pipeline::337433.466*1.00] mapped 20 sequences
[M::worker_pipeline::354774.982*1.00] mapped 6 sequences
[M::worker_pipeline::363487.857*1.00] mapped 43 sequences
[M::worker_pipeline::375867.275*1.00] mapped 59 sequences
[M::worker_pipeline::384898.624*1.00] mapped 73 sequences
[M::worker_pipeline::394625.919*1.00] mapped 6 sequences
[M::worker_pipeline::414167.374*1.00] mapped 27 sequences
[M::worker_pipeline::433697.015*1.00] mapped 21 sequences
[M::worker_pipeline::453218.862*1.00] mapped 4 sequences
[M::worker_pipeline::472765.670*1.00] mapped 5 sequences
[M::worker_pipeline::492359.958*1.00] mapped 3 sequences
[M::worker_pipeline::505418.164*1.00] mapped 44 sequences
[M::worker_pipeline::513250.057*1.00] mapped 61 sequences
[M::worker_pipeline::519450.873*1.00] mapped 127 sequences
[M::worker_pipeline::529641.397*1.00] mapped 163 sequences
[M::worker_pipeline::542846.726*1.00] mapped 93 sequences
[M::worker_pipeline::559132.917*1.00] mapped 3 sequences
[M::worker_pipeline::575434.872*1.00] mapped 3 sequences
[M::worker_pipeline::591715.965*1.00] mapped 3 sequences
User defined signal 2
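
To put numbers on the log above: in minimap2's progress lines, the value before the asterisk is elapsed wall-clock time in seconds and the value after it is the ratio of CPU time to wall-clock time. A ratio that stays at 1.00 suggests that only one core is doing work, which would be consistent with the `-t 1` shown in the CMD line. The sketch below summarizes those lines; the names are hypothetical, and it assumes a single minimap2 process wrote the log.

```python
import re
from typing import Iterable, NamedTuple

# Matches lines such as:
#   [M::worker_pipeline::591715.965*1.00] mapped 3 sequences
# First number: elapsed wall-clock seconds. Second number: CPU time / wall time.
PROGRESS = re.compile(
    r"\[M::worker_pipeline::(?P<real>\d+(?:\.\d+)?)\*(?P<ratio>\d+(?:\.\d+)?)\] "
    r"mapped (?P<n>\d+) sequences"
)


class MappingSummary(NamedTuple):
    batches: int            # number of progress lines seen
    sequences: int          # total sequences reported as mapped
    elapsed_seconds: float  # largest wall-clock timestamp seen
    mean_cpu_ratio: float   # average CPU/wall ratio across batches


def summarize_minimap2_log(lines: Iterable[str]) -> MappingSummary:
    """Summarize minimap2 worker_pipeline progress lines; other lines are ignored."""
    batches = 0
    sequences = 0
    elapsed = 0.0
    ratio_sum = 0.0
    for line in lines:
        match = PROGRESS.search(line)
        if match is None:
            continue
        batches += 1
        sequences += int(match.group("n"))
        elapsed = max(elapsed, float(match.group("real")))
        ratio_sum += float(match.group("ratio"))
    mean_ratio = ratio_sum / batches if batches else 0.0
    return MappingSummary(batches, sequences, elapsed, mean_ratio)
```

Dividing elapsed_seconds by 86400 converts it to days. For the last timestamp in the log above (591715.965 s), that is roughly 6.8 days spent in a single-threaded alignment.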

I don't understand why it either errors out or takes so long to run. Do I have to use specific genome annotations or assemblies? I was having the same issues with the Liftoff software, which is why I'm trying LiftOn. Could the cause be the -copies flag or the -infer-genes flag? For example, when I don't include the -infer-genes flag, I get an error saying that there are no features in the genome annotation file.
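
One way to investigate the "no features" error is to look at which feature types the annotation actually contains. A possible explanation, which is an assumption and not confirmed in this thread, is that the GTF has transcript and exon rows but no `gene` rows, which would also explain why -infer-genes is needed. The sketch below counts the values in column 3 of a GTF or GFF3 file; the function names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count the feature type (column 3) of each tab-separated GTF/GFF3 record."""
    counts: Counter = Counter()
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GTF and GFF3 records have nine tab-separated columns;
        # anything shorter (e.g. embedded FASTA) is ignored.
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


def count_feature_types_in_file(path: str) -> Counter:
    """Same as count_feature_types, reading a plain or gzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_feature_types(handle)
```

Calling count_feature_types_in_file("liftoff/hg38.ncbiRefSeq.gtf") and checking whether the result has a non-zero count for "gene" would show whether the annotation carries explicit gene records.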

zgb963 commented Oct 15, 2024

@Kuanhao-Chao any updates on this issue?
