codan fails and kills pipeline due to finding duplicate key(s) #76
Comments
Hello @laurabaxter21, Thank you very much for your interest in Thank you.
I am having the same issue. Did you figure out a solution?
Hi, yes. As I recall, I just deleted the offending duplicated sequences from the FASTA file and their corresponding entries from the GTF file (they didn't seem critically important). Then I re-ran Finder from checkpoint 5 and it completed OK.
Hope that helps,
Laura
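For anyone who would rather script this workaround than edit the files by hand, here is a minimal sketch of it in plain Python. It rests on assumptions that are not confirmed anywhere in this thread: that each FASTA ID (the header text up to the first whitespace) appears in the GTF as a `transcript_id "..."` attribute, and that dropping every copy of a duplicated ID is acceptable. The function names and the `clean` helper are hypothetical and are not part of Finder or CodAn.

```python
import re
from collections import Counter

# Assumption: GTF lines carry the FASTA ID as a transcript_id attribute.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def repeated_ids(fasta_lines):
    """Return the set of FASTA IDs that occur in more than one header."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return {seq_id for seq_id, n in Counter(ids).items() if n > 1}


def drop_fasta_records(fasta_lines, bad_ids):
    """Yield FASTA lines, skipping every record whose ID is in bad_ids."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = not (fields and fields[0] in bad_ids)
        if keep:
            yield line


def drop_gtf_lines(gtf_lines, bad_ids):
    """Yield GTF lines, skipping those whose transcript_id is in bad_ids."""
    for line in gtf_lines:
        match = TRANSCRIPT_ID.search(line)
        if match and match.group(1) in bad_ids:
            continue
        yield line


def clean(fasta_in, gtf_in, fasta_out, gtf_out):
    """Write filtered copies of both files and return the removed IDs."""
    with open(fasta_in) as handle:
        fasta_lines = handle.readlines()
    bad_ids = repeated_ids(fasta_lines)
    with open(fasta_out, "w") as out:
        out.writelines(drop_fasta_records(fasta_lines, bad_ids))
    with open(gtf_in) as handle, open(gtf_out, "w") as out:
        out.writelines(drop_gtf_lines(handle, bad_ids))
    return bad_ids
```

The sketch writes to new files rather than editing in place, so the originals stay available for comparison before re-running from checkpoint 5.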
(Email reply to the message from Gregory M. Chorak, PhD, sent 07 June 2023 16:03.)
That worked for me also. Thank you! Greg
Running the latest run_finder-v1.1.0.
Everything runs fine until the codan step (Braker is complete), which finds a duplicate key and kills the pipeline.
Looking at the assemblies_psiclass_modified/combined/combined_split_transcripts_with_bad_SJ_redundancy_removed.fasta file for duplicated sequence IDs, I find 2 (C2.27447_0_covsplit.0 and C7.149167_0_covsplit.0, both with different sequences in each of the duplicates).
Could I just delete these from the FASTA and GTF files and continue from checkpoint 5?
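A check for duplicated sequence IDs like the one described above can be sketched in a few lines of plain Python. This is only an illustration: it treats the header text after `>` up to the first whitespace as the ID, which is how Biopython's FASTA parser derives record IDs, and the function names are hypothetical.

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the ID of each FASTA record (header text up to first whitespace)."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            yield fields[0] if fields else ""


def duplicated_ids(lines):
    """Return a dict mapping each repeated ID to the number of times it occurs."""
    counts = Counter(fasta_ids(lines))
    return {seq_id: n for seq_id, n in counts.items() if n > 1}
```

Passing an open file handle for the combined FASTA file to `duplicated_ids` returns the IDs that would trip the duplicate-key check, so they can be inspected before the codan step runs.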
assemblies_psiclass_modified/combined/cds_predict.error:
Traceback (most recent call last):
  File "/softwares/CODAN/CodAn-1.2/bin/codan.py", line 524, in <module>
    main()
  File "/softwares/CODAN/CodAn-1.2/bin/codan.py", line 506, in main
    codan_BOTH(options.transcripts, options.output_folder, options.model, options.cpu)
  File "/softwares/CODAN/CodAn-1.2/bin/codan.py", line 355, in codan_BOTH
    retrieveORF_BOTH(transcripts, outF+"minus.fa", outF)
  File "/softwares/CODAN/CodAn-1.2/bin/codan.py", line 147, in retrieveORF_BOTH
    record_dictP = SeqIO.index(transcripts, "fasta")
  File "/usr/lib/python3/dist-packages/Bio/SeqIO/__init__.py", line 979, in index
    return _IndexedSeqFileDict(
  File "/usr/lib/python3/dist-packages/Bio/File.py", line 350, in __init__
    raise ValueError("Duplicate key '%s'" % key)
ValueError: Duplicate key 'C2.27447_0_covsplit.0'