codan fails and kills pipeline due to finding duplicate key(s) #76

Open
laurabaxter21 opened this issue Mar 21, 2023 · 4 comments

@laurabaxter21

Running the latest run_finder-v1.1.0.
Everything runs fine until the codan step (Braker is complete), which finds a duplicate key and kills the pipeline.
Checking assemblies_psiclass_modified/combined/combined_split_transcripts_with_bad_SJ_redundancy_removed.fasta for duplicated sequence IDs, I find two: C2.27447_0_covsplit.0 and C7.149167_0_covsplit.0. In both cases the records sharing an ID have different sequences.
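For anyone wanting to repeat this check, here is a minimal sketch of one way to list duplicated sequence IDs in a FASTA file. It uses only the standard library and assumes the ID is the header text up to the first whitespace, which matches how Biopython keys FASTA records by default. The helper names `fasta_ids` and `duplicated_ids` are made up for illustration and are not part of finder or CodAn.

```python
from collections import Counter
import io


def fasta_ids(handle):
    """Yield each record ID: the header text after '>' up to the first whitespace."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            yield fields[0] if fields else ""


def duplicated_ids(handle):
    """Return a dict mapping every ID seen more than once to its count."""
    counts = Counter(fasta_ids(handle))
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


# Tiny in-memory example (hypothetical records, not taken from the real file).
# For a file on disk, use: with open(path) as handle: duplicated_ids(handle)
example = io.StringIO(
    ">C2.27447_0_covsplit.0\nACGT\n"
    ">C2.1_0\nGGCC\n"
    ">C2.27447_0_covsplit.0\nTTTT\n"
)
dups = duplicated_ids(example)
print(dups)
```

Only the header lines are inspected, so this stays cheap even on a large transcript file.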

Could I just delete these records from the FASTA/GTF files and continue from checkpoint 5?

assemblies_psiclass_modified/combined/cds_predict.error:

Traceback (most recent call last):
  File "/softwares/CODAN/CodAn-1.2/bin/codan.py", line 524, in <module>
    main()
  File "/softwares/CODAN/CodAn-1.2/bin/codan.py", line 506, in main
    codan_BOTH(options.transcripts, options.output_folder, options.model, options.cpu)
  File "/softwares/CODAN/CodAn-1.2/bin/codan.py", line 355, in codan_BOTH
    retrieveORF_BOTH(transcripts, outF+"minus.fa", outF)
  File "/softwares/CODAN/CodAn-1.2/bin/codan.py", line 147, in retrieveORF_BOTH
    record_dictP = SeqIO.index(transcripts, "fasta")
  File "/usr/lib/python3/dist-packages/Bio/SeqIO/__init__.py", line 979, in index
    return _IndexedSeqFileDict(
  File "/usr/lib/python3/dist-packages/Bio/File.py", line 350, in __init__
    raise ValueError("Duplicate key '%s'" % key)
ValueError: Duplicate key 'C2.27447_0_covsplit.0'
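
The traceback shows `SeqIO.index` refusing to build its index because two records share a key. Below is a rough sketch of the "delete the duplicates" idea from the question. It writes a copy of the FASTA with every record whose ID occurs more than once left out. It is an illustration only: the function names are hypothetical, all records are held in memory, and it does not touch the matching GTF entries, which would need removing separately. Whether finder then resumes cleanly from a checkpoint is not something this sketch can confirm.

```python
from collections import Counter
import io


def record_id(header):
    """Return the ID part of a FASTA header (text up to the first whitespace)."""
    fields = header.split()
    return fields[0] if fields else ""


def read_fasta(handle):
    """Yield (header, sequence_lines) pairs; header excludes the leading '>'."""
    header, lines = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, lines
            header, lines = line[1:], []
        elif header is not None:
            lines.append(line)
    if header is not None:
        yield header, lines


def drop_duplicated_records(in_handle, out_handle):
    """Copy records with unique IDs to out_handle; return the sorted dropped IDs."""
    records = list(read_fasta(in_handle))  # assumption: the file fits in memory
    counts = Counter(record_id(header) for header, _ in records)
    for header, lines in records:
        if counts[record_id(header)] == 1:
            out_handle.write(">" + header + "\n")
            for seq_line in lines:
                out_handle.write(seq_line + "\n")
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


# In-memory demonstration with made-up records.
source = io.StringIO(">a\nAC\nGT\n>b\nGG\n>a\nTT\n")
cleaned = io.StringIO()
dropped = drop_duplicated_records(source, cleaned)
print(dropped)
print(cleaned.getvalue(), end="")
```

All copies of a duplicated ID are dropped, not just the later ones, because the post notes that the duplicates carry different sequences and there is no obvious way to tell which one is correct.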

@sagnikbanerjee15
Owner

Hello @laurabaxter21,

Thank you very much for your interest in finder. We have decided to focus our attention on developing the 2nd version of the software. As of now, we do not have the capabilities to support the older version due to a lack of personnel and I sincerely apologize for that. If you want to follow up on this please email me at [email protected] and I will do my best to help you out.

Thank you.

@DrDoom-EvoGen

I am having the same issue. Did you figure out a solution?

@laurabaxter21
Author

laurabaxter21 commented Jun 7, 2023 via email

@DrDoom-EvoGen

That worked for me also.

Thank you!

Greg
