Extract Error: "Sequence contains non-DNA character '*' at position 393" #30
Comments
You seem to be using an older version: Python 2.7 and g2gtools 0.2.9. Can you update to Python 3 and g2gtools 2.0.0 and retry?
@mattjvincent thank you so much for your reply. I have been using the docker image:
Installing from the main branch will install 2.0.0. I just created a release. Feel free to create a Docker image of your own. There is one in the main branch in the top-level directory.
@jon4thin ... Try:
You will have to mount directories and such, but this should get you started.
I will give it a shot right now and update shortly! I have never made a Docker image before, so I will have to see how far I can get. Thank you so much for getting me going!
Is the |
Sorry for the delay.
EMASE's codebase was folded into GBRS. However, emase-zero has remained unchanged. There have been some minor flag changes since upgrading gbrs to Python 3, but we will have to compile a list.
Hey @mattjvincent, sorry to bother you again. I updated my pipeline to use
I have the full error if needed. Unless I am missing something, I had a conversation with a GATK person, and it seems like one can just remove these records with grep/awk anyway: Remove the
It is just a little bit frustrating because the g2gtools -> alntools -> emase-zero pipeline requires 3 phases of manual removal of records:
This would be no problem if it were mentioned in a manual or something. It just takes a long time to figure out if you don't know beforehand.
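For anyone landing here with the same problem, the grep/awk style cleanup mentioned above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the exact records to remove are not preserved in this thread, so it assumes they are VCF records whose ALT column lists the `*` allele (which the VCF specification uses for an allele missing due to an overlapping deletion). The function name `drop_star_alt` is hypothetical and is not part of g2gtools or GATK.

```python
import io


def drop_star_alt(lines):
    """Yield VCF lines, skipping records whose ALT column lists the '*' allele.

    Assumption: input is tab-delimited VCF text; header lines start with '#'.
    """
    for line in lines:
        if line.startswith("#"):
            # Keep meta-information and the column header untouched.
            yield line
            continue
        fields = line.rstrip("\r\n").split("\t")
        # Column 5 (index 4) is ALT; multiple alleles are comma-separated.
        if len(fields) > 4 and "*" in fields[4].split(","):
            continue
        yield line


if __name__ == "__main__":
    demo = io.StringIO(
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\n"
        "1\t10\t.\tA\tG\n"
        "1\t11\t.\tC\t*\n"
    )
    for kept in drop_star_alt(demo):
        print(kept, end="")
```

Note that this drops a whole multi-allelic record if any of its ALT alleles is `*`, so the other alleles on that line are lost too. Whether that is acceptable depends on the downstream analysis.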
Interesting. The reason it fails is because g2gtools only uses the following bases/characters (upper or lower can be used):
I understand that |
The There should be no stop codons denoted as |
I am trying to run g2gtools extract with a g2gtools DB created from a GTF and a FASTA. It runs successfully, but after 14 minutes it stops:
2024-08-05T19:38:02.180415541Z GTTCCAATACTGTGTTGCAGTGGGAGCCCAAACTTTCCCCAGTGTGAGTGCTCCCAGCAAGAAAGTGGCAAAGCAGATGGCCGCAGAGGAAGCCATGAAGGCCCTGCATGGGGAGGCGACCAACTCCATGGCTTCTGATAACCAG
2024-08-05T19:38:02.180441323Z [g2gtools debug] Exon ID=ENSE00003837056.1_L;Exon: ENSE00003837056.1_L chr1_L:154596815-154597005 (-1) #6
2024-08-05T19:38:02.180595316Z [g2gtools debug] chr1_L:154596815-154597005 (Length: 191)
2024-08-05T19:38:02.180600730Z CCTGAAGGTATGATCTCAGAGTCACTTGATAACTTGGAATCCATGATGCCCAACAAGGTCAGGAAGATTGGCGAGCTCGTGAGATACCTGAACACCAACCCTGTGGGTGGCCTTTTGGAGTACGCCCGCTCCCATGGCTTTGCTGCTGAATTCAAGTTGGTCGACCAGTCCGGACCTCCTCACGAGCCCAA
2024-08-05T19:38:02.180627776Z [g2gtools debug] Exon ID=ENSE00003836678.2_L;Exon: ENSE00003836678.2_L chr1_L:154589767-154590419 (-1) #7
2024-08-05T19:38:02.183394207Z Traceback (most recent call last):
2024-08-05T19:38:02.185504398Z File "/opt/conda/envs/g2gtools/bin/g2gtools", line 4, in <module>
2024-08-05T19:38:02.186001359Z __import__('pkg_resources').run_script('g2gtools==0.2.9', 'g2gtools')
2024-08-05T19:38:02.186006933Z File "/opt/conda/envs/g2gtools/lib/python2.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
2024-08-05T19:38:02.187122669Z self.require(requires)[0].run_script(script_name, ns)
2024-08-05T19:38:02.187127761Z File "/opt/conda/envs/g2gtools/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1462, in run_script
2024-08-05T19:38:02.187499526Z exec(code, namespace, namespace)
2024-08-05T19:38:02.187942044Z File "/opt/conda/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.2.9-py2.7.egg/EGG-INFO/scripts/g2gtools", line 132, in <module>
2024-08-05T19:38:02.187946940Z G2GToolsApp()
2024-08-05T19:38:02.187949734Z File "/opt/conda/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.2.9-py2.7.egg/EGG-INFO/scripts/g2gtools", line 99, in __init__
2024-08-05T19:38:02.187994284Z getattr(self, args.command)()
2024-08-05T19:38:02.187999699Z File "/opt/conda/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.2.9-py2.7.egg/EGG-INFO/scripts/g2gtools", line 105, in extract
2024-08-05T19:38:02.188002813Z g2gtools.g2g_commands.command_fasta_extract(sys.argv[2:], self.script_name + ' extract')
2024-08-05T19:38:02.188005779Z File "/opt/conda/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.2.9-py2.7.egg/g2gtools/g2g_commands.py", line 422, in command_fasta_extract
2024-08-05T19:38:02.189323301Z fasta.fasta_extract_transcripts(fasta_file, args.database, None, raw=args.raw)
2024-08-05T19:38:02.189330035Z File "/opt/conda/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.2.9-py2.7.egg/g2gtools/fasta.py", line 562, in fasta_extract_transcripts
2024-08-05T19:38:02.190316627Z partial_seq_str = str(g2g_utils.reverse_complement_sequence(partial_seq))
2024-08-05T19:38:02.190322077Z File "/opt/conda/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.2.9-py2.7.egg/g2gtools/g2g_utils.py", line 229, in reverse_complement_sequence
2024-08-05T19:38:02.190975659Z return reverse_sequence(complement_sequence(sequence))
2024-08-05T19:38:02.190981606Z File "/opt/conda/envs/g2gtools/lib/python2.7/site-packages/g2gtools-0.2.9-py2.7.egg/g2gtools/g2g_utils.py", line 214, in complement_sequence
2024-08-05T19:38:02.191010963Z raise ValueError("Sequence contains non-DNA character '{0}' at position {1:n}\n".format(val[position], position + 1))
2024-08-05T19:38:02.191575326Z ValueError: Sequence contains non-DNA character '*' at position 393
Is '*' supposed to denote a stop codon? But then again, I only have 35 of them in my entire FASTA...
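To see where those characters sit before running extract, a small scan of the FASTA may help. The following is a minimal sketch, not part of g2gtools: `find_non_dna` is a hypothetical helper, and the allowed alphabet (A, C, G, T, N in either case) is an assumption, since the list g2gtools actually accepts is not preserved in this thread.

```python
import io
from collections import Counter

# Assumed alphabet; adjust to whatever g2gtools actually accepts.
ALLOWED = set("ACGTNacgtn")


def find_non_dna(handle, allowed=ALLOWED):
    """Yield (record_id, 1-based position, character) for each disallowed character."""
    record_id, pos = None, 0
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # New record: take the first token of the header as its ID.
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            pos = 0
            continue
        for ch in line:
            pos += 1
            if ch not in allowed:
                yield record_id, pos, ch


if __name__ == "__main__":
    demo = io.StringIO(">chr1_L\nACGT*ACGT\nNNAC\n>chr2_L\nacgtn\n")
    hits = list(find_non_dna(demo))
    for record_id, pos, ch in hits:
        print(f"{record_id}\t{pos}\t{ch}")
    print(Counter(ch for _, _, ch in hits))
```

The positions printed here are relative to each FASTA record. Judging from the traceback, the position in the g2gtools error (393) appears to be relative to the exon sequence being reverse-complemented, so the two numbers will not match directly.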