IndexError: string index out of range #32
Comments
I have the same problem. I investigated the code, and the problem is in the function that loads reference data from the FASTA file. Something is wrong with the position indexing, so the number of bases loaded is smaller than the range the code then tries to read. The data loader seems to add a configurable margin as well as an additional 500 bases on each side of the range. I'm not sure why; maybe for visualization reasons?

I tried debugging it and mostly got confused: it loads 599 bases when I use a range of 1-1700, but only 1 base with a range of 1-581 and 19 bases with a range of 581-1700. So not only does the program hit indexing errors by making the range larger than the number of bases, but when I narrow the range it loads barely any data at all. I would expect the number of bases in 1-1700 to equal the total across any split of that range, so I don't understand how indexing is supposed to work in this program.

It may be user error on my side, but the documentation doesn't explain how to make this work. I get the sense that the program was designed primarily for reference data downloaded from a database, where the flanking margin sequence exists, and that this causes the out-of-range errors when you want to visualize a full "chromosome" interval because your reference FASTA is made up of single genes (which is my use case).
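The failure mode described in this comment can be reproduced in isolation. The following is a minimal sketch, not bamsnap's actual code: `expand_window`, `copy_bases`, `PAD`, and the margin value of 50 are hypothetical, chosen only to illustrate what happens if a loader pads the requested range by a margin plus 500 bases on each side and then indexes a sequence that has no flanking bases.

```python
PAD = 500  # extra bases per side, as described in the comment (assumed value)

def expand_window(spos, epos, margin):
    """Return the padded 1-based inclusive window (hypothetical helper)."""
    return spos - margin - PAD, epos + margin + PAD

def copy_bases(seq, spos, epos):
    """Copy one base per position in [spos, epos], indexing seq from 0."""
    refseq = {}
    for i, gpos in enumerate(range(spos, epos + 1)):
        refseq[gpos] = seq[i]  # raises IndexError once i >= len(seq)
    return refseq

gene = "ACGT" * 425  # a 1700-base reference, like a single-gene FASTA record
w_start, w_end = expand_window(1, 1700, margin=50)
print(w_start, w_end)  # -549 2250

try:
    copy_bases(gene, w_start, w_end)
except IndexError as exc:
    print("IndexError:", exc)
```

The padded window spans 2800 positions while the record holds only 1700 bases, so the copy loop runs past the end of the string.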
I ran into this problem too and spent a day solving it. Here are some points that may help others. The first key step is in _option.py, lines 112-113 (the line numbers may be off because I have changed the code):
The code after that uses the arguments 'g_spos' and 'g_epos', so make sure the margin is what you need. However, _option.py doesn't receive the reference file, so I left it unchanged and modified other scripts instead. The second key step is in bamsnap.py, line 587:
This code expands the selected range by about 500 bp plus 2 times the margin length. If you want to show the whole region, it will try to extract this longer range from your reference, which causes the error. My solution is to limit spos and epos, like this:
However, it is easy to limit spos but difficult to limit epos, because we don't know the maximum length of the reference. So I changed the code of the next function:
so that it returns the chromosome length, which can be used to limit epos. The final code is:
This solves the problem in the line "refseq[gpos+1] = seq[i]". However, changing spos and epos causes some other problems. Sorry, I forget the exact error, so I will just paste the edited code below:
and line 27 in basetrack.py:
The trouble is that the code above uses an attribute "self.chrom_len" that is not defined in the original code. It is referenced in bamsnap.py, originates from the function get_refseq_from_fasta(), and is passed through several functions. It is hard for me to trace it again, so if you want to solve this problem, you will need to work through it yourself.
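The clamping idea in this comment (limit spos to the start of the reference and epos to the chromosome length) can be sketched on its own. This is an illustration under stated assumptions, not the commenter's patch or bamsnap's code: `clamp_window` and `load_refseq` are hypothetical helpers, positions are assumed to be 1-based and inclusive, and the chromosome length is taken from the in-memory sequence.

```python
def clamp_window(spos, epos, chrom_len):
    """Clamp a 1-based inclusive window to [1, chrom_len]."""
    if chrom_len < 1:
        raise ValueError("chrom_len must be positive")
    spos = max(1, spos)
    epos = min(chrom_len, epos)
    if spos > epos:
        raise ValueError("window lies outside the reference")
    return spos, epos

def load_refseq(seq, spos, epos):
    """Map each 1-based position in the clamped window to its base."""
    spos, epos = clamp_window(spos, epos, len(seq))
    return {gpos: seq[gpos - 1] for gpos in range(spos, epos + 1)}

gene = "ACGT" * 425                      # 1700 bases, no flanking sequence
refseq = load_refseq(gene, -549, 2250)   # padded window from a 1-1700 request
print(len(refseq), refseq[1], refseq[1700])  # 1700 A T
```

Clamping keeps the lookup in range, but any downstream code that assumes the padded width still exists would need the same limits, which matches the follow-on problems mentioned above.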
Command:
bamsnap -bam ./F1_dad.bam -ref new_GCA_024713975.2_ASM2471397v2_genomic.fna -ref_index_rebuild -pos CM045671.1:1-31859138
I have a BAM file for the olive flounder, and the chromosome names are
"CM045671.1", "CM045672.1", ...
Since I wanted a screenshot of the alignment for chromosome "CM045671.1",
I set the position as shown above.
Then the following error occurred:
/home/jwshin0727/miniconda3/lib/python3.10/site-packages/pyfaidx-0.7.2.1-py3.10.egg/pyfaidx/init.py:523: RuntimeWarning: Index file /home/jwshin0727/CNU/Reference/Chinese/new_GCA_024713975.2_ASM2471397v2_genomic.fna.fai is older than FASTA file /home/jwshin0727/CNU/Reference/Chinese/new_GCA_024713975.2_ASM2471397v2_genomic.fna.
warnings.warn(
Process proc 1:
Traceback (most recent call last):
File "/home/jwshin0727/miniconda3/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/jwshin0727/miniconda3/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/jwshin0727/miniconda3/lib/python3.10/site-packages/bamsnap-0.2.19-py3.10.egg/bamsnap/bamsnap.py", line 233, in run_process_drawplot_bamlist
refseq = rseq.get_refseq(pos1)
File "/home/jwshin0727/miniconda3/lib/python3.10/site-packages/bamsnap-0.2.19-py3.10.egg/bamsnap/bamsnap.py", line 543, in get_refseq
refseq = self.get_refseq_from_localfasta(pos1)
File "/home/jwshin0727/miniconda3/lib/python3.10/site-packages/bamsnap-0.2.19-py3.10.egg/bamsnap/bamsnap.py", line 592, in get_refseq_from_localfasta
refseq[gpos+1] = seq[i]
IndexError: string index out of range
2023-05-18 23:02:10,954 : [INFO] Total running time: 0.0 sec
How can I solve this problem?
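One thing worth checking before rerunning is whether the requested region, plus any padding the tool adds, fits inside the sequence. The sketch below reads sequence lengths from a samtools-style `.fai` index, whose first two tab-separated columns are the sequence name and its length. `parse_fai_lengths` and `region_fits` are hypothetical helpers, and the index line used here is made up for illustration; it is not taken from the reference in this issue.

```python
def parse_fai_lengths(lines):
    """Map sequence name -> length from the lines of a .fai index."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths

def region_fits(region, lengths):
    """Return True if a 'chrom:start-end' region lies within the sequence."""
    chrom, _, span = region.rpartition(":")
    start_text, _, end_text = span.partition("-")
    start, end = int(start_text), int(end_text)
    return chrom in lengths and 1 <= start <= end <= lengths[chrom]

# Made-up index contents: name, length, offset, line bases, line width.
fai_lines = ["geneA\t1700\t7\t60\t61\n"]
lengths = parse_fai_lengths(fai_lines)
print(region_fits("geneA:1-1700", lengths))  # True
print(region_fits("geneA:1-2250", lengths))  # False
```

With a real index, an open file handle can be passed to `parse_fai_lengths` in place of the list, since it only iterates over lines.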