Issue with complement_map dict when using Broad GRCh38 reference #11

Open
jmlivingstone opened this issue Jan 25, 2022 · 0 comments

Hello,

I am having an issue running reditools.py with the Broad version of the GRCh38 fasta file.
I noticed that this reference file contains non-ACGT nucleotide symbols (IUPAC ambiguity codes), so the complement function raises a KeyError when it looks one of these symbols up in the complement_map dict:
Traceback (most recent call last):
  File "/reditools2.0/src/cineca/reditools.py", line 1374, in
    analyze(options)
  File "/reditools2.0/src/cineca/reditools.py", line 1133, in analyze
    column = get_column(pos_based_read_dictionary, reads, splice_positions, last_chr, omopolymeric_positions, target_positions, i)
  File "/reditools2.0/src/cineca/reditools.py", line 289, in get_column
    ref = complement(ref)
  File "/reditools2.0/src/cineca/reditools.py", line 1190, in complement
    return complement_map[b]
KeyError: 'R'

I believe the solution is to extend the dictionary so that the non-ACGT symbols map to 'N':
complement_map = {"A":"T", "T":"A", "C":"G", "G":"C", "R":"N", "Y":"N", "K":"N", "M":"N", "S":"N", "W":"N", "B":"N"}
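An alternative that might be more robust is to fall back to 'N' for any symbol missing from the map, rather than listing each ambiguity code. The dictionary above does not include the IUPAC codes D, H, V or N itself, so those would presumably still raise a KeyError. The following is only a minimal sketch: it assumes complement takes a single-character string, as the traceback suggests, and it is not the actual REDItools implementation.

```python
# Sketch only: assumes `complement` receives one base as a string.
# The names mirror those in the traceback; the real function body may differ.
complement_map = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(b):
    """Return the complement of base `b`, or 'N' for any symbol not in the map."""
    # dict.get avoids the KeyError for ambiguity codes such as R, Y, D, H, V and N.
    return complement_map.get(b, "N")


print(complement("A"))
print(complement("R"))
```

With this fallback, lowercase (soft-masked) bases would also come back as 'N' unless the caller upper-cases them first; whether that is desirable depends on how the rest of the script treats case.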

One specific co-ordinate you can test this failure on is chr17:489373

Thank you,
Julie
