Add functionality to target specific genomic regions #1

Open
ajrominger opened this issue Jul 28, 2021 · 0 comments

I was thinking we could take one (or both) of two approaches:

  1. Add arguments to the function getNCBISeqID that specify additional query criteria for the call to the API (e.g. COI[Gene]). We could either make these named arguments or allow them to be passed as ...
  2. One drawback to querying on specific terms for the genomic region is that GenBank is (surprisingly) inconsistent in its naming conventions (e.g. COI is also sometimes CO1, cyt ox 1, cytochrome oxidase subunit 1, etc.). There are two possible workarounds:
    1. Make the function download all sequences for a species, then go back and work out which ones belong to the same genomic region and which one of those regions is the one the user wanted (e.g. by fuzzy matching keywords about the region, or by aligning all sequences and seeing which ones align to sequences that are clearly labeled as the region of interest). I admit this was the approach I first had in mind.
    2. Use in silico PCR to extract the region of interest, perhaps following these folks' work: https://github.com/limey-bean/CRUX_Creating-Reference-libraries-Using-eXisting-tools
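
To make the keyword fuzzy-matching idea concrete, here is a minimal sketch in Python (for illustration only; it is not tied to this package's code). The synonym table, the function names `normalize` and `match_region`, and the 0.8 similarity cutoff are all assumptions; a real table would need curation and the cutoff would need tuning against actual GenBank annotations.

```python
import difflib
import re

# Hypothetical synonym table: canonical region name -> labels seen in records.
REGION_SYNONYMS = {
    "COI": [
        "coi",
        "co1",
        "cox1",
        "cyt ox 1",
        "cytochrome oxidase subunit 1",
        "cytochrome c oxidase subunit i",
    ],
}


def normalize(label):
    """Lowercase the label and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def match_region(label, synonyms=REGION_SYNONYMS, cutoff=0.8):
    """Return the canonical region whose synonym is closest to label, or None."""
    # Map each normalized synonym back to its canonical region name.
    lookup = {
        normalize(syn): canonical
        for canonical, syns in synonyms.items()
        for syn in syns
    }
    hits = difflib.get_close_matches(normalize(label), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None
```

Labels that match nothing above the cutoff come back as `None`, so they could be set aside for the alignment-based check rather than silently dropped.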

All approaches have some drawbacks and some advantages.
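
For the in silico PCR option, the core idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how CRUX or any existing tool works: it assumes exact primer matches, handles only A/C/G/T/N, and the function names are hypothetical. Real primers are often degenerate and real tools tolerate mismatches.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def in_silico_pcr(template, fwd_primer, rev_primer):
    """Return the amplicon (primers included), or None if either primer is absent.

    Assumes exact matches: the forward primer is searched on the given strand,
    and the reverse primer is searched as its reverse complement downstream.
    """
    template = template.upper()
    fwd = fwd_primer.upper()
    rev_site = reverse_complement(rev_primer)

    start = template.find(fwd)
    if start == -1:
        return None
    # Look for the reverse primer site only after the forward primer ends.
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]
```

One appeal of this route is that it ignores record labels entirely, which sidesteps the naming problem; the cost is that it depends on having good primers for each region.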

Whatever we end up going with, I think the user should be able to request either one or a few specific regions or a dump of all available sequences.
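
A rough sketch of what that switch could look like on the query side, written in Python purely for illustration. The function name `build_search_term` and its `regions` parameter are hypothetical, and I am assuming the search term is assembled from Entrez field tags ([Organism], [Gene]); how getNCBISeqID actually builds its query may differ.

```python
def build_search_term(species, regions=None):
    """Build an Entrez-style search term for a species.

    regions=None (or an empty list) means "everything available";
    otherwise the listed region names are OR-ed together.
    """
    organism = f'"{species}"[Organism]'
    if not regions:
        return organism

    def gene_clause(name):
        # Quote multi-word names so they are treated as a single phrase.
        return f'"{name}"[Gene]' if " " in name else f"{name}[Gene]"

    genes = " OR ".join(gene_clause(name) for name in regions)
    return f"{organism} AND ({genes})"
```

Passing several spellings of the same region (COI, CO1, ...) as `regions` is one cheap way to soften the naming inconsistency without downloading everything.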
