workflow for Erin's refdb builder #11

Open
btupper opened this issue Feb 21, 2024 · 2 comments
btupper commented Feb 21, 2024

Hi All,

I started to convert the recent @egreyavis script to a more workflow-friendly form. I forked this repo and will eventually make a pull request after I am done messing around. The workflow is in a subfolder in the fork here. I'm hoping that we can post questions/answers here so we have a more reliable place to track communications than my inbox (trust me, you know it will end up being eaten by the dog or going through the laundry there.)

The workflow moves most of the user-defined values into a YAML file, includes a setup script that will install and/or load packages, and moves reusable code into a suite of functions. I haven't made it to the steps where queries are made, but I hope to get to that today.

Cheers,
Ben

btupper commented Feb 22, 2024

Hi,

I have pulled out a section of the script that tries to capture information where order is missing. It can be found here. I think I have it close to the desired behavior, but it is a complex decision-making step and it's hard to know for sure.

I do encounter this error from NCBI: "HTTP failure: 502, bad gateway. This error code is often returned when trying to download many records in a single request. Try using web history as described in the rentrez tutorial". So I have added use_history = TRUE to each call to rentrez::entrez_search(). This raises the question of the maximum number of returns when searching for targets rather than mitogenomes; the former is set to 999999, which seems like a lot compared to 9999 for the latter. What is the motivation for setting the return max so high?
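
For reference, a minimal sketch of the web-history pattern that the error message points to, assuming the rentrez package is installed. The helper names `batch_starts()` and `fetch_in_batches()`, the "nucleotide" database default, and the batch size of 500 are illustrative assumptions, not part of the workflow or of rentrez itself.

```r
# Offsets (0-based, as NCBI's retstart expects) for paging through
# `count` records in chunks of `batch_size`.
batch_starts <- function(count, batch_size = 500) {
  if (count <= 0) return(numeric(0))
  seq(0, count - 1, by = batch_size)
}

# Hypothetical helper: search once with use_history = TRUE, then fetch the
# records in batches from the stored result set instead of in one request.
fetch_in_batches <- function(term, db = "nucleotide", batch_size = 500,
                             rettype = "fasta") {
  search <- rentrez::entrez_search(db = db, term = term, use_history = TRUE)
  starts <- batch_starts(search$count, batch_size)
  chunks <- vapply(starts, function(start) {
    rentrez::entrez_fetch(db = db,
                          web_history = search$web_history,
                          rettype = rettype,
                          retmax = batch_size,
                          retstart = start)
  }, character(1))
  paste(chunks, collapse = "")
}
```

Because each fetch is capped at `batch_size` records, the size of any single request stays small regardless of how many records the search matches.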

egreyavis commented Feb 23, 2024 via email
