using local blast tools in lieu of entrez #12
@btupper try https://docs.ropensci.org/restez/articles/restez.html on charlie |
@egreyavis which databases (databasii?) are of interest to the project? |
So, I downloaded divisions 1, 2 and 6 ('Invertebrate', 'Plant (including fungi and algae)' and 'Other vertebrate') and then ran
I think this exceeds R's limit. I'll have to investigate. If needed, can we run the process on the 3 databasii separately and then merge the results? |
Yes we could run them separately and then merge. |
OK - I'll set that up and see what happens |
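A minimal sketch of the "run separately, then merge" idea, assuming each division lives in its own database directory and that a lookup returns a named character vector of sequences. `lookup_all`, `fetch_restez` and the `db_paths` argument are hypothetical names for illustration; the restez-backed fetcher assumes `restez::restez_path_set()` and `restez::gb_sequence_get()` behave as documented and has not been run against these databases.

```r
# Hypothetical helper: query each per-division database in turn and stack the hits.
# `fetch_one(ids, path)` must return a named character vector (names = ids found).
lookup_all <- function(ids, db_paths, fetch_one) {
  pieces <- lapply(names(db_paths), function(nm) {
    found <- fetch_one(ids, db_paths[[nm]])
    data.frame(
      id = as.character(names(found)),
      db = rep(nm, length(found)),
      sequence = as.character(unname(found)),
      stringsAsFactors = FALSE
    )
  })
  merged <- do.call(rbind, pieces)
  # An id could in principle appear in more than one database; keep the first hit.
  merged[!duplicated(merged$id), , drop = FALSE]
}

# One possible fetcher (assumption: restez is installed and `path` is the
# directory that was passed to restez_path_set() when the database was built).
fetch_restez <- function(ids, path) {
  restez::restez_path_set(path)
  restez::gb_sequence_get(ids)
}
```

Passing the fetcher as an argument keeps the merge logic independent of restez, so it can be tried with a stand-in function before the real databases exist.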
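I'm thinking we should set the max_length argument to avoid that error. (It's not really an error but a limitation of character lengths in R - who knew one might want 2^31 characters in a sequence?). I'm not sure about downstream consequences, but I suspect that it would allow us to proceed. How about max_length = 10^9 (1 billion), which is about half of 2^31? |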
Sure that's fine!
Erin K. Grey, PhD
Phone: (773) 401-9849
Web: www.egreylab.com
Email: ***@***.***
…On Mon, Apr 8, 2024 at 8:52 AM Ben Tupper ***@***.***> wrote:
I'm thinking we should set the max_length argument to avoid that error.
(It's not really an error but a limitation of character lengths in R - who
knew one might want 2^31 characters in a sequence?). I'm not sure about
downstream consequences, but I suspect that it would allow us to proceed.
How about max_length = 10^9 which is about half the 2^31 at 1 billion.
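The arithmetic behind that suggestion can be checked directly in base R. This is only a sketch of the numbers: R documents the size of a single character string as limited to 2^31 - 1 bytes, which equals `.Machine$integer.max`. The variable names are illustrative, and how restez applies `max_length` internally is not verified here.

```r
# A single R character string is limited to 2^31 - 1 bytes,
# which is also the largest value an R integer can hold.
r_string_limit <- .Machine$integer.max   # 2147483647

# The cap proposed in the thread.
proposed_max <- 10^9

# Fraction of the limit used by the proposed cap (about 0.47).
fraction <- proposed_max / r_string_limit

# TRUE when the proposed cap sits below R's string limit.
fits <- proposed_max < r_string_limit
```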
I have tried a number of different max_length values (10^6, 10^15, 10^20, etc.) and I still encounter that error on occasion. I kicked off all three (invertebrates, plants and vertebrates) this morning, each using a yaml similar to the following.
Invertebrates bailed early with that same error. Plants and vertebrates are still running. |
Good news! The vertebrates database was successfully built. Bad news! Plants joined invertebrates in failing to build. |
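I have cloned the restez package, and have added error trapping/handling to the bit of code that flops. I'll build the package on "charlie" and give it a whirl. If that resolves the issues (by skipping the big ones) then we are at least unblocked. I'll keep you posted. |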
Thanks Ben.
…On Fri, Apr 12, 2024, 9:41 AM Ben Tupper ***@***.***> wrote:
I have cloned the restez package, and have added error trapping/handling
to the bit of code that flops. I'll build the package on "charlie" and give
it a whirl. If that resolves the issues (by skipping the big ones) then we
are at least unblocked. I'll keep you posted.
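The general shape of that kind of error trapping can be sketched in base R with `tryCatch()`. This is not the actual change made to the cloned restez code; `safe_apply` and `process_one` are hypothetical names, and the sketch only illustrates skipping records that raise an error while keeping the rest.

```r
# Hypothetical helper: apply `process_one` to every record, keeping the
# successes and collecting the error message for each record that fails.
safe_apply <- function(records, process_one) {
  out <- lapply(records, function(rec) {
    tryCatch(
      list(ok = TRUE, value = process_one(rec)),
      error = function(e) list(ok = FALSE, value = conditionMessage(e))
    )
  })
  ok <- vapply(out, function(x) x$ok, logical(1))
  list(
    results = lapply(out[ok], function(x) x$value),
    skipped = vapply(out[!ok], function(x) x$value, character(1))
  )
}

# Toy usage: the second record triggers an error and is skipped.
res <- safe_apply(
  list(a = 1, b = "x", c = 3),
  function(x) {
    if (!is.numeric(x)) stop("not numeric")
    x * 2
  }
)
```

Recording the messages in `skipped` makes it possible to review afterwards which records were dropped and why.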
Good news! Three databases (databasii?) downloaded and operational...
|
sweet!!
…On Tue, Apr 23, 2024 at 12:22 PM Ben Tupper ***@***.***> wrote:
Good news! Three databases (databasii?) downloaded and operational...
INFO [2024-04-23 12:19:49] db_name: invertebrate
INFO [2024-04-23 12:19:49] db_ready: TRUE
Checking setup status at ...
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Restez path ...
... Path '/mnt/storage/data/edna/refdb/restez/invertebrate/restez'
... Does path exist? 'Yes'
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Download ...
... Path '/mnt/storage/data/edna/refdb/restez/invertebrate/restez/downloads'
... Does path exist? 'Yes'
... N. files 2100
... Total size 261G
... GenBank division selections 'Invertebrate'
... GenBank Release 259
... Last updated '2024-04-08 10:32:31'
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Database ...
... Path '/mnt/storage/data/edna/refdb/restez/invertebrate/restez/sql_db'
... Does path exist? 'Yes'
... Total size 404G
... Does the database have data? 'Yes'
... Number of sequences 1441629
... Min. sequence length 1
... Max. sequence length 1e+06
... Last_updated '2024-04-23 05:58:35'
INFO [2024-04-23 12:19:49] db_name: other_vertebrate
INFO [2024-04-23 12:19:49] db_ready: TRUE
Checking setup status at ...
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Restez path ...
... Path '/mnt/storage/data/edna/refdb/restez/other_vertebrate/restez'
... Does path exist? 'Yes'
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Download ...
... Path '/mnt/storage/data/edna/refdb/restez/other_vertebrate/restez/downloads'
... Does path exist? 'Yes'
... N. files 510
... Total size 62.1G
... GenBank division selections 'Other vertebrate'
... GenBank Release 259
... Last updated '2024-04-08 12:10:53'
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Database ...
... Path '/mnt/storage/data/edna/refdb/restez/other_vertebrate/restez/sql_db'
... Does path exist? 'Yes'
... Total size 50.7G
... Does the database have data? 'Yes'
... Number of sequences 832069
... Min. sequence length 1
... Max. sequence length 1e+06
... Last_updated '2024-04-12 00:00:57'
INFO [2024-04-23 12:19:50] db_name: plant_with_fungi_algae
INFO [2024-04-23 12:19:50] db_ready: TRUE
Checking setup status at ...
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Restez path ...
... Path '/mnt/storage/data/edna/refdb/restez/plant_with_fungi_algae/restez'
... Does path exist? 'Yes'
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Download ...
... Path '/mnt/storage/data/edna/refdb/restez/plant_with_fungi_algae/restez/downloads'
... Does path exist? 'Yes'
... N. files 1714
... Total size 337G
... GenBank division selections 'Plant (including fungi and algae)'
... GenBank Release 259
... Last updated '2024-04-08 11:57:06'
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Database ...
... Path '/mnt/storage/data/edna/refdb/restez/plant_with_fungi_algae/restez/sql_db'
... Does path exist? 'Yes'
... Total size 118G
... Does the database have data? 'Yes'
... Number of sequences 1345347
... Min. sequence length 1
... Max. sequence length 1e+06
... Last_updated '2024-04-15 02:54:05'
INFO [2024-04-23 12:19:50] done!
I shared the local NCBI database search idea with Julia Brown at Bigelow. She thinks that we may be able to leverage the blastdbcmd to replace some of the entrez functionality we use now.
@egreyavis Perhaps we can set up a time to walk through the examples with an eye toward building some R wrappers.
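As a starting point for that walkthrough, here is a minimal sketch of what an R wrapper around `blastdbcmd` might look like. It assumes the BLAST+ command-line tools are installed and on the PATH, and that a local BLAST database already exists. `-db`, `-entry` and `-outfmt` are `blastdbcmd` options (`%f` requests FASTA output); the function names, the database name and the accessions are hypothetical.

```r
# Build the argument vector for blastdbcmd (hypothetical helper).
# `ids` are joined with commas, the form that -entry accepts.
blastdbcmd_args <- function(db, ids, outfmt = "%f") {
  stopifnot(length(db) == 1, length(ids) >= 1)
  c("-db", shQuote(db),
    "-entry", shQuote(paste(ids, collapse = ",")),
    "-outfmt", shQuote(outfmt))
}

# Run blastdbcmd and return its output as a character vector of lines.
fetch_fasta <- function(db, ids, exe = "blastdbcmd") {
  if (!nzchar(Sys.which(exe))) stop("blastdbcmd not found on PATH")
  system2(exe, blastdbcmd_args(db, ids), stdout = TRUE)
}

# Example argument vector for a placeholder database and accessions.
args <- blastdbcmd_args("mydb", c("A1", "B2"))
```

Keeping argument construction separate from the `system2()` call lets the wrapper be checked on a machine without BLAST+ installed.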