Blast Database to JSON/Solr index #23
Comments
It seems to be a convention to concatenate the db and id fields with |. Perhaps we can use a regexp to parse out the metadata if needed.
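A rough sketch of what such a regexp could look like, assuming deflines of the form `>db|id|db|id| title` (for example `>gi|12345|ref|NP_000001.1| some title`). The function name `parse_defline` and the returned keys are made up for illustration, and databases whose identifiers have more than two parts (pdb, for instance) would need extra handling:

```python
import re

# Matches one "db|id" pair; assumes the db tag is alphabetic and the id
# contains neither "|" nor whitespace.
PAIR = re.compile(r"([A-Za-z]+)\|([^|\s]+)")

def parse_defline(line):
    """Split a FASTA defline into its db|id pairs and the free-text title."""
    line = line.lstrip(">").strip()
    # Everything up to the first space is the identifier part.
    id_part, _, title = line.partition(" ")
    ids = dict(PAIR.findall(id_part))
    return {"ids": ids, "title": title.strip()}

record = parse_defline(">gi|12345|ref|NP_000001.1| hypothetical protein [Homo sapiens]")
print(record)
```

The resulting dictionary could then be merged into whatever document the index command builds for each sequence.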
Hi Mike,

Yes, it would be very helpful to have such a program. This was one of the "reach" goals for the hackathon and not that difficult to do…

Best,

From: Mike Panciera [mailto:[email protected]]

We can imagine that someone (say, me) wants to dump their existing blast database (say nr/nt) into something Seqr-compatible. blastdbcmd can dump FASTA entries like so:

MASTQNIVEEVQKMLDTYDTNKDGEITKAEAVEYFKGKKAFNPERSAIYLFQVYDKDNDGKITIKELAGDIDFDKALKEY
KEKQAKSKQQEAEVEEDIEAFILRHNKDDNTDITKDELIQGFKETGAKDPEKSANFILTEMDTNKDGTITVKELRVYYQK

@lianyi (https://github.com/lianyi) Do you have anything for this?
One can specify the output format of blastdbcmd, so maybe the thing to do is … I will try this and see how it works out.
Let me know if y'all want to interface with Tom Madden, head of BLAST.

Cheers!
Ben

One can specify the output format of blastdbcmd, so maybe the thing to do is … I will try this and see how it works out.
I discovered that using blastdbcmd with outfmt options, i.e.

blastdbcmd -db databases/ncbi/blast/nr/nr -entry all -outfmt "%s,%a,%g,%o,%i,%t,%l,%h,%T,%X,%e,%L,%C,%S,%N,%B,%K,%P" -target_only

takes a (prohibitively?) long time to run (and can't be parallelized simply, as far as I know).
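For what it's worth, once a delimited dump like this exists, turning it into JSON documents is the easy part. Below is a minimal sketch, assuming the dump was written with only a handful of specifiers (`%a`, `%g`, `%l`, `%T`, `%t`, `%s`, in that order) and with a tab rather than a comma between fields, since titles can themselves contain commas. The key names and the function `rows_to_json_lines` are hypothetical, not anything Seqr defines:

```python
import json

# Hypothetical key names, in the same order as the assumed outfmt specifiers
# %a, %g, %l, %T, %t, %s.
FIELDS = ["accession", "gi", "length", "taxid", "title", "sequence"]

def rows_to_json_lines(lines, delimiter="\t", fields=FIELDS):
    """Yield one JSON document per non-empty line of delimited dump output."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        values = line.split(delimiter)
        if len(values) != len(fields):
            # Fail loudly instead of silently shifting columns.
            raise ValueError(f"line {number}: expected {len(fields)} fields, got {len(values)}")
        yield json.dumps(dict(zip(fields, values)))

sample = ["NP_000001.1\t12345\t5\t9606\tsome protein, putative\tMASTQ\n"]
for doc in rows_to_json_lines(sample):
    print(doc)
```

Because it is a generator, the same function could be fed a file object or a pipe line by line without loading a whole nr dump into memory.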
Mike,

It could be that some of the fields you are requesting are slow, but some are fast. BLAST stores data in multiple files.

Best,

From: Mike Panciera [mailto:[email protected]]

I discovered that using the blastdbcmd with outfmt options, i.e.

blastdbcmd -db databases/ncbi/blast/nr/nr -entry all -outfmt "%s,%a,%g,%o,%i,%t,%l,%h,%T,%X,%e,%L,%C,%S,%N,%B,%K,%P" -target_only

takes a (prohibitively?) long time to run (and can't be parallelized simply, as far as I know).
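One way to check which fields are the slow ones would be to time each specifier on its own against a small set of entries. This is only a sketch: the helper names are invented, the entry IDs have to be supplied by the caller, and on a small sample the startup cost of blastdbcmd may dominate, so the numbers would be a rough guide at best.

```python
import subprocess
import time

def build_command(db, spec, entries):
    """Build a blastdbcmd call that requests a single outfmt specifier."""
    return ["blastdbcmd", "-db", db, "-entry", ",".join(entries),
            "-outfmt", spec, "-target_only"]

def time_specifiers(db, specs, entries):
    """Run blastdbcmd once per specifier and return elapsed seconds for each."""
    timings = {}
    for spec in specs:
        start = time.perf_counter()
        # Output is discarded; only the wall-clock time is of interest here.
        subprocess.run(build_command(db, spec, entries),
                       check=True, stdout=subprocess.DEVNULL)
        timings[spec] = time.perf_counter() - start
    return timings

# Shows the command that would be run for the title field.
print(build_command("databases/ncbi/blast/nr/nr", "%t", ["ID_1", "ID_2"]))
```

Calling `time_specifiers` requires blastdbcmd on the PATH and real entry IDs in place of the placeholders `ID_1` and `ID_2`.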
We can imagine that someone (say, me) wants to dump their existing blast database (say nr/nt) into something Seqr-compatible.
blastdbcmd can dump FASTA entries like so:
We have an index command but it doesn't know about the metadata between the | characters.
@lianyi Do you have anything for this?