uncurated-ncib-dataset: Separate commands to give explanation
Explain what each step of the example command is doing to give readers
a better understanding.
joverlee521 committed Mar 22, 2024
1 parent 59527e5 commit 79a0176
Showing 1 changed file with 28 additions and 2 deletions: src/snippets/uncurated-ncbi-dataset.rst
@@ -1,8 +1,34 @@
 1. Enter an interactive Nextstrain shell to be able to run the NCBI Datasets CLI commands without installing them separately.
 
    .. code-block::
 
       $ nextstrain shell .
-      $ datasets download virus genome taxon <taxon-id> --filename ingest/data/ncbi_dataset.zip
-      $ dataformat tsv virus-genome --package ingest/data/ncbi_dataset.zip > ingest/data/raw_metadata.tsv
+
+2. Create the ``ingest/data`` directory if it doesn't already exist.
+
+   .. code-block::
+
+      $ mkdir -p ingest/data
+
+3. Download the dataset with the pathogen NCBI taxonomy ID.
+
+   .. code-block::
+
+      $ datasets download virus genome taxon <taxon-id> \
+          --filename ingest/data/ncbi_dataset.zip
+
+4. Extract and format the metadata as a TSV file for easy inspection
+
+   .. code-block::
+
+      $ dataformat tsv virus-genome \
+          --package ingest/data/ncbi_dataset.zip \
+          > ingest/data/raw_metadata.tsv
+
+5. Exit the Nextstrain shell to return to your usual shell environment.
+
+   .. code-block::
+
+      $ exit
+
 The produced ``ingest/data/raw_metadata.tsv`` will contain all of the fields available from NCBI Datasets.
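
Steps 2 to 4 of the updated snippet can be bundled into one shell function, which may help when re-running the download for another taxon. This is a minimal sketch and is not part of Nextstrain or NCBI Datasets. The function name ``fetch_ncbi_metadata`` and its optional output-directory argument are hypothetical. It assumes ``datasets`` and ``dataformat`` are on your ``PATH``, for example because you are inside ``nextstrain shell .``. It uses only the commands and flags shown in the snippet.

```shell
# Hypothetical helper (not part of Nextstrain or NCBI Datasets) that runs
# steps 2 to 4 in one call. Assumes `datasets` and `dataformat` are on PATH,
# e.g. inside `nextstrain shell .`.
fetch_ncbi_metadata() {
    local taxon_id="${1:-}"            # NCBI taxonomy ID of the pathogen
    local outdir="${2:-ingest/data}"   # output directory, defaults to ingest/data

    if [ -z "$taxon_id" ]; then
        echo "usage: fetch_ncbi_metadata <taxon-id> [outdir]" >&2
        return 1
    fi

    # Step 2: create the output directory if it doesn't already exist
    mkdir -p "$outdir" || return 1

    # Step 3: download the virus genome package for the taxon
    datasets download virus genome taxon "$taxon_id" \
        --filename "$outdir/ncbi_dataset.zip" || return 1

    # Step 4: extract the metadata from the package as a TSV file
    dataformat tsv virus-genome \
        --package "$outdir/ncbi_dataset.zip" \
        > "$outdir/raw_metadata.tsv" || return 1
}
```

For example, after entering the Nextstrain shell, ``fetch_ncbi_metadata <taxon-id>`` should write ``ingest/data/ncbi_dataset.zip`` and ``ingest/data/raw_metadata.tsv``. You can then ``exit`` as in step 5.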
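
Because ``raw_metadata.tsv`` includes every field NCBI Datasets provides, listing its header row is a quick way to see which columns are available before curating. This is a minimal sketch. The helper name ``list_tsv_fields`` is hypothetical, and it relies only on standard ``head`` and ``awk``, so it should work inside or outside the Nextstrain shell.

```shell
# Hypothetical helper: print the header fields of a TSV file, one per line,
# as "<column number><TAB><column name>".
list_tsv_fields() {
    head -n 1 "$1" | awk -F '\t' '{ for (i = 1; i <= NF; i++) print i "\t" $i }'
}
```

Run it as ``list_tsv_fields ingest/data/raw_metadata.tsv``. The column numbers can be handy when selecting fields with tools such as ``cut -f``.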
