Merge pull request #238 from nextstrain/data-format-tsv
Data format: TSV
joverlee521 authored Dec 3, 2024
2 parents a15d2ab + a6d0a87 commit 38d42fd
Showing 2 changed files with 36 additions and 0 deletions.
3 changes: 3 additions & 0 deletions src/fetch-docs.py
@@ -27,6 +27,9 @@
if __name__ == '__main__':
    # Use a Session for connection pooling
    session = requests.Session()
    session.headers.update({
        "User-Agent": "https://github.com/nextstrain/docs.nextstrain.org ([email protected])",
    })

    class RemoteDoc:
        def __init__(self, source_url, dest_path):
33 changes: 33 additions & 0 deletions src/reference/data-formats.rst
@@ -2,6 +2,39 @@
Data formats
============

.. contents:: Table of Contents
   :local:

TSV
===

Nextstrain strongly prefers TSV files for metadata, even though Augur commands accept other delimiters as input.
If your metadata is in another format, we recommend using :doc:`augur curate passthru <augur:usage/cli/curate/passthru>` to convert it to TSV.

Nextstrain tools and workflows produce `RFC 4180 CSV-like TSVs <https://datatracker.ietf.org/doc/html/rfc4180>`__.
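
As a rough illustration of what "RFC 4180 CSV-like" means for a TSV, the sketch below writes and re-reads a small table with Python's standard ``csv`` module, using a tab delimiter and CSV-style double-quote escaping. The strain names and the ``note`` column are made up for the example, and the sketch assumes the ``csv`` module's default quoting is close enough to what Nextstrain tools emit; it is not taken from Augur itself.

```python
import csv
import io

# Hypothetical metadata rows: one value contains double quotes, one contains a tab.
rows = [
    {"strain": "A/example/1", "note": 'has a "quoted" word'},
    {"strain": "A/example/2", "note": "has a\ttab"},
]

# Write with a tab delimiter; fields are quoted only when they need it,
# and embedded double quotes are doubled, as in RFC 4180.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["strain", "note"],
    delimiter="\t",
    quoting=csv.QUOTE_MINIMAL,
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
tsv_text = buffer.getvalue()

# Read the text back with the same delimiter; quoted fields are unescaped.
round_tripped = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
```

A parser that splits lines on tabs without handling quotes would read the second row as three fields, which is why quote-aware tools matter for these files.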

When using `csvtk <https://bioinf.shenwei.me/csvtk/>`__:

* the ``--lazy`` (``-l``) option should not be necessary
* the ``fix-quotes``/``del-quotes`` commands should not be necessary

When using `tsv-utils <https://opensource.ebay.com/tsv-utils/>`__:

* pass the inputs through ``csv2tsv --csv-delim $'\t'``
* pass the final ``tsv-utils`` outputs through ``csvtk fix-quotes --tabs``

.. code-block:: bash

    csv2tsv --csv-delim $'\t' metadata.tsv \
      | tsv-select -H -f strain,date \
      | tsv-uniq -H -f strain \
      | csvtk fix-quotes --tabs > output.tsv

See our internal `discussion on TSV standardization <https://github.com/nextstrain/augur/issues/1566>`__ for more details.
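
For comparison, the select-and-deduplicate steps of the tsv-utils pipeline can be approximated with a quote-aware reader and writer in plain Python. This is only a sketch: ``select_unique`` is a hypothetical helper, not part of Augur or tsv-utils, the sample rows are invented, and it assumes the first row for each ``strain`` is the one to keep.

```python
import csv
import io


def select_unique(tsv_in, tsv_out, columns=("strain", "date"), key="strain"):
    """Keep only `columns`, and only the first row seen for each `key` value."""
    reader = csv.DictReader(tsv_in, delimiter="\t")
    writer = csv.DictWriter(
        tsv_out,
        fieldnames=list(columns),
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # silently drop columns that were not selected
    )
    writer.writeheader()
    seen = set()
    for row in reader:
        if row[key] in seen:
            continue
        seen.add(row[key])
        writer.writerow(row)


# Hypothetical input: the first row has a quoted field with an embedded tab.
source = io.StringIO(
    "strain\tdate\tnote\n"
    'A\t2024-01-01\t"x\ty"\n'
    "A\t2024-01-02\tz\n"
    "B\t2024-02-01\t\n"
)
result = io.StringIO()
select_unique(source, result)
```

Because both the reader and the writer understand CSV-style quoting, no separate conversion step is needed before or after the processing.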

JSON
====

Nextstrain uses a few different kinds of `JSON
<https://en.wikipedia.org/wiki/JSON>`__ files at various stages in a typical
build.