docs: convert to reST, re-organize and update contents across all pages #894

victorlin · 2022-03-22T21:25:07Z

Preview

Convert pages to reST
Rewrite tutorial
Update contents of existing pages to reflect current workflow
Re-organize content

Related issues

Post-merge tasks

sync redirects.yml using readthedocs-cli

docs/src/tutorial/example-data.rst

docs/src/tutorial/index.rst

docs/src/tutorial/custom-data.rst

docs/src/tutorial/genomic-surveillance.rst

huddlej · 2022-03-23T00:00:40Z

docs/src/tutorial/genomic-surveillance.rst

+
+.. code:: text
+
+   nextstrain build . --cores 4 --configfile ncov-tutorial/genomic-surveillance.yaml


This command had a subtle issue: the combined metadata (results/combined_metadata.tsv.xz) and sequences (results/combined_sequences_for_subsampling.fasta.xz) already existed from the "custom data" run, so the workflow did not re-run the steps that combine these files and therefore did not include the newly defined inputs. As a result, the workflow crashed when it ran an augur filter query that referred to a nonexistent background_data column.

To fix the issue, I had to tell the workflow to rebuild everything with the --forceall flag:

nextstrain build . --forceall --configfile ncov-tutorial/genomic-surveillance.yaml

This seems like a Snakemake bug or a bug in our workflow; the new "background_data" entry and updated contents of the "custom_data" files should trigger a rebuild of files that depend on them.

I have to sign off for the day, but I will follow up on this issue by looking into potential differences between Snakemake versions. The Docker image still uses a relatively old Snakemake, so it's possible this is a Snakemake bug that has been fixed in a later release.
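
The symptom described above (a combined metadata file left over from an earlier run and lacking the background_data column) can be checked directly before falling back to --forceall. The following is a minimal sketch, not part of the workflow: it assumes the combined metadata is an xz-compressed TSV whose header contains one column per input origin, as the failing augur filter query suggests. The function name missing_origin_columns is hypothetical.

```python
import csv
import lzma
from typing import Iterable, List


def missing_origin_columns(metadata_path: str, origins: Iterable[str]) -> List[str]:
    """Return the origins that have no matching column in the metadata header.

    Assumption (not confirmed by the workflow docs): each input origin
    appears as its own column in the combined metadata file.
    """
    # Open the xz-compressed TSV as text and read only the header row.
    with lzma.open(metadata_path, mode="rt", encoding="utf-8", newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    # Keep the caller's order so the report is easy to compare with the config.
    return [origin for origin in origins if origin not in header]
```

Under that assumption, calling it on results/combined_metadata.tsv.xz with the origins reference_data, custom_data, and background_data would return a list containing background_data for a stale file like the one described, and an empty list once the file has been regenerated.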

Still looking into the DAG issue with Snakemake, but noting that this build ran in 42 min on 3 CPUs with the config in ncov-tutorial@cb2f69fe413a6b979687b52b897f6fdd2a8c4da9.

Ran updated config in 27 min on an M1 Mac using 8 cores (native runtime).

@huddlej I am running this using Docker runtime on my computer (very slowly due to nextstrain/docker-base#35), but I am at the refine step and can see that this already ran successfully:

python3 scripts/combine_metadata.py --metadata results/sanitized_metadata_reference_data.tsv.xz results/sanitized_metadata_custom_data.tsv.xz results/sanitized_metadata_background_data.tsv.xz --origins reference_data custom_data background_data --output results/combined_metadata.tsv.xz 2>&1 | tee logs/combine_input_metadata.txt

So, I don't think I am able to reproduce this 😕

jameshadfield

Nice work @victorlin! I've read through and tested the first tutorial. Will try to cover the rest tomorrow.

docs/src/tutorial/example-data.rst

jameshadfield

Awesome @victorlin! Thanks for taking the lead on this. I think the three tutorials are great. In addition to the in-line comments, I have bigger picture comments here about removing / streamlining the other pages. Perhaps this is a separate PR?

remove my_profiles
The three tutorials plus the "preparing your data" page cover pretty much all the content that ./my_profiles did. How do we feel about:

- removing ./reference/multiple_inputs and the corresponding ./data/ files?
- shifting any useful config YAMLs into the ncov-tutorial page as examples? Perhaps this needs a corresponding page in the "reference material" along the lines of "more complicated configuration file examples".

reorder the pages in "reference material"

Orientation / overview pages should come first

docs/src/tutorial/running.md

Content to be revised later.

pandoc -f markdown -t rst --wrap=none docs/src/reference/customizing-analysis.md -o docs/src/reference/customizing-analysis.rst

pandoc -f markdown -t rst --wrap=none docs/src/reference/orientation-files.md -o docs/src/reference/orientation-files.rst

pandoc -f markdown -t rst --wrap=none docs/src/guides/data-prep.md -o docs/src/guides/data-prep.rst

pandoc -f markdown -t rst --wrap=none docs/src/reference/customizing-visualization.md -o docs/src/reference/customizing-visualization.rst

pandoc -f markdown -t rst --wrap=none docs/src/reference/orientation-workflow.md -o docs/src/reference/orientation-workflow.rst

pandoc -f markdown -t rst --wrap=none docs/src/visualization/sharing.md -o docs/src/visualization/sharing.rst

pandoc -f markdown -t rst --wrap=none docs/src/tutorial/running.md -o docs/src/tutorial/running.rst

pandoc -f markdown -t rst --wrap=none docs/src/reference/metadata-fields.md -o docs/src/reference/metadata-fields.rst

pandoc -f markdown -t rst --wrap=none docs/src/reference/data_submitter_faq.md -o docs/src/reference/data_submitter_faq.rst
rm docs/src/reference/data_submitter_faq.md

pandoc -f markdown -t rst --wrap=none docs/src/reference/naming_clades.md -o docs/src/reference/naming_clades.rst
rm docs/src/reference/naming_clades.md

pandoc -f markdown -t rst --wrap=none docs/src/reference/remote_inputs.md -o docs/src/reference/remote_inputs.rst
rm docs/src/reference/remote_inputs.md

pandoc -f markdown -t rst --wrap=none docs/src/visualization/interpretation.md -o docs/src/visualization/interpretation.rst
rm docs/src/visualization/interpretation.md

pandoc -f markdown -t rst --wrap=none docs/src/visualization/narratives.md -o docs/src/visualization/narratives.rst
rm docs/src/visualization/narratives.md
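
The conversion commits above all apply the same pandoc invocation to one file at a time. As a rough sketch of how that pattern could be scripted (an illustration only; the commits do not say a script was used), the helper names pandoc_command and convert_and_remove below are hypothetical, and pandoc is assumed to be available on PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def pandoc_command(md_path: Path) -> List[str]:
    """Build the pandoc argument list used in the commits for one Markdown file."""
    # The output sits next to the input, with .md swapped for .rst.
    rst_path = md_path.with_suffix(".rst")
    return [
        "pandoc",
        "-f", "markdown",
        "-t", "rst",
        "--wrap=none",
        str(md_path),
        "-o", str(rst_path),
    ]


def convert_and_remove(md_path: Path) -> None:
    """Convert one file to reST, then delete the Markdown source."""
    # check=True raises if pandoc fails, so the source is never deleted
    # after an unsuccessful conversion.
    subprocess.run(pandoc_command(md_path), check=True)
    md_path.unlink()
```

Building the argument list in a separate function keeps the invocation easy to inspect without running pandoc, and passing a list rather than a shell string avoids quoting problems with unusual file names.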

- Move existing files to https://github.com/nextstrain/ncov-tutorial/tree/main/examples
  - nextstrain/ncov-tutorial@9fa64b8
- Add placeholder README pointing readers to new guide (page will exist once tutorial PR changes merged)

To be referenced by tutorial.

- Add the new tutorial pages along with supporting images.
- Update the reference to demo videos

- Organizational changes:
  - Expose pages in main sidebar (6e35cdc)
  - Move pages to guides:
    - "Update the workflow" section from tutorial/setup -> guides/update-workflow
    - reference/customizing-analysis -> guides/workflow-config-file
    - reference/customizing-visualization -> guides/customizing-visualization
    - reference/data-prep -> guides/data-prep
  - Split "Data Prep" into 3 pages
  - Add reference/glossary
  - Rename reference files:
    - configuration -> workflow-config-file
    - orientation-files -> files
    - orientation-workflow -> nextstrain-overview
    - tutorial/running -> troubleshoot
  - Remove files:
    - reference/multiple_inputs
- Changes across multiple files:
  - Fix MD->reST conversion glitches
  - Reference "builds.yaml" as "workflow config file"
  - Remove my_profiles/ references
  - Reference glossary terms where appropriate
  - Use sphinx reference directive [1] to link to specific sections
- Per-file changes:
  - tutorial/setup
    - Remove basic example in setup page (replaced by the "example data" tutorial)
  - reference/gisaid-search
    - Remove off-topic line
  - reference/nextstrain-overview
    - Capitalize Augur, Auspice, Snakemake, Nextflow
    - Describe build vs. workflow
  - reference/files
    - Re-organize page with "user files" vs. "internal files"
  - reference/troubleshoot
    - Formerly tutorial/running, it has been stripped down to just troubleshooting content
  - dev_docs
    - Link to docs for installation/setup

[1]: https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#cross-referencing-arbitrary-locations

victorlin · 2022-05-02T17:11:37Z

Merging with @jameshadfield's approval on Slack.

victorlin self-assigned this Mar 22, 2022