Commit
By having all phylogenetic workflows start from two lists of inputs (`config.inputs`, `config.additional_inputs`) we enable a broad range of uses with a consistent interface.

1. Using local ingest files is trivial (see added docs) and doesn't need a bunch of special-cased logic that is prone to falling out of date (as it had indeed done).
2. Adding extra / private data follows a similar pattern, with an additional config list being used so that we are explicit that the new data is additional and so that we enforce an ordering, which is needed for predictable `augur merge` behaviour. The canonical data can be removed / replaced via step (1) if needed.

I considered adding additional data after the subtype-filtering step, which would avoid the need to add subtype in the metadata but requires encoding this in the config overlay. I felt the chosen way was simpler and more powerful.

When considering sequences the structure is more complex than for metadata, because the influenza genome is segmented and we wish to allow users to provide additional data for only some segments (see docstring for `_parse_config_input`). For non-segmented pathogens the simpler structure used here for metadata could also be used for sequences.

This workflow uses an old version of the CI workflow, <https://github.com/nextstrain/.github/blob/v0/.github/workflows/pathogen-repo-ci.yaml#L233-L240>, which copies `example_data`. We could upgrade to the latest version and use a config overlay to swap out the canonical inputs with the example data.

Note that one side effect of the current implementation is that merged inputs use the same filepath irrespective of the workflow. For instance, both gisaid & h5n1-cattle-outbreak use the intermediate path `results/metadata_merged.tsv`, which means it's not possible to maintain runs of both those analyses concurrently if both were to use merged inputs. Using separate analysis directories, e.g. <#103>, will help avoid this shortcoming.
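As a rough illustration of the two-list idea, here is a minimal Python sketch of how canonical and additional inputs might be combined in a fixed order. It is not the code from this commit: the function names (`ordered_inputs`, `metadata_sources`, `sequence_sources`), the `name` / `metadata` / `sequences` keys and the example paths are all assumptions made for the example. Only `config.inputs`, `config.additional_inputs` and the per-segment nature of sequences come from the message above.

```python
from typing import Any

# Hypothetical config shape (an assumption, not the repository's actual schema):
# each input has a unique name, an optional metadata path, and an optional
# mapping of segment name to sequences path.
EXAMPLE_CONFIG: dict[str, Any] = {
    "inputs": [
        {
            "name": "canonical",
            "metadata": "ingest/results/metadata.tsv",
            "sequences": {"ha": "ingest/results/ha.fasta", "na": "ingest/results/na.fasta"},
        },
    ],
    "additional_inputs": [
        # Private data supplied for the HA segment only.
        {"name": "private", "metadata": "private/metadata.tsv", "sequences": {"ha": "private/ha.fasta"}},
    ],
}


def ordered_inputs(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Return canonical inputs followed by additional inputs, in list order."""
    inputs = list(config.get("inputs", [])) + list(config.get("additional_inputs", []))
    names = [entry["name"] for entry in inputs]
    if len(names) != len(set(names)):
        raise ValueError(f"Input names must be unique, got {names}")
    return inputs


def metadata_sources(config: dict[str, Any]) -> list[tuple[str, str]]:
    """(name, path) pairs for every input that provides metadata."""
    return [(e["name"], e["metadata"]) for e in ordered_inputs(config) if "metadata" in e]


def sequence_sources(config: dict[str, Any], segment: str) -> list[tuple[str, str]]:
    """(name, path) pairs for every input that provides the given segment."""
    return [
        (e["name"], e["sequences"][segment])
        for e in ordered_inputs(config)
        if segment in e.get("sequences", {})
    ]


def needs_merge(sources: list[tuple[str, str]]) -> bool:
    """A merge step is only required when more than one source is present."""
    return len(sources) > 1


if __name__ == "__main__":
    for segment in ("ha", "na"):
        sources = sequence_sources(EXAMPLE_CONFIG, segment)
        print(segment, sources, "merge" if needs_merge(sources) else "use directly")
```

Because the additional inputs always come after the canonical ones, a downstream merge step sees its sources in a stable order. A segment with a single source can skip the merge entirely.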
1 parent f348382, commit 51888c9
Showing 4 changed files with 212 additions and 62 deletions.