Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ncov/ as a standalone directory #921

Closed
wants to merge 5 commits into from

Conversation

victorlin
Copy link
Member

Description of proposed changes

Expanding some thoughts from #894 (comment) and Slack thread.

This is intended to be more of a discussion, but created as a draft PR to have some tangible changes.

Current usage of ncov/:

ncov/
├─ data/
│  ├─ metadata.tsv
│  ├─ sequences.fasta
├─ my_profiles/
│  ├─ analysis1/
│  │  ├─ config.yaml
│  │  ├─ builds.yaml
│  │  ├─ auspice_config.json
│  ├─ analysis2/
│  │  ├─ config.yaml
│  │  ├─ builds.yaml
│  │  ├─ auspice_config.json
├─ auspice/
│  ├─ global_dataset.json
│  ├─ country_dataset.json
├─ <internal ncov files>
├─ <other Snakemake intermediate files>
/
  1. git clone https://github.com/nextstrain/ncov
  2. Customize workflow:
    1. Add custom data in ncov/data/
    2. Define a Snakemake profile:
      1. ncov/my_profiles/<profile>/builds.yaml
      2. ncov/my_profiles/<profile>/config.yaml
      3. ncov/my_profiles/<your_profile>/auspice_config.json
      4. Other defaults overrides
  3. Run snakemake/nextstrain build from ncov/ directory.
    • This produces several directories:
      • ncov/.snakemake/
      • ncov/auspice/
      • ncov/benchmarks/
      • ncov/logs/
      • ncov/results/
  4. View ncov/auspice/ with auspice/nextstrain view.
  5. Update/set workflow version with git pull/git switch.

Cons:

  1. Using Snakemake profiles is an extra complication and we plan to get rid of my_profiles/ in docs: convert to reST, re-organize and update contents across all pages #894.
  2. File paths in ncov/my_profiles/<profile>/builds.yaml must be relative to ncov/.
  3. This places user files within ncov/, a git version-controlled directory. .gitignore helps, but having users work directly in ncov/ can lead to potential merge problems during the update step.

An alternative user story:

my-ncov-analyses/
├─ data/
│  ├─ metadata.tsv
│  ├─ sequences.fasta
├─ config/
│  │  ├─ analysis1.yaml
│  │  ├─ analysis2.yaml
│  │  ├─ auspice_config.json
│  │  ├─ <other config files>
├─ ncov/
│  ├─ auspice/
│  │  ├─ global_dataset.json
│  │  ├─ country_dataset.json
│  ├─ <internal ncov files>
│  ├─ <other Snakemake intermediate files>
/
  1. Create a personal repo from the ncov-tutorial template repo, for example named my-ncov-analyses.
  2. git clone https://github.com/user/my-ncov-analyses
  3. cd my-ncov-analyses
  4. git clone https://github.com/nextstrain/ncov
  5. Customize workflow:
    1. Add custom data in my-ncov-analyses/data/.
    2. Add a custom workflow config file anywhere within my-ncov-analyses/.
    3. Add defaults overrides files anywhere within my-ncov-analyses/.
  6. Run snakemake/nextstrain build from my-ncov-analyses/ directory.
  7. View my-ncov-analyses/ncov/auspice/ with auspice/nextstrain view.

Pros:

  1. Working directory is my-ncov-analyses/.
    • File paths in the workflow config file (builds.yaml) can be relative to my-ncov-analyses/.
  2. This uses ncov/ as a standalone directory, similar to an executable.
    • Compare to current usage, where scripts and other workflow internals are exposed to users.
  3. Enables easier sharing of customization files across workflow config files (e.g. use the same auspice_config.json in both analysis1.yaml and analysis2.yaml)

Cons:

  1. Requires some workflow changes to get this working (see notes in next section).
  2. No strict requirements/guidelines for organizing config files. Maybe Snakemake profiles is a good way to organize these files?
  3. Requires the following in my-ncov-analyses/:
    1. Ignore ncov/ from version control: nextstrain/ncov-tutorial@dfc2a6e
    2. Add a modular Snakefile pointing to ncov/Snakefile: nextstrain/ncov-tutorial@6125006

Notes while trying this implementation:

  1. There is a base workflow config file defaults/parameters.yaml which has defaults/ paths relative to ncov/.
  2. Using workdir would scope everything as relative to ncov/, including paths intended to be relative to my-ncov-analyses/. This defeats the purpose of having ncov/ as a subdirectory. workflow.source_path() accomplishes the same thing but can be applied to specific paths only.
    • Example using workflow.source_path() to properly scope all files: config paths starting with defaults/: f4ffc85
    • Scripts in ncov/scripts/ also must be properly scoped (this is where I got a bit stuck): 4709e56

Related issue(s)

Testing

Release checklist

If this pull request introduces backward incompatible changes, complete the following steps for a new release of the workflow:

  • Determine the version number for the new release by incrementing the most recent release (e.g., "v2" from "v1").
  • Update docs/src/reference/change_log.md in this pull request to document these changes and the new version number.
  • After merging, create a new GitHub release with the new version number as the tag and release title.

If this pull request introduces new features, complete the following steps:

  • Update docs/src/reference/change_log.md in this pull request to document these changes by the date they were added.

This reverts commit 65ffdc763e05a019d1aa60d764d5f8a47fa4e9de.
TODO: fix import:

    from .utils import stream_tar_file_contents
ImportError: attempted relative import with no known parent package
@victorlin
Copy link
Member Author

This was an experimental direction. Not pursuing at the moment due to the complexity of changes.

The updated tutorials still use ncov/ as the working directory.

@victorlin victorlin closed this Dec 28, 2022
@victorlin victorlin deleted the victorlin/standalone-directory branch December 28, 2022 21:41
@victorlin
Copy link
Member Author

Update: this is being pursued elsewhere with a better direction (analysis directory doesn't contain pathogen repo which is what I was doing here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

1 participant