First pass at making snakemake workflow for innovation model #11
I've tried to convert the compare_natural notebook to a Python script. There are a couple of remaining tasks for this section:
I couldn't run Snakemake due to a couple of bugs in the Snakefile:
1. run_models.smk was missing a comma.
2. The Snakefile was missing the pandas import.
3. There was some confusion in analysis_periods between iterating over a list and iterating over a dictionary. This should be solved by passing analysis_periods.keys() to expand.

However, I couldn't test the workflow due to missing data/ files, so I'm not 100% sure this fixes things.
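A minimal sketch of the third fix, runnable outside Snakemake. The shape of `analysis_periods`, the period names, and the output path are assumptions for illustration, and `expand_like` is a hypothetical stand-in for Snakemake's `expand`, not its actual implementation:

```python
from itertools import product

# Hypothetical config fragment: period names mapped to their date windows.
analysis_periods = {
    "period_a": {"min_date": "2022-01-01", "max_date": "2022-06-30"},
    "period_b": {"min_date": "2022-07-01", "max_date": "2022-12-31"},
}


def expand_like(pattern, **wildcards):
    """Stand-in for expand(): fill the pattern with every wildcard combination."""
    names = list(wildcards)
    value_lists = [list(wildcards[name]) for name in names]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


# Passing .keys() makes it explicit that the wildcard ranges over period names,
# not over the per-period settings stored as the dictionary values.
targets = expand_like(
    "results/{period}/output.tsv",  # hypothetical output path
    period=analysis_periods.keys(),
)
print(targets)
```

In the Snakefile itself the equivalent call would presumably look like `expand("results/{period}/output.tsv", period=analysis_periods.keys())`, with the real output path substituted.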
Thanks so much for diving in here @marlinfiggins. I'm sorry that I didn't notice this PR before you pointed it out to me last month. I just tried working from
I think that you're assuming that local files like I just did almost exactly this over here: https://github.com/blab/fitness-dynamics?tab=readme-ov-file#provision-metadata-locally. I can continue review once I know how to provision local data. Also, separate question: can we just drop
This copies over logic from https://github.com/blab/fitness-dynamics where the prepare_clade_data rule that calls scripts/prepare-data.py is based on a defined analysis window of min_date and max_date rather than defining included_days. This is significantly cleaner for performing historical analyses. Additionally, drop references to cases (and the requirement of inputting cases to prepare-data.py). We're not using cases in the MLR analysis and they just add unused overhead.
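To illustrate why a fixed window suits historical analyses, here is a minimal sketch of filtering by min_date and max_date. The row layout, the column names, and the inclusive bounds are assumptions for illustration and are not taken from scripts/prepare-data.py:

```python
from datetime import date


def filter_to_window(rows, min_date, max_date):
    """Keep rows whose 'date' lies in [min_date, max_date] (inclusive, by assumption)."""
    lo = date.fromisoformat(min_date)
    hi = date.fromisoformat(max_date)
    return [row for row in rows if lo <= date.fromisoformat(row["date"]) <= hi]


# Hypothetical sequence-count rows.
rows = [
    {"date": "2022-12-31", "clade": "22B", "sequences": 4},
    {"date": "2023-01-15", "clade": "22B", "sequences": 7},
    {"date": "2023-03-01", "clade": "23A", "sequences": 2},
    {"date": "2023-04-02", "clade": "23A", "sequences": 9},
]

windowed = filter_to_window(rows, "2023-01-01", "2023-03-31")
print([row["date"] for row in windowed])
```

Because both ends of the window are explicit dates, rerunning the same analysis later selects the same rows, whereas a trailing day count depends on when the workflow is run.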
@marlinfiggins ---
Please update README.md with examples of how to actually run this workflow. I'd like to be able to work through the readme to understand how to interact with the workflow. See https://github.com/blab/fitness-flux for a pretty minimal example of what I mean.
@marlinfiggins: Thanks for providing instructions on how to run workflow. However, I'm running into errors. If I reduce
Update to the latest Docker image with
Are there dependencies that need updating for this to run?
This can be resolved via blab/evofr#44. I suggest updating the ncov-escape README.md with current installation instructions. This will be I've made this revision in the commit below.
@marlinfiggins --- After the update to the Nextstrain Docker image, I'm no longer getting the above jax error. However, if I attempt to run the repo in its current state without touching
The config looks like it should just use what's in
Marlin, this doesn't have to be a correct / complete analysis, but I really would like the default analysis and what's specified in the readme to at least run. Can you take another look at this? Please try working from a fresh repo to make sure things at least run. Noting here that if I swap
Blocking now is having the default config.yaml / instructions in the readme complete without error. See above comment.
…pike-predictors-phenos main branch
Sorry about this, I've made some changes. The bug you were running into was with running the regression prior model. It was attempting to fetch the phenotypes from the wrong files. The issue here was that I had to switch the code to use the old version of the predictors. Previously, I just grabbed this by hand from the main branch and renamed the relevant columns to be consistent, since I expected this PR to be merged by the time we tested this. For now, I've changed this and the relevant code to use the phenotypes from the main branch, but I'll revert these changes once the lineage phenotypes PR is properly merged.
I've also removed the window analysis for now until I fix the location of the
I've also added the window analysis back and fixed the instructions for provisioning the metadata.
@marlinfiggins --- Thanks so much for humoring me with the detailed instructions. I was able to run this workflow for different analysis periods and everything worked as expected. I like the setup for
Everything looks great at this point.
This PR implements a Snakemake workflow for provisioning sequence counts, using methods similar to those in forecasts-ncov. This is my first time working with Snakemake, so any suggestions, comments, or questions are appreciated.
Major changes include:
config.yaml for specification here)
workflow/snakemake_rules/prepare_data.smk)
scripts/run-innovation-model)
Still to be accomplished:
mlr-fitness/data/pango-relationships.nb for correctness
Note: I've borrowed the files scripts/prepare-data.py and scripts/collapse-lineages.py directly from forecasts-ncov. Let me know if there's a better way of doing this.
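For readers unfamiliar with what "sequence counts" means here, a rough sketch of the tallying step follows. The record fields and the output columns (location, clade, date, sequences) are assumptions for illustration and are not copied from scripts/prepare-data.py:

```python
from collections import Counter


def tally_sequence_counts(records):
    """Count sequences per (location, clade, date) and return sorted rows."""
    counts = Counter((r["location"], r["clade"], r["date"]) for r in records)
    return [
        {"location": loc, "clade": clade, "date": day, "sequences": n}
        for (loc, clade, day), n in sorted(counts.items())
    ]


# Hypothetical per-sequence metadata records.
records = [
    {"location": "USA", "clade": "23A", "date": "2023-03-01"},
    {"location": "USA", "clade": "23A", "date": "2023-03-01"},
    {"location": "USA", "clade": "22B", "date": "2023-03-01"},
    {"location": "Japan", "clade": "23A", "date": "2023-03-02"},
]

sequence_counts = tally_sequence_counts(records)
for row in sequence_counts:
    print(row)
```

Each output row is one (location, clade, date) combination with the number of sequences observed, which is the kind of table a frequency model such as MLR is typically fit to.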