Implementing Base Count Coverage Depth #7

Merged
merged 72 commits into from
Sep 12, 2024

@gordonkoehn gordonkoehn commented Sep 6, 2024

This Pull Request (PR) integrates the quality control scripts from @AugusteRi into the new package usefulGnom.

Apologies, this PR is messy and includes some unrelated project setup.

All three scripts can now be run as snakemake rules, parameterized by the location and end date of the samples:

  1. calculating basecnt coverage depth,
  2. calculating total coverage depth, and
  3. computing the frequency matrix and calculating mutation statistics.

I've verified the outputs against the original scripts on Euler.

To run the full analysis for Zürich with samples up to 2024-07-03, one can now simply run the following from the workflows directory:

snakemake -c 2 mutation_statistics_Zürich_2024_07_03

or, for individual output files:

snakemake -c 2 "${OUTDIR}Zürich (ZH)/lineplotC23039G_G22599C_Zürich (ZH)_2024-07-03.pdf"
snakemake -c 2 "${OUTDIR}Zürich (ZH)/heatmapC23039G_G22599C_Zürich (ZH)_2024-07-03.pdf"
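
The targets above follow a visible naming pattern: location and end date for the aggregate rule, and plot kind, mutation pair, location label, and date for the individual plots. A minimal Python sketch of that pattern, inferred only from the three example commands. The helper names `stats_target` and `plot_target` are hypothetical and are not part of usefulGnom or the Snakefile:

```python
from datetime import date


def stats_target(location: str, enddate: date) -> str:
    """Build the aggregate target name, e.g. mutation_statistics_Zürich_2024_07_03."""
    # The aggregate target uses underscores in the date.
    return f"mutation_statistics_{location}_{enddate:%Y_%m_%d}"


def plot_target(outdir: str, kind: str, mutations: str, label: str, enddate: date) -> str:
    """Build the path of a single plot; `kind` is 'lineplot' or 'heatmap'."""
    # Assumption: outdir already ends with a path separator, like ${OUTDIR} above.
    # The plot files use an ISO date (hyphens) rather than underscores.
    return f"{outdir}{label}/{kind}{mutations}_{label}_{enddate.isoformat()}.pdf"


if __name__ == "__main__":
    end = date(2024, 7, 3)
    print(stats_target("Zürich", end))
    print(plot_target("results/", "lineplot", "C23039G_G22599C", "Zürich (ZH)", end))
```

Because the plot paths contain spaces and parentheses, they need to be quoted when passed to snakemake from a shell.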

To configure this script, you need to edit the paths in workflows/base_coverage. stick for your personal Euler setup and the location of data files.


Open Questions:

@AugusteRi I hope I integrated the scripts as they are intended to be run. I stuck as closely to your setup as I understood it. As a result, the ad-hoc filters you had in your code are not integrated yet.

  • Do we need to integrate the protocol as another filter?
  • Do we need to take care of the start date?

More generally,

  • Are these typical .tsv formats and data files? I've assumed so, and integrated the reading and parsing into usefulGnom for future scripts.

@gordonkoehn gordonkoehn self-assigned this Sep 6, 2024
@gordonkoehn

Yes, Script 1: basecnt_coverage runs as a snakemake rule basecnt_coverage_depth

@gordonkoehn

Second script added; it runs without errors.

@gordonkoehn

NB: these scripts are mostly reading, parsing, and matching, so there is not much core logic that is easy to write tests for. Hence there are no tests.

@gordonkoehn gordonkoehn added the enhancement (New feature or request) label Sep 11, 2024
@gordonkoehn gordonkoehn merged commit a4a2a7b into main Sep 12, 2024
1 check passed

@gordonkoehn gordonkoehn left a comment

Minor Open Questions to @AugusteRi

Comment on lines -48 to -49
# filtering condition to take only Artic v4.1 protocol:
# (timeline_file["proto"] == "v41")

@AugusteRi Do we need to integrate the protocol as another filter? Is it a crucial argument that should be passed down? You didn't include it in the file naming before, so I excluded it for now.


selected_rows = timeline_file[
# select the rows with date from 2022-07 to 2023-03 (according to samples.wastewateronly.ready.tsv)
(timeline_file["date"] > "2024-01-01") &

@AugusteRi Do you typically change the start date? Is it worth adding this as another parameter to the final command? Since it was hard-coded before, I assume the start date is rarely changed, so I also excluded it from the final command.
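
One way both open questions could be handled is to make the start date and the protocol optional parameters whose defaults reproduce the current behaviour. A dependency-free sketch, assuming each row is a dict with the `date` and `proto` columns seen in the diff plus a hypothetical `location` column. `select_samples` is an illustrative name, not an existing usefulGnom function, and treating the end date as inclusive is also an assumption:

```python
from typing import Dict, Iterable, List, Optional


def select_samples(
    rows: Iterable[Dict[str, str]],
    location: str,
    end_date: str,
    start_date: str = "2024-01-01",
    protocol: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Keep rows for one location within (start_date, end_date], optionally one protocol."""
    selected = []
    for row in rows:
        if row["location"] != location:  # hypothetical column name
            continue
        # ISO dates (YYYY-MM-DD) order correctly when compared as plain strings.
        # The lower bound is strict, mirroring `date > "2024-01-01"` in the diff.
        if not (start_date < row["date"] <= end_date):
            continue
        # protocol=None means "do not filter by protocol" (the current behaviour).
        if protocol is not None and row["proto"] != protocol:
            continue
        selected.append(row)
    return selected
```

With defaults like these, the final command would not need extra arguments unless someone actually wants a different start date or a single protocol such as `v41`.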
