sherlyn99/micov

 
 

micov: aggregate MIcrobiome COVerage

Micov is a program to compute coverage over many genomes and many samples. These coverages can be used to filter genomes or contigs which are not well represented in a dataset, or to examine differential coverage in your dataset.

Design

micov takes mappings as SAM/BAM or 3-column BED. Coverage data can be aggregated into Qiita-like coverage.tgz files. Per-sample coverages can then be harvested from multiple coverage.tgz files.

Why coverage.tgz files? Qiita provides a rich set of already computed coverage data in a BED3 compatible format. Rather than invent yet-another-format, we opted to establish functionality on what is readily available from that resource.
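
To make the idea of coverage from BED3 intervals concrete, here is a minimal Python sketch. It is not micov's implementation: the function names, the in-memory record layout (genome ID, 0-based start, exclusive stop), and the example genome lengths are all assumptions made for illustration. It merges overlapping intervals per genome and reports the percentage of each genome that is covered.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, stop) intervals."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(iv) for iv in merged]


def percent_covered(bed3_records, genome_lengths):
    """Return {genome_id: percent of positions covered at least once}."""
    by_genome = defaultdict(list)
    for genome_id, start, stop in bed3_records:
        by_genome[genome_id].append((start, stop))

    result = {}
    for genome_id, intervals in by_genome.items():
        covered = sum(stop - start for start, stop in merge_intervals(intervals))
        result[genome_id] = 100.0 * covered / genome_lengths[genome_id]
    return result


# Hypothetical example data, not taken from any real dataset.
records = [("G1", 0, 50), ("G1", 25, 75), ("G1", 90, 100), ("G2", 10, 20)]
lengths = {"G1": 100, "G2": 200}
coverage = percent_covered(records, lengths)
print(coverage)
```

A table like this is the kind of per-genome summary that could be used to drop genomes below a chosen coverage threshold.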

Installation

We currently recommend creating a separate conda environment and installing micov into it:

$ conda create -n micov -c conda-forge polars matplotlib scipy click tqdm numba duckdb pyarrow
$ conda activate micov
$ pip install micov

Examples

Compressing covered regions, and computing the per-genome coverage, from existing Qiita files:

$ micov qiita-coverage \
    --lengths genome-lengths-in-reference.map \
    --output coverage-example \
    --qiita-coverages /qmounts/qiita_data/BIOM/191463/coverages.tgz \
    --qiita-coverages /qmounts/qiita_data/BIOM/191556/coverages.tgz \
    --qiita-coverages /qmounts/qiita_data/BIOM/191575/coverages.tgz \
    --qiita-coverages /qmounts/qiita_data/BIOM/191879/coverages.tgz \
    --qiita-coverages /qmounts/qiita_data/BIOM/191926/coverages.tgz \
    --qiita-coverages /qmounts/qiita_data/BIOM/191613/coverages.tgz \
    --qiita-coverages /qmounts/qiita_data/BIOM/192511/coverages.tgz \
    --samples-to-keep metadata-with-samples-of-interest.tsv

The above command can be constrained to particular features as well.
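
When many coverages.tgz archives are involved, it may be easier to assemble the command programmatically. The sketch below is one way to do that from Python; the helper name `build_qiita_coverage_command` is hypothetical, and it only uses the flags shown in the example above. It prints the command rather than running it.

```python
import shlex


def build_qiita_coverage_command(lengths, output, coverage_archives, samples_to_keep):
    """Assemble the argument list for `micov qiita-coverage`.

    Hypothetical helper: it repeats --qiita-coverages once per archive,
    mirroring the shell example, and uses no other flags.
    """
    argv = ["micov", "qiita-coverage", "--lengths", lengths, "--output", output]
    for path in coverage_archives:
        argv += ["--qiita-coverages", path]
    argv += ["--samples-to-keep", samples_to_keep]
    return argv


archives = [
    "/qmounts/qiita_data/BIOM/191463/coverages.tgz",
    "/qmounts/qiita_data/BIOM/191556/coverages.tgz",
]
argv = build_qiita_coverage_command(
    lengths="genome-lengths-in-reference.map",
    output="coverage-example",
    coverage_archives=archives,
    samples_to_keep="metadata-with-samples-of-interest.tsv",
)

# Print a copy-pasteable command line; to execute it instead, pass argv to
# subprocess.run(argv, check=True) in an environment where micov is installed.
print(shlex.join(argv))
```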

To produce non-cumulative, cumulative, and coverage map plots instead, use a slightly different command. It too can be limited to specific features.

$ micov per-sample-group \
    --qiita-coverages /qmounts/qiita_data/BIOM/191463/coverages.tgz \
    --qiita-coverages /qmounts/qiita_data/BIOM/191556/coverages.tgz \
    --qiita-coverages /qmounts/qiita_data/BIOM/191575/coverages.tgz \
    --qiita-coverages /qmounts/qiita_data/BIOM/191879/coverages.tgz \
    --lengths genome-lengths-in-reference.map \
    --sample-metadata metadata-with-samples-of-interest.tsv \
    --sample-metadata-column cool_categorical_variable \
    --output plots-example 
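
For intuition about the difference between non-cumulative and cumulative coverage, here is a small Python sketch. It reflects one plausible reading of those terms and is not taken from micov's source: non-cumulative coverage is each sample's coverage on its own, while cumulative coverage is the coverage of the union of all samples seen so far. The function names, the sample ordering (insertion order), and the data are assumptions for illustration.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, stop) intervals."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(iv) for iv in merged]


def covered_length(intervals):
    """Number of positions covered by at least one interval."""
    return sum(stop - start for start, stop in merge_intervals(intervals))


def coverage_curves(sample_intervals, genome_length):
    """Return (non_cumulative, cumulative) percent coverage lists.

    sample_intervals maps sample ID -> list of (start, stop) on one genome.
    Samples are processed in the mapping's insertion order.
    """
    non_cumulative, cumulative = [], []
    seen = []  # union of intervals from all samples processed so far
    for intervals in sample_intervals.values():
        non_cumulative.append(100.0 * covered_length(intervals) / genome_length)
        seen = merge_intervals(seen + list(intervals))
        cumulative.append(100.0 * covered_length(seen) / genome_length)
    return non_cumulative, cumulative


# Hypothetical samples mapped against a genome of length 100.
samples = {
    "s1": [(0, 30)],
    "s2": [(20, 50)],
    "s3": [(40, 60), (90, 100)],
}
per_sample, running_total = coverage_curves(samples, genome_length=100)
print(per_sample)
print(running_total)
```

In this toy example every sample covers the same fraction of the genome individually, yet the cumulative curve keeps rising because the samples cover different regions.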

Existing SAM/BAM files can be compressed into a BED-like format, either from a file or through a pipe. A pipe example is shown below:

$ xzcat some_data.sam.xz | micov compress > compressed.tsv
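
As a rough illustration of what reducing SAM records to BED-like intervals involves, the Python sketch below converts a single SAM alignment line into a (reference, start, stop) triple using the POS and CIGAR fields. This is an assumption about the general technique, not a description of what `micov compress` does internally; the function name and the example record are hypothetical.

```python
import re

# CIGAR operations as defined by the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")


def sam_line_to_interval(line):
    """Convert one SAM line to (reference, start, stop), or None.

    SAM positions are 1-based; the returned interval is 0-based and
    half-open, as in BED. Header lines and unmapped reads yield None.
    """
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 0x4 or rname == "*" or cigar == "*":
        return None  # read is unmapped or has no alignment
    # Span of the alignment on the reference, including deletions and skips.
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    start = pos - 1
    return rname, start, start + ref_len


# Hypothetical SAM record: mapped to G1 at 1-based position 11.
example = "r1\t0\tG1\t11\t60\t5S20M2D10M3I5M\t*\t0\t0\t*\t*"
interval = sam_line_to_interval(example)
print(interval)
```

Soft clips (S) and insertions (I) do not advance along the reference, so they do not contribute to the interval length.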

Compressed BED-like representations can be aggregated into Qiita-like coverage files as well:

$ micov consolidate \
    --lengths genome-lengths.tsv \
    --paths a-file-with-a-list-of-paths \
    --output consolidated.tgz
