Support for microbiome-specific data containers #3

antagomir · 2022-07-21T20:25:27Z

It would be useful to support microbiome-specific data containers in R, in particular phyloseq (more widely used) and TreeSummarizedExperiment (more recent). This would allow the package to be applied in a standardized way as part of common microbiome workflows.

We could potentially provide a PR if this finds support.

darcyj · 2022-07-23T17:38:45Z

Great ideas. I'm not too familiar with TreeSummarizedExperiment; does it have methods to return phylo and matrix/data.frame objects? If so, it would be very easy to include as an optional dependency.

Before a PR, a good first step would be to have a simple demonstration script, i.e. take one of those objects, extract the necessary inputs for specificity, and then run phy_or_env_spec(). Then those conversions could be made automatic within this package, perhaps through a wrapper function like "check_convert_matrix" or something (that name stinks).
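
A minimal sketch of what such a wrapper might look like is below. The function name as_abunds_mat and its assay_name argument are hypothetical (nothing like this exists in the package yet), and it assumes that plain matrices and data.frames are already oriented samples x taxa. The phyloseq and SummarizedExperiment branches use only standard accessors from those packages and are only reached when such an object is passed in.

```r
# Hypothetical helper: name, arguments and behavior are illustrative only.
# Returns a samples x taxa matrix of abundances.
as_abunds_mat <- function(x, assay_name = "counts") {
  # Plain inputs: assumed to be samples x taxa already
  if (is.matrix(x)) {
    return(x)
  }
  if (is.data.frame(x)) {
    return(as.matrix(x))
  }
  # phyloseq: the OTU table may be stored in either orientation
  if (methods::is(x, "phyloseq")) {
    mat <- methods::as(phyloseq::otu_table(x), "matrix")
    if (phyloseq::taxa_are_rows(x)) mat <- t(mat)
    return(mat)
  }
  # (Tree)SummarizedExperiment: assays are features x samples, so transpose
  if (methods::is(x, "SummarizedExperiment")) {
    return(t(as.matrix(SummarizedExperiment::assay(x, assay_name))))
  }
  stop("Unsupported input class: ", class(x)[1])
}
```

The returned matrix could then be passed as abunds_mat to phy_or_env_spec(), so phyloseq and TreeSummarizedExperiment would only need to be optional dependencies.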

antagomir · 2022-07-24T08:26:29Z

TreeSummarizedExperiment (Huang et al. 2021) inherits from SummarizedExperiment (i.e., a TreeSE is an SE).

TreeSummarizedExperiment supports slots analogous to those of phyloseq, plus some others. It has some extended capabilities for multi-assay data analysis, and it differs in technicalities such as memory handling. In many cases conversions between these formats are straightforward; the conversion functions are available in the mia R package. A phyloseq object can always be converted into a TreeSummarizedExperiment; the reverse conversion is possible for those parts that are supported by phyloseq.
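
As a rough sketch of that round trip, assuming the function names that mia exported around the time of this thread (makeTreeSummarizedExperimentFromPhyloseq and makePhyloseqFromTreeSummarizedExperiment; other mia releases may expose the converters under different names) and the GlobalPatterns example data shipped with phyloseq:

```r
library(mia)
library(phyloseq)

# Example phyloseq object shipped with the phyloseq package
data(GlobalPatterns, package = "phyloseq")

# phyloseq -> TreeSummarizedExperiment
tse <- makeTreeSummarizedExperimentFromPhyloseq(GlobalPatterns)

# TreeSummarizedExperiment -> phyloseq
# (only the parts that phyloseq can represent are carried over)
pseq <- makePhyloseqFromTreeSummarizedExperiment(tse)
```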

TreeSummarizedExperiment supports tree information for both samples and features (taxa), typically in phylo format. Yes, there are functions to pull out the row/column trees and the numeric abundance matrices, as well as the sample/feature side information as data.frame.
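
For illustration, these accessors could be collected in one place roughly as follows. The helper name extract_tse_parts is hypothetical; the accessors themselves (assay, rowTree, colTree, colData, rowData) come from the TreeSummarizedExperiment and SummarizedExperiment packages, and the default assay name "counts" is an assumption about how the object was built.

```r
library(TreeSummarizedExperiment)

# Hypothetical helper: gather the pieces that specificity would need
extract_tse_parts <- function(tse, assay_name = "counts") {
  list(
    abundances   = assay(tse, assay_name),       # features x samples
    row_tree     = rowTree(tse),                 # tree for features (taxa), if present
    col_tree     = colTree(tse),                 # tree for samples, if present
    sample_data  = as.data.frame(colData(tse)),  # sample side information
    feature_data = as.data.frame(rowData(tse))   # feature side information
  )
}
```

Note that the abundance matrix comes out with features in rows, so it would need to be transposed before being passed as abunds_mat.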

We could have a look at an example script.

Disclaimer: we are developing an ecosystem around this container; the beta version of the tutorial is online.

antagomir · 2022-07-24T09:44:48Z

A simple example with just discrete groups is here:

library(mia)
library(specificity)

# Load example data
tse <- microbiomeDataSets::OKeefeDSData()

# Convert counts to relative abundances and add the new assay
tse <- transformSamples(tse, assay_name="counts", method="relabundance")
# Filter the prevalent taxa only
prevalent.taxa <- getPrevalentTaxa(tse, detection=0.1/100, prevalence=90/100, assay_name="relabundance")
# Get abundance matrix (relative abundance assay) from the data for the prevalent taxa
abundances <- assay(tse[prevalent.taxa, ], "relabundance")
# Get sample metadata / phenodata
phenodata <- colData(tse)

# create list to hold phy_or_env_spec outputs
specs_list <- list()

# Specificity on nationality
specs_list$nationality <- phy_or_env_spec(
    abunds_mat=t(abundances),
    env=as.numeric(as.factor(phenodata$nationality)),  # integer group codes; also works if the column is character
    n_sim=100,
    p_method="gamma_fit",
    n_cores=4
)


# Specificity on BMI group
specs_list$bmi <- phy_or_env_spec(
    abunds_mat=t(abundances),
    env=as.numeric(as.factor(phenodata$bmi_group)),  # integer group codes; also works if the column is character
    n_sim=100,
    p_method="gamma_fit",
    n_cores=4
)

# Specificity on sex
specs_list$sex <- phy_or_env_spec(
    abunds_mat=t(abundances),
    env=as.numeric(as.factor(phenodata$sex)),  # integer group codes; also works if the column is character
    n_sim=100,
    p_method="gamma_fit",
    n_cores=4
)

plot_specs_violin(specs_list, cols=c("forestgreen", "red", "black"))

antagomir mentioned this issue Jul 24, 2022

CRAN / Bioconductor availability #2

Open
