Support for microbiome-specific data containers #3

Open
antagomir opened this issue Jul 21, 2022 · 3 comments

@antagomir

It would be useful to support microbiome-specific data containers in R, in particular phyloseq (more widely used) and TreeSummarizedExperiment (more recent). This would allow the package to be applied in a standardized way as part of common microbiome workflows.

We could potentially provide a PR if this finds support.

@darcyj
Owner

darcyj commented Jul 23, 2022

Great ideas. I'm not too familiar with TreeSummarizedExperiment; does it have methods to return phylo and matrix/data.frame objects? If so, it would be very easy to include as an optional dependency.

Before a PR, a good first step would be to have a simple demonstration script, i.e. take one of those objects, extract the necessary inputs for specificity, and then run phy_or_env_spec(). Then those conversions could be made automatic within this package, perhaps through a wrapper function like "check_convert_matrix" or something (that name stinks).

@antagomir
Author

TreeSummarizedExperiment (Huang et al. 2021) inherits from SummarizedExperiment (i.e., a TreeSE is an SE).

TreeSummarizedExperiment supports slots analogous to those of phyloseq, plus some others. It has some extended capabilities for multi-assay data analysis, and differs in technicalities such as memory handling. In many cases conversions between these formats are straightforward, and the conversion functions are available in the mia R package. A phyloseq object can always be converted into a TreeSummarizedExperiment; the other direction is possible for those parts that are supported by phyloseq.

TreeSummarizedExperiment supports tree information for both samples and features (taxa), typically in phylo format. Yes, there are functions to pull out the row/column trees and numeric abundance matrices, as well as sample/feature side information as a data.frame.

We could have a look at an example script.

Disclaimer: we are developing an ecosystem around this container, the beta version of the tutorial is online.

@antagomir
Author

antagomir commented Jul 24, 2022

A simple example with just discrete groups is here:

library(mia)
library(specificity)

# Load example data
tse <- microbiomeDataSets::OKeefeDSData()

# Convert counts to relative abundances and add the new assay
tse <- transformSamples(tse, assay_name="counts", method="relabundance")
# Filter the prevalent taxa only
prevalent.taxa <- getPrevalentTaxa(tse, detection=0.1/100, prevalence=90/100, assay_name="relabundance")
# Get abundance matrix (relative abundance assay) from the data for the prevalent taxa
abundances <- assay(tse[prevalent.taxa, ], "relabundance")
# Get sample metadata / phenodata
phenodata <- colData(tse)

# create list to hold phy_or_env_spec outputs
specs_list <- list()

# Specificity on nationality
specs_list$nationality <- phy_or_env_spec(
    abunds_mat=t(abundances),
    env=as.numeric(as.factor(phenodata$nationality)),  # integer group codes; also works if the column is character
    n_sim=100,
    p_method="gamma_fit",
    n_cores=4
)


# Specificity on BMI group
specs_list$bmi <- phy_or_env_spec(
    abunds_mat=t(abundances),
    env=as.numeric(as.factor(phenodata$bmi_group)),  # integer group codes; also works if the column is character
    n_sim=100,
    p_method="gamma_fit",
    n_cores=4
)

# Specificity on sex
specs_list$sex <- phy_or_env_spec(
    abunds_mat=t(abundances),
    env=as.numeric(as.factor(phenodata$sex)),  # integer group codes; also works if the column is character
    n_sim=100,
    p_method="gamma_fit",
    n_cores=4
)

plot_specs_violin(specs_list, cols=c("forestgreen", "red", "black"))
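
The conversion steps in this example (transpose the taxa-by-samples matrix, align the sample metadata, encode a grouping variable numerically) could be wrapped in a small helper, along the lines of the wrapper function suggested earlier in the thread. The following is only a minimal base-R sketch under stated assumptions: the name prepare_spec_inputs and its arguments are hypothetical and are not part of specificity or mia, and the toy matrix and metadata are made up for illustration. It assumes the abundance matrix has taxa in rows and samples in columns, as returned by assay() above.

```r
# Hypothetical helper (not part of specificity or mia): turn a
# taxa-by-samples abundance matrix plus sample metadata into the
# abunds_mat / env inputs used in the calls above.
prepare_spec_inputs <- function(abundances, metadata, variable) {
  abundances <- as.matrix(abundances)
  metadata <- as.data.frame(metadata)
  stopifnot(
    is.numeric(abundances),
    !is.null(colnames(abundances)),
    variable %in% colnames(metadata),
    all(colnames(abundances) %in% rownames(metadata))
  )
  # Align metadata rows to the sample order of the abundance matrix
  idx <- match(colnames(abundances), rownames(metadata))
  metadata <- metadata[idx, , drop = FALSE]
  env <- metadata[[variable]]
  # Encode non-numeric variables (factor, character) as numeric group codes
  if (!is.numeric(env)) {
    env <- as.numeric(as.factor(env))
  }
  # Samples go in rows, matching t(abundances) in the example above
  list(abunds_mat = t(abundances), env = env)
}

# Toy data (made up): 3 taxa (rows) x 4 samples (columns)
abund <- matrix(
  c(10, 0, 5, 2,
     1, 8, 0, 3,
     4, 4, 4, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxonA", "taxonB", "taxonC"),
                  c("s1", "s2", "s3", "s4"))
)
# Metadata deliberately listed in a different sample order
meta <- data.frame(
  group = c("B", "A", "B", "A"),
  row.names = c("s4", "s3", "s2", "s1"),
  stringsAsFactors = FALSE
)

inputs <- prepare_spec_inputs(abund, meta, "group")
print(dim(inputs$abunds_mat))
print(inputs$env)
```

The result could then be passed on as phy_or_env_spec(abunds_mat=inputs$abunds_mat, env=inputs$env, ...) with the remaining arguments as in the calls above. Matching metadata rows by sample name rather than by position is a deliberate choice here, so that a reordered metadata table does not silently mislabel samples.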
