[Question] How to correctly use ILR with balance tree basis? #262
Comments
@jolespin you are asking all of the right questions -- getting the ids aligned is probably the most annoying part of using the ilr transform. The most important thing to realize is that the ilr transform will transform proportions at the tips of the tree into log-ratios at the ancestral nodes of the tree. There are two ways you can link the balances back to the OTUs (assuming that all of your internal nodes are labeled; everything breaks if you have None labels in your internal nodes).
from gneiss.util import NUMERATOR, DENOMINATOR
num_otus = t_skbio.find('y1').children[NUMERATOR].subset()
denom_otus = t_skbio.find('y1').children[DENOMINATOR].subset()
Not great; some more thought needs to be done to figure out how to clean up the querying process. So back to your original questions
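To make "log-ratios at the ancestral nodes" concrete, here is a minimal sketch of a single balance computed by hand. It assumes the standard ILR balance definition (a scaled log-ratio of the geometric means of the numerator and denominator tips). The names `geometric_mean` and `balance` and the input proportions are illustrative only and are not part of gneiss.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(numerator, denominator):
    """Balance between two groups of tip proportions.

    numerator / denominator: proportions of the tips under each child
    of one internal node (hypothetical inputs for illustration).
    """
    r, s = len(numerator), len(denominator)
    scale = math.sqrt(r * s / (r + s))
    return scale * math.log(geometric_mean(numerator) / geometric_mean(denominator))


# Three tips under the numerator child, one tip under the denominator child.
b = balance([0.2, 0.3, 0.1], [0.4])
print(b)
```

A positive value means the numerator tips are, on average, more abundant than the denominator tips; a negative value means the opposite.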
@mortonjt Thanks a lot for getting back to me so quickly with such a detailed response! I looked into it and your explanation plus links were extremely helpful. I actually understand what is happening after looking at the One thing I wanted to point out was the I am close to understanding the
from gneiss.util import NUMERATOR, DENOMINATOR
num_otus = t_skbio.find('y1').children[NUMERATOR].subset()
denom_otus = t_skbio.find('y1').children[DENOMINATOR].subset()
len(num_otus), len(denom_otus)
# (882, 0)

from io import StringIO

import ete3
import skbio
from gneiss.balances import balance_basis
from gneiss.composition import ilr_transform

# OTU table: filtered counts with n=64 samples and m=883 OTUs
df_otus = df_counts_f1.copy()
# Path to the newick file generated from 16S sequences in clustalo
path_to_newick_from_clustalo = "./Data/otu_clustering/clustalo_output/otus.qfilter.EE0.15.guidetree"
# Load the guide tree with ete3
t = ete3.Tree(newick=path_to_newick_from_clustalo)
# Prune the tree to the OTUs present in the table
t.prune(df_otus.columns)

def ete3_to_skbio(tree):
    """Label internal nodes y1, y2, ... and convert the ete3 tree to skbio."""
    intermediate_node_index = 1
    for node in tree.traverse():
        if not node.is_leaf():
            node.name = f"y{intermediate_node_index}"
            intermediate_node_index += 1
    # format=1 writes internal node names; format_root_node=True keeps the root label
    return skbio.TreeNode.read(StringIO(tree.write(format=1, format_root_node=True)))

t_skbio = ete3_to_skbio(t)
# Gneiss: basis matrix and the internal nodes it corresponds to
basis, nodes = balance_basis(t_skbio)
# ILR on counts with a pseudocount of 1 to avoid log(0)
df_ilr = ilr_transform(df_otus + 1, t_skbio)
It's possible that the denominator is a tip, which will return an empty set when using
Definitely not a very friendly way to interpret your balances - an easier way would be to use the
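The tip edge case can be sketched as follows. This is only an illustration under stated assumptions: `MiniNode` is a hypothetical stand-in that mimics the `name`, `children`, `is_tip()` and `subset()` members of `skbio.TreeNode` so the example runs without scikit-bio, and `tip_names` is an illustrative helper, not a gneiss or scikit-bio function.

```python
class MiniNode:
    """Tiny stand-in for skbio.TreeNode with only the members used below."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def is_tip(self):
        return not self.children

    def subset(self):
        # Names of the tips strictly below this node; empty for a tip.
        names = set()
        for child in self.children:
            names |= ({child.name} if child.is_tip() else child.subset())
        return frozenset(names)


def tip_names(node):
    """Tip names under `node`, treating a tip as its own one-element set."""
    return frozenset([node.name]) if node.is_tip() else node.subset()


# y1 splits into a single tip (otu_c) and a clade y2 = (otu_a, otu_b).
tree = MiniNode(
    "y1",
    [MiniNode("otu_c"), MiniNode("y2", [MiniNode("otu_a"), MiniNode("otu_b")])],
)
tip_child, clade_child = tree.children

print(len(tip_child.subset()))        # the tip reports no descendants
print(sorted(tip_names(tip_child)))   # the helper returns the tip itself
print(sorted(tip_names(clade_child)))
```

With a real tree, the same helper could presumably be applied to `t_skbio.find('y1').children[NUMERATOR]` and `children[DENOMINATOR]`, so that a balance with a single-tip side no longer shows up as an empty set.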
Ah man, I didn't even think about that edge case. Thanks for catching that. I've looked into the tutorial and it's helpful to see the applications. Right now I'm using Is there any source code you can direct me to on Also, I created a helper script that evades the edge case:
@jolespin couple comments here
This was originally asked here
scikit-bio/scikit-bio#1582
There are two details that confuse me regarding using ilr and a custom basis argument.
I understand that gneiss is used for the balance_basis and that m-1 attributes/columns are returned by ilr, which are supposed to represent the balances of the nodes in the tree.
**My first question: how can I know which column goes to which node in my dendrogram?**
**My second question: how does the ilr function know the labels/order of the nodes in the tree that was provided, and does the order of my mat input matter at all?**
Lastly, (bonus) is the implementation below equivalent to PhILR?
My df_otus is a pd.DataFrame that I filtered. The newick tree was created by clustalo when I did a multiple sequence alignment of the Otu centroid sequences from uparse.
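One way to reason about the column-to-node question is sketched below. It assumes that the m-1 balance columns follow a level-order traversal of the internal nodes of a bifurcating tree; that is my understanding of how gneiss labels the output of `ilr_transform`, and of the default order of ete3's `traverse()`, but it should be checked against the installed versions. The nested-tuple tree and the function `internal_nodes_levelorder` are hypothetical and only for illustration.

```python
from collections import deque


def internal_nodes_levelorder(tree):
    """Return (label, tips) pairs for internal nodes in level order.

    `tree` is a nested tuple: a tip is a string, an internal node is a
    tuple of children. Labels y1, y2, ... are assigned in visiting order.
    """
    def tips(node):
        if isinstance(node, str):
            return [node]
        return [t for child in node for t in tips(child)]

    labelled = []
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, str):
            continue  # tips carry no balance
        labelled.append((f"y{len(labelled) + 1}", tips(node)))
        queue.extend(node)
    return labelled


# Bifurcating toy tree with m = 4 tips, so m - 1 = 3 internal nodes.
toy_tree = (("otu_a", "otu_b"), ("otu_c", "otu_d"))
for label, tips_below in internal_nodes_levelorder(toy_tree):
    print(label, tips_below)
```

Under that assumption, the i-th balance column corresponds to the i-th internal node visited, and the tips listed next to each label are the OTUs that contribute to that balance.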