Hill diversity profile #535
…o other popular alpha indices
for more information, see https://pre-commit.ci
One question I have about Hill numbers is whether and how they take into account different cell numbers per patient or other technical confounders. For example, Shannon entropy would simply increase with increasing cell count per group, which is why we use normalized Shannon entropy by default in scirpy. I was wondering whether something similar is necessary or possible when using the Hill curves.
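To make the concern concrete, here is a minimal sketch of how raw Shannon entropy grows with the number of cells while a normalized version does not. The function `shannon_entropy` is a hypothetical helper written for this comment, not scirpy's code, and it assumes normalization by the log of the number of distinct clonotypes.

```python
import numpy as np


def shannon_entropy(counts, normalized=False):
    """Shannon entropy (natural log) of a vector of clonotype counts.

    Hypothetical helper for illustration only. Assumes normalization
    divides by log(number of distinct clonotypes).
    """
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]
    freqs = counts / counts.sum()
    entropy = -np.sum(freqs * np.log(freqs))
    if normalized:
        if len(freqs) < 2:
            # A single clonotype has no diversity to normalize.
            return 0.0
        entropy /= np.log(len(freqs))
    return float(entropy)


# Two toy samples in which every cell carries a unique clonotype.
small_sample = [1] * 10
large_sample = [1] * 1000

# The raw entropy is larger for the larger sample ...
print(shannon_entropy(small_sample), shannon_entropy(large_sample))
# ... while the normalized entropy is the same for both.
print(shannon_entropy(small_sample, normalized=True),
      shannon_entropy(large_sample, normalized=True))
```

Both toy samples are equally even, so only the raw value separates them, and it does so purely because of sample size.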
I think I get what you are referring to, but I am not sure I know the answer to your question. While implementing this I was inspired by cdiversity (https://github.com/AI-SysBio/cdiversity/tree/main), and I think their approach is quite similar to the implementation you can see here. Furthermore, I think that many of the papers I read during my thesis simply applied the Hill number formula for multiple diversity orders and plotted the values as a curve (that is essentially what this function does as well). However, I just had a look at the vignette from Immcantation, and they seem to consider "more" and have a more advanced way of calculating this (https://alakazam.readthedocs.io/en/stable/vignettes/Diversity-Vignette/). Maybe have a look at it and tell me whether my implementation here is just plain wrong or whether it "just" oversimplifies?
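For reference, the "apply the formula for several orders and plot a curve" approach can be sketched as follows. This is a generic illustration of the standard Hill number definition, not the code in this PR or in cdiversity; `hill_number` and `hill_profile` are hypothetical names.

```python
import numpy as np


def hill_number(counts, q):
    """Hill number of order q for a vector of clonotype counts.

    Uses (sum_i p_i**q) ** (1 / (1 - q)), with the q -> 1 limit
    exp(Shannon entropy). Hypothetical helper for illustration.
    """
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    if np.isclose(q, 1.0):
        # The general formula is undefined at q = 1; use its limit.
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


def hill_profile(counts, orders):
    """Evaluate the Hill number at each diversity order in `orders`."""
    return np.array([hill_number(counts, q) for q in orders])


# Toy clonotype counts; plotting `profile` against `orders` gives the curve.
orders = np.linspace(0, 4, 9)
profile = hill_profile([5, 3, 2], orders)
print(dict(zip(orders.round(1), profile.round(3))))
```

Order 0 gives the number of clonotypes, and the values can only stay equal or decrease as the order increases, which is what makes the curve readable as a profile.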
Your implementation is certainly not wrong. I do think, however, that it is important to consider the sequencing depth (or, in our case, the number of cells per sample). I also think that it is more of an issue with scRNA-seq data than with bulk data, because the number of cells can vary quite a lot between samples. This function from Alakazam seems to address it, and it doesn't look too complicated (though obviously more complicated than the current version): https://bitbucket.org/kleinstein/alakazam/src/f7986680439908fd8660dde25074923f34ea93cf/R/Diversity.R#lines-803:951 I would also be curious to hear @FFinotello's opinion on this, as she has worked with diversity indices more than I have.
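One generic way to account for differing cell numbers is to subsample every sample to a common depth and average the Hill number over repeated draws. The sketch below illustrates that idea only; it is not a port of the Alakazam function linked above, and `rarefied_hill`, `depth`, and `n_iter` are hypothetical names chosen for this example.

```python
import numpy as np


def hill_from_counts(counts, q):
    """Hill number of order q from raw clonotype counts (zeros allowed)."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


def rarefied_hill(counts, q, depth, n_iter=100, seed=0):
    """Mean Hill number after subsampling `depth` cells without replacement.

    Assumption: every sample to be compared is subsampled to the same
    `depth`, e.g. the size of the smallest sample.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if depth > counts.sum():
        raise ValueError("depth exceeds the number of cells in the sample")
    rng = np.random.default_rng(seed)
    # Each row holds the clonotype counts of one random subsample.
    draws = rng.multivariate_hypergeometric(counts, depth, size=n_iter)
    return float(np.mean([hill_from_counts(row, q) for row in draws]))


# Two toy samples with the same structure but different cell numbers.
small_sample = [1] * 50
large_sample = [1] * 500
depth = 50
print(rarefied_hill(small_sample, q=0, depth=depth),
      rarefied_hill(large_sample, q=0, depth=depth))
```

Without subsampling, the order-0 values of these two toy samples would be 50 and 500; at a common depth they coincide.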
Added a Hill diversity profile and a function to convert Hill numbers into other popular alpha indices (such as Simpson and Shannon). The latter was requested by Francesca and makes it possible to further manipulate the resulting DataFrame to gain additional insights, for example by calculating evenness indices.
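As an illustration of the kind of conversion meant here, the standard relationships between Hill numbers of order 0, 1, and 2 and the classical indices can be sketched as follows. `hill_to_alpha_indices` is a hypothetical name for this example and does not reflect the signature of the function added in this PR.

```python
import math


def hill_to_alpha_indices(d0, d1, d2):
    """Convert Hill numbers of order 0, 1, and 2 into classical indices.

    Hypothetical helper for illustration. Uses the standard identities:
    richness = D0, Shannon entropy = ln(D1), Simpson index = 1 / D2.
    """
    shannon = math.log(d1)
    return {
        "richness": d0,
        "shannon_entropy": shannon,
        "simpson_index": 1.0 / d2,
        "gini_simpson": 1.0 - 1.0 / d2,
        "inverse_simpson": d2,
        # Pielou's evenness is undefined for a single clonotype.
        "pielou_evenness": shannon / math.log(d0) if d0 > 1 else float("nan"),
    }


# Four equally abundant clonotypes: all Hill numbers equal 4.
print(hill_to_alpha_indices(4.0, 4.0, 4.0))
```

The evenness entry shows why such a conversion is handy: once the Hill numbers are in a table, derived quantities follow from simple column arithmetic.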
Closes #...