Hill diversity profile #535

Open
wants to merge 9 commits into
base: main

Conversation

MKanetscheider
Collaborator

Added a Hill diversity profile and a function that converts Hill numbers into other popular alpha diversity indices (such as Simpson and Shannon). The latter was requested by Francesca; it makes it possible to further manipulate the resulting DataFrame, for example to calculate evenness indices.
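
For illustration, here is a minimal sketch of what a Hill diversity profile and such a conversion could look like. It is not the code from this pull request: the function names `hill_number`, `hill_profile` and `hill_to_indices` are hypothetical, and the input is assumed to be a plain vector of clonotype counts for one group.

```python
import numpy as np


def hill_number(counts, q):
    """Hill number of order q for a vector of clonotype counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()  # relative clonotype frequencies
    if np.isclose(q, 1.0):
        # The general formula is undefined at q = 1; its limit is the
        # exponential of the Shannon entropy.
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p**q) ** (1.0 / (1.0 - q)))


def hill_profile(counts, orders):
    """Hill numbers for several diversity orders (the 'curve')."""
    return np.array([hill_number(counts, q) for q in orders])


def hill_to_indices(d0, d1, d2):
    """Convert Hill numbers of order 0, 1 and 2 into classical indices."""
    return {
        "richness": d0,
        "shannon_entropy": float(np.log(d1)),
        "inverse_simpson": d2,
        "gini_simpson": 1.0 - 1.0 / d2,
        # Pielou evenness is undefined when only one clonotype is present.
        "pielou_evenness": float(np.log(d1) / np.log(d0)) if d0 > 1 else float("nan"),
    }


counts = [10, 5, 1]
d0, d1, d2 = hill_profile(counts, [0, 1, 2])
print(hill_to_indices(d0, d1, d2))
```

Orders 0, 1 and 2 correspond to richness, the exponential of Shannon entropy and the inverse Simpson index, which is why those three values are enough to derive the indices above.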

Closes #...

  • CHANGELOG.md updated
  • Tests added (For bug fixes or new features)
  • Tutorial updated (if necessary)

@grst
Collaborator

grst commented Oct 11, 2024

One question I have about Hill numbers is if and how they take into account different cell numbers per patient or other technical confounders. E.g., with Shannon entropy, the entropy would simply increase with increasing cell count per group, which is why we use normalized Shannon entropy by default in scirpy.

I was wondering if something similar is necessary/possible when using the Hill curves.
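
To make the concern concrete, here is a small simulated sketch. It assumes a made-up repertoire of 1000 clonotypes with Zipf-like frequencies and a normalization that divides the entropy by the logarithm of the number of observed clonotypes; both are assumptions for illustration and not taken from scirpy's code.

```python
import numpy as np


def shannon_entropy(counts, normalize=False):
    """Plug-in Shannon entropy (natural log) of a clonotype count vector."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    h = -np.sum(p * np.log(p))
    if normalize:
        # Assumed normalization: divide by the maximum entropy attainable
        # with the observed number of clonotypes, which bounds the value to [0, 1].
        return float(h / np.log(len(p))) if len(p) > 1 else 0.0
    return float(h)


rng = np.random.default_rng(0)

# Hypothetical "true" repertoire: 1000 clonotypes with Zipf-like frequencies.
true_freqs = 1.0 / np.arange(1, 1001)
true_freqs /= true_freqs.sum()

results = {}
for n_cells in (100, 10_000):
    # Draw n_cells cells from the same underlying repertoire.
    cells = rng.choice(len(true_freqs), size=n_cells, p=true_freqs)
    counts = np.bincount(cells, minlength=len(true_freqs))
    results[n_cells] = (
        shannon_entropy(counts),
        shannon_entropy(counts, normalize=True),
    )
    print(n_cells, results[n_cells])
```

Both samples come from the same repertoire, yet the raw entropy of the small sample cannot exceed log(100) simply because at most 100 clonotypes can be observed. The normalized value is bounded between 0 and 1, although it is not guaranteed to be independent of the cell count either.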

@MKanetscheider
Collaborator Author

> One question I have about Hill numbers is if and how they take into account different cell numbers per patient or other technical confounders. E.g., with Shannon entropy, the entropy would simply increase with increasing cell count per group, which is why we use normalized Shannon entropy by default in scirpy.
>
> I was wondering if something similar is necessary/possible when using the Hill curves.

I think I get what you are referring to, but I am not sure I know the answer to your question. While implementing this I was inspired by cdiversity (https://github.com/AI-SysBio/cdiversity/tree/main), and I think their approach is quite similar to the implementation you can see here. Furthermore, I think that many papers I read during my thesis simply applied the Hill number formula for multiple diversity orders and plotted the values as a curve (which is essentially what this function does as well).

However, I just had a look at the vignette from Immcantation, and they seem to take more into account and use a more advanced way to calculate this (https://alakazam.readthedocs.io/en/stable/vignettes/Diversity-Vignette/). Maybe have a look at it and tell me whether my implementation here is just plain wrong or whether it "just" oversimplifies?

@grst
Collaborator

grst commented Nov 23, 2024

Your implementation is certainly not wrong. I do think, however, that it is important to consider the sequencing depth (or, in our case, the number of cells per sample). I also think that it's more of an issue with scRNA-seq data than with bulk data because the number of cells can vary quite a lot between samples.

This function from Alakazam seems to address it and it doesn't look too complicated (but obviously more complicated than the current version): https://bitbucket.org/kleinstein/alakazam/src/f7986680439908fd8660dde25074923f34ea93cf/R/Diversity.R#lines-803:951
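
One generic way to account for differing cell numbers is to subsample every group to the same depth and average the Hill numbers over repeated draws. The sketch below illustrates that idea only; it is not a port of the Alakazam function linked above, which may differ in details such as the resampling scheme. The names `hill_number` and `rarefied_hill_profile` and the example counts are hypothetical.

```python
import numpy as np


def hill_number(counts, q):
    """Hill number of order q for a vector of clonotype counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p**q) ** (1.0 / (1.0 - q)))


def rarefied_hill_profile(counts, orders, depth, n_repeats=200, seed=0):
    """Mean and standard deviation of Hill numbers after subsampling to `depth` cells."""
    counts = np.asarray(counts, dtype=np.int64)
    if depth > counts.sum():
        raise ValueError("depth exceeds the number of cells in this group")
    rng = np.random.default_rng(seed)
    # One clonotype label per cell, so that cells can be drawn without replacement.
    labels = np.repeat(np.arange(len(counts)), counts)
    profiles = np.empty((n_repeats, len(orders)))
    for i in range(n_repeats):
        sub = rng.choice(labels, size=depth, replace=False)
        sub_counts = np.bincount(sub, minlength=len(counts))
        profiles[i] = [hill_number(sub_counts, q) for q in orders]
    return profiles.mean(axis=0), profiles.std(axis=0)


orders = [0, 1, 2]
samples = {
    "large": [5000, 3000, 1500, 400, 100],  # 10000 cells
    "small": [50, 30, 15, 4, 1],            # 100 cells, same proportions
}
# Use the size of the smallest group as the common depth.
depth = min(sum(c) for c in samples.values())
for name, counts in samples.items():
    mean, sd = rarefied_hill_profile(counts, orders, depth)
    print(name, np.round(mean, 2), np.round(sd, 2))
```

Using the smallest group as the common depth discards cells from larger groups, so the standard deviation across repeats gives a rough sense of how much the curve depends on which cells were kept.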

I would also be curious to hear @FFinotello's opinion on this, as she has worked with diversity indices more than I have.
