Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow overriding of CSR accumulators with an IndexLike #181

Merged
merged 1 commit into from
Jan 8, 2024

Conversation

thetorpedodog
Copy link
Contributor

To support a custom indexer in TileDB-SOMA, this introduces a basic abstraction that is used to build the Indexer used by the CSR Accumulator. This can then be easily overridden by an implementation's custom subclass to swap that out without having to duplicate substantial parts of the code.

The naming is, admittedly, not ideal; this is in part due to the naming of the things we're trying to abstract over (the Pandas Index type which has the get_indexer method).


This is the somacore counterpart of single-cell-data/TileDB-SOMA#1728. It is entirely compatible with current code, though; once we’re positive that it fufills the needs of the tiledb-soma reindexer, we can merge it and release it and tiledbsoma installations will continue to work fine.

Copy link
Member

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a code read, and this looks great to me. Thanks for cleaning up some of the crufty code!

@beroy
Copy link
Collaborator

beroy commented Nov 16, 2023

Plz fix the formattings

@thetorpedodog thetorpedodog force-pushed the build-index-abstraction branch 2 times, most recently from 34306ab to 1d71cc1 Compare December 4, 2023 17:59
@johnkerl johnkerl changed the title Allow overriding of CSR accumulators with an IndexLike. Allow overriding of CSR accumulators with an IndexLike Dec 5, 2023
@@ -26,7 +26,7 @@ def get_indexer(
"""Something compatible with Pandas' Index.get_indexer method."""


IndexFactory = Callable[[npt.NDArray[np.int64]], "IndexLike"]
IndexFactory = Callable[[npt.NDArray[np.int64], Optional[Any]], "IndexLike"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we have this take the NDArray as the only parameter is so that we can use the function pandas.Index as an index factory. With this added, pandas.Index no longer works, which is why the format check is now failing.

@beroy
Copy link
Collaborator

beroy commented Dec 15, 2023

@thetorpedodog can you plz merge this as my C++ reindexer is dependent on this?

@thetorpedodog
Copy link
Contributor Author

I would like to wait until we get approval on the TileDB-SOMA side of the house. I am pretty confident that this is going to be it, but I don’t want to cut a release prematurely. It will be very quick—after merging this, we can immediately make the tag and everything will be ready in a matter of minutes. (There will be no rush to release TileDB-SOMA; this is fully compatible with existing TileDB-SOMA.)

To support a custom indexer in TileDB-SOMA, this introduces a basic
abstraction that is used to build the Indexer used by the
CSR Accumulator.  This can then be easily overridden by an
implementation's custom subclass to swap that out without having to
duplicate substantial parts of the code.

The naming is, admittedly, not ideal; this is in part due to the naming
of the things we're trying to abstract over (the Pandas `Index` type
which has the `get_indexer` method).
@thetorpedodog
Copy link
Contributor Author

We’re good to go!

@thetorpedodog thetorpedodog merged commit a79f984 into main Jan 8, 2024
6 checks passed
@thetorpedodog thetorpedodog deleted the build-index-abstraction branch January 8, 2024 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants