What is the status / plan for vectorized APIs? #341

wingkitlee0 · 2024-01-14T19:05:05Z

Currently, what is the recommend way to do this (with this package)?
For example, I have two numpy arrays of lat/lon and want to convert them into an array of h3 hex

wingkitlee0 · 2024-03-24T20:30:52Z

For reference:
master...wingkitlee0:h3-py:basic-vectorized

ajfriend · 2024-03-29T22:01:43Z

We tried previously having an "unstable" API to experiment with things like vector functions: #147

I'd be interested in bringing that back in the 4.0 branch, probably still as an unstable/experimental module to indicate the API may change in the future as we figure out what we want it to look like.

wingkitlee0 · 2024-03-30T15:28:17Z

@ajfriend Thanks for the info.

wingkitlee0 · 2024-05-19T02:10:54Z

@ajfriend I took a second look on this issue...

A few questions:

It seems that h3-py was designed to be "self-contained" without numpy. But I was thinking to have a set of APIs that accept numpy arrays. This may require numpy in cython which becomes a dependency (not 100% if we need numpy in cython). I am curious about this restriction.
There are a few backends: basic_str, basic_int, etc that shares the same API. Only a subset of them will be relevant for a "vectorized" version. I am thinking to put those vectorized functions in an experimental module (e.g., api.experimental.vectorized) which does not use the symlink-ed __init__.py. Does it make sense?
Do you have a sense of how much faster a vectorized version would be? For now, if I move the for-loop from python to C I get around 30% faster in integer representation.

Thanks

ajfriend · 2024-05-19T02:45:59Z

It seems that h3-py was designed to be "self-contained" without numpy. But I was thinking to have a set of APIs that accept numpy arrays. This may require numpy in cython which becomes a dependency (not 100% if we need numpy in cython). I am curious about this restriction.

I haven't been deep in the Cython code in a while, but if my memory serves, numpy shouldn't be a hard dependency because it is compatible with the https://docs.python.org/dev/library/stdtypes.html#memoryview interface, which is how we interface with Cython. See some of the discussion around memoryview objects in https://uber.github.io/h3-py/api_comparison.html . To that point, h3.api.memview_int is actually the base API that the other three are built on top of.

A memoryview object is easily converted into a numpy array (and can be done without copying the data; see the docs link above). Any function expecting a memoryview object can accept a numpy array without having to "know about" (or import) numpy arrays.

Because numpy is compatible with the memoryview interface, numpy isn't a dependency in terms of the library being able to accept numpy arrays as inputs. The only reason we include numpy as an optional dependency is that you need it if you want to return numpy arrays as outputs (without expecting the user to convert a memoryview).

(I'll try to get to your other points soon.)

ajfriend · 2024-05-19T18:50:39Z

There are a few backends: basic_str, basic_int, etc that shares the same API. Only a subset of them will be relevant for a "vectorized" version. I am thinking to put those vectorized functions in an experimental module (e.g., api.experimental.vectorized) which does not use the symlink-ed __init__.py. Does it make sense?

That makes sense. Although, recently, I've been thinking there might be another potential way to organize this rather than putting the code in api.experimental.vectorized. h3-py exposes its Cython interface (although I'm not sure we officially support it yet) and other Cython packages should be able to simply add it as a Cython dependency. We should probably come up with a short example repo showing people how to do this (#373).

This would let others write fast Cython code involving the h3-py C and Cython functions, without having to have those live inside the main repo. This vectorized API could be one such example. We don't necessarily have to do it this way, but I wanted to raise the idea and see what other folks thought.

ajfriend · 2024-05-19T18:54:05Z

Do you have a sense of how much faster a vectorized version would be? For now, if I move the for-loop from python to C I get around 30% faster in integer representation.

I don't off the top of my head. I think it would depend on the application. That being said, I think developing some benchmarks would be helpful. You might be able to test things fairly quickly with something like https://github.com/uber/h3-py/blob/master/tests/test_cython/cython_example.pyx

wingkitlee0 · 2024-05-20T00:00:58Z

another potential way to organize this
other Cython packages should be able to simply add it as a Cython dependency.

That's an interesting idea. I think exposing cython functions is good.

However, I think Cython is hard for most python users. Most people probably prefer pip install h3 and get all the good stuff. In other word, I would like to see more people to use H3 for their geospatial projects and it would be easier if we have only one h3 python library. Maybe a limited subset of vectorized functions is good enough to start. More advanced users can build their library with the Cython functions..

Should we continue the discussion back in Slack (for more visibility?)

BTW, thanks for answering my questions. After reading a little bit more memoryview, I agree that we do not need numpy in Cython for this.

ajfriend · 2024-05-20T00:14:02Z

For what I'm thinking, the average user would not need to know anything about Cython. If we produced a separate package (called whatever, maybe h3vectorized for example), users would be able to just pip install that. But development would be much easier in the h3vectorized repo, since all the hard CMake and scikit-build stuff is handled separately by the h3-py repo (in the same way that a package that cimports numpy doesn't need to build all of numpy from scratch each time).

Since we're not sure exactly what the vectorized API looks like just yet, I think that could provide a lot of benefits: You're less strongly coupled to the h3-py repo and have more freedom to experiment with API design. In a lot of ways, that's a more natural way to explore an API than having an experimental submodule. You would also not be subject to the fairly rigid versioning requirements we have for h3 bindings to match the core library version in major and minor semver numbers.

If we later finalize an API, we could migrate it into h3-py directly or keep it as a separate package.

Again, I'm not saying this is the only way to do it, but I do think it is an attractive option. But, all in all, once we get that first example package working, I'd expect this to be the much easier path.

ajfriend · 2024-05-20T00:35:11Z

Also, feel free to flag the discussion in Slack, but I'd like to keep the main discussion on Github since Slack history gets lost relatively quickly.

ajfriend · 2024-05-20T03:29:34Z

I threw together an example that still needs some work, but this would be the idea: https://github.com/ajfriend/h3_example_package

wingkitlee0 · 2024-05-20T23:09:20Z

Thanks for the clarifications! I think it makes all sense now. I didn't realize h3-py is more than a Cython wrapper but also build the H3 C library.

migrate it into h3-py directly or keep it as a separate package.

sounds good

Thanks also for the example.. I will try to build something on it..

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What is the status / plan for vectorized APIs? #341

What is the status / plan for vectorized APIs? #341

wingkitlee0 commented Jan 14, 2024

wingkitlee0 commented Mar 24, 2024

ajfriend commented Mar 29, 2024

wingkitlee0 commented Mar 30, 2024

wingkitlee0 commented May 19, 2024 •

edited

Loading

ajfriend commented May 19, 2024 •

edited

Loading

ajfriend commented May 19, 2024

ajfriend commented May 19, 2024

wingkitlee0 commented May 20, 2024

ajfriend commented May 20, 2024 •

edited

Loading

ajfriend commented May 20, 2024

ajfriend commented May 20, 2024

wingkitlee0 commented May 20, 2024

What is the status / plan for vectorized APIs? #341

What is the status / plan for vectorized APIs? #341

Comments

wingkitlee0 commented Jan 14, 2024

wingkitlee0 commented Mar 24, 2024

ajfriend commented Mar 29, 2024

wingkitlee0 commented Mar 30, 2024

wingkitlee0 commented May 19, 2024 • edited Loading

ajfriend commented May 19, 2024 • edited Loading

ajfriend commented May 19, 2024

ajfriend commented May 19, 2024

wingkitlee0 commented May 20, 2024

ajfriend commented May 20, 2024 • edited Loading

ajfriend commented May 20, 2024

ajfriend commented May 20, 2024

wingkitlee0 commented May 20, 2024

wingkitlee0 commented May 19, 2024 •

edited

Loading

ajfriend commented May 19, 2024 •

edited

Loading

ajfriend commented May 20, 2024 •

edited

Loading