-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
POC: Move some Tempita to Cython/C++ #56432
Conversation
# TODO: must be a cleaner way to do this? | ||
# even arr.data = move(idx.data()) would be better but arr.data is readonly | ||
arr = np.empty(idx.size(), dtype=np.intp) | ||
memcpy(arr.data, idx.const_data(), idx.size() * sizeof(cnp.npy_intp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data
member in Cython is read-only and I wasn't sure of any ndarray constructor that would properly manage the lifecycle of a raw data buffer. cc @jbrockmendel in case you know of a better way to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the advantage of using a vector here over just putting things into the ndarray and resizing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think @seberg is the person to ask about this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lithomas1 maybe you could do without it, although this was trying to stay a faithful port of the current codebase which creates a custom templated Vector class
std::vector also has the advantage of working out of the box and using RAII; if you were to do this with a raw buffer it takes a few more steps and requires manual memory management, along with diving into ndarray internals
Is there a user-facing benefit here, or the goal just "less tempita"? This moves the affected code out of the "things i am confident i can maintain" bucket |
@jbrockmendel not sure yet - need to do some more research. Using the C++ templated khash would be nice internally as the C one we use is a mess of macros. What I'm not sure of yet is how Cython allows for template specialization, if at all In tempita, we often times do something like: {{ if dtype == object }}
...
{{ endif }} C++ template specializations would look something like: template<typename T>
bool isPyObject() { return false; }
template<>
bool isPyObject<PyObject*>() { return true; } I'm not sure yet what Cython offers but assuming not a lot that would require leveraging C++ directly to truly get rid of Tempita |
I think most of the places left where we do this are in function signatures where we cant (yet?) do |
Thanks for the feedback @jbrockmendel . Also want to be clear - I don't expect this PR to be merged. Just toying around with the idea. Figured pushing this up allows me to see how it works in CI and get preliminary feedback New tooling is not something to take lightly, so if this ultimately shows promise it is probably worth a PDEP first |
So AFAICT the current build failure is due to the c++ compiler coming from the system not a conda environment. We could add the c++ compiler to the conda list, although I think this might actually be a bug with meson and how it determines when to link against python with c++. Need to research more to try and make an MRE, unless @eli-schwartz knows off hand what could be wrong |
From ubuntu-latest actions-311:
It is checking PATH for both compilers, and as far as I can tell, picking up the stock Ubuntu one? If you want to choose a specific compiler then the conda environment could choose to export $CC and $CXX and meson will use those instead. |
@eli-schwartz yea absolutely - if we were to add a conda compiler then this works, mainly because of the changes conda makes to LD_FLAGS. But I think it should be OK to use the system compilers. I haven't faced this issue with C files just C++ and I am not sure the rules for when an extension should or should not link Python, but the error in the logs suggest that link was expected. Will investigate further and try to come up with an MRE for meson if I can confirm a bug |
The missing symbol is a name-mangled symbol for C++, which means it is somehow expecting a C++ "_Z21PyTraceMalloc_Untrackjm" symbol instead of a C "PyTraceMalloc_Untrack" symbol. This is confusing to me. The latter exists in my libpython3.11.so, the former does not. |
Alright so it appears the issue can be resolved by adding arguments for Not very familiar with these flags but from what I understand the gc-sections cleans up symbols that ultimately do not get used in the library. Still not clear why this shows up specifically for this extension and not our other C++ extension |
klib provides a c++ header file that, while not exactly the same as our existing vendored copy, could help simplify the layers we have between Cython -> Tempita -> khash
I get the general impression that a lot of the
IntXXVector
classes we have and hashing code could be simplified using C++ templates, although the transition from current state would not be easy