#902: fixed memory related bugs with emit dataframe #414

Merged (26 commits) on Jun 7, 2024

@tomuben tomuben (Collaborator) commented May 23, 2024

fixes #902

Three problems were identified; the first two are memory related:

1. Numpy object leaked

The PyObject returned from

PyArray_FROM_OTF(data.get(), NPY_OBJECT, NPY_ARRAY_IN_ARRAY)

also needs to be released (with a call to Py_XDECREF()). The current implementation decreases the reference counter only for the transposed array. Debugging showed the following reference counts:

Ref count of colArray = 1
Ref count of pyArray = 2

This means the array returned by PyArray_Transpose() is a new object.

=> We need to decrease the reference counter for both.
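
The ownership rule above can be sketched without the Python headers. The block below is a minimal mock, not the real C API: `MockObject`, `mock_new`, `mock_xdecref`, and `OwnedRef` are hypothetical names standing in for `PyObject`, a function returning a new reference, `Py_XDECREF()`, and the `std::unique_ptr` wrapper. It also assumes that the transposed array is a view holding a reference to its base, which would explain the observed count of 2 for pyArray.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a reference-counted PyObject; not the real C API.
struct MockObject {
    int refcnt = 1;              // every new object starts with one reference
    MockObject* base = nullptr;  // assumption: a view holds a reference to its base
};

static int g_live = 0;  // mock objects created but not yet freed

// Returns a new (owned) reference, as PyArray_FROM_OTF and
// PyArray_Transpose do. Passing `base` models a view on another array.
MockObject* mock_new(MockObject* base = nullptr) {
    MockObject* obj = new MockObject;
    obj->base = base;
    if (base != nullptr) ++base->refcnt;
    ++g_live;
    return obj;
}

// Mirrors Py_XDECREF: null-safe, frees the object at zero references.
void mock_xdecref(MockObject* obj) {
    if (obj == nullptr) return;
    assert(obj->refcnt > 0);
    if (--obj->refcnt == 0) {
        mock_xdecref(obj->base);  // the view gives up its reference to the base
        --g_live;
        delete obj;
    }
}

struct MockDecref {
    void operator()(MockObject* obj) const { mock_xdecref(obj); }
};
using OwnedRef = std::unique_ptr<MockObject, MockDecref>;

// Reference counts right after the transpose: {colArray, pyArray}.
std::pair<int, int> refcounts_after_transpose() {
    OwnedRef pyArray(mock_new());
    OwnedRef colArray(mock_new(pyArray.get()));
    return {colArray->refcnt, pyArray->refcnt};
}

// Old behaviour: only the transposed array is released.
// Returns the number of mock objects left alive.
int leaked_when_releasing_only_transposed() {
    const int before = g_live;
    MockObject* pyArray = mock_new();  // new reference, never released
    OwnedRef colArray(mock_new(pyArray));
    colArray.reset();
    return g_live - before;
}

// Fixed behaviour: both new references are owned and released.
int leaked_when_releasing_both() {
    const int before = g_live;
    {
        OwnedRef pyArray(mock_new());
        OwnedRef colArray(mock_new(pyArray.get()));
    }
    return g_live - before;
}
```

In this mock, releasing only the transposed array leaves the source array alive with one outstanding reference, while owning both references releases everything.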

2. Items returned from PyList_GetItem() must not be released

See documentation

...
Return value: Borrowed reference. Part of the [Stable ABI](https://docs.python.org/3/c-api/stable.html#stable)
...
  • Currently we assign the object returned from PyList_GetItem() to a std::unique_ptr, which calls Py_XDECREF() in its destructor.
  • This can lead to undefined behavior, as we might decrease the reference counter too many times.
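
A small mock can make the borrowed-reference problem concrete. Everything here is hypothetical scaffolding rather than the real C API: `MockObject`, `MockList`, `mock_list_get_item`, and `OwnedRef` only imitate the semantics of `PyObject`, a Python list, `PyList_GetItem()`, and a `std::unique_ptr` whose deleter calls `Py_XDECREF()`. To keep the sketch free of real undefined behavior, the mock marks an object as freed instead of deleting it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a reference-counted PyObject; not the real C API.
struct MockObject {
    int refcnt = 1;      // the single reference is owned by the list
    bool freed = false;  // mock: set instead of actually deleting
};

void mock_incref(MockObject* obj) { ++obj->refcnt; }

// Mirrors Py_XDECREF: null-safe, "frees" the object at zero references.
void mock_xdecref(MockObject* obj) {
    if (obj == nullptr) return;
    assert(!obj->freed && obj->refcnt > 0);
    if (--obj->refcnt == 0) obj->freed = true;
}

struct MockDecref {
    void operator()(MockObject* obj) const { mock_xdecref(obj); }
};
using OwnedRef = std::unique_ptr<MockObject, MockDecref>;

// The list holds one reference per item.
struct MockList {
    std::vector<MockObject*> items;
};

// Like PyList_GetItem: returns a borrowed reference, refcount unchanged.
MockObject* mock_list_get_item(const MockList& list, std::size_t index) {
    return list.items[index];
}

// Buggy pattern: the borrowed reference is wrapped as if it were owned.
// Returns true if the list's item is still alive afterwards.
bool item_alive_after_owned_wrapper() {
    MockObject item;
    MockList list;
    list.items.push_back(&item);
    { OwnedRef wronglyOwned(mock_list_get_item(list, 0)); }  // one decref too many
    return !item.freed;
}

// Fixed pattern: keep the borrowed reference as a plain pointer.
bool item_alive_after_plain_pointer() {
    MockObject item;
    MockList list;
    list.items.push_back(&item);
    MockObject* borrowed = mock_list_get_item(list, 0);
    (void)borrowed;  // used while the list is alive, never released
    return !item.freed;
}

// Alternative: take our own reference first, then the wrapper is correct.
bool item_alive_after_incref_and_wrapper() {
    MockObject item;
    MockList list;
    list.items.push_back(&item);
    MockObject* borrowed = mock_list_get_item(list, 0);
    mock_incref(borrowed);
    { OwnedRef owned(borrowed); }
    return !item.freed && item.refcnt == 1;
}
```

In the buggy pattern the item is freed while the list still points to it, which in the real interpreter would be a use-after-free or a double release later on.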

3. emit with a datetime-only dataframe fails

Running emit on a dataframe which contains only datetime64[ns] columns fails with the error message:

pyodbc.DataError: ('22002', '[22002] [EXASOL][EXASolution driver]VM error: F-UDF-CL-LIB-1127: F-UDF-CL-SL-PYTHON-1002: F-UDF-CL-SL-PYTHON-1026: ExaUDFError: F-UDF-CL-SL-PYTHON-1114: Exception during run \nTEST_DTYPE_EMIT:7 run\nRuntimeError: F-UDF-CL-SL-PYTHON-1136: F-UDF-CL-SL-PYTHON-1130: PyObject is unexpectedly a null pointer\n (Session: 1800240827916484608) (-3452546) (SQLExecDirectW)')

The reason is that the default conversion to numpy expects only objects as cell items. For the case where only one column of type NPY_DATETIME is in the source dataframe, a workaround was already implemented (see here).
Solution: convert all items in the dataframe to type object if all columns are of type NPY_DATETIME.
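
The condition in this solution can be sketched as a small predicate. This is only an illustration of the decision, not the code from this PR: `ColumnType` and `needs_object_conversion` are hypothetical names, and the enum merely stands in for NumPy type numbers such as NPY_DATETIME.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical column type tags standing in for NumPy type numbers.
enum class ColumnType { Object, Int, Float, Datetime };

// True when every column is a datetime column, i.e. the case in which
// the items would be converted to type object before emitting.
bool needs_object_conversion(const std::vector<ColumnType>& columns) {
    if (columns.empty()) return false;  // std::all_of is true for an empty range
    return std::all_of(columns.begin(), columns.end(),
                       [](ColumnType type) { return type == ColumnType::Datetime; });
}
```

A dataframe that mixes datetime columns with other types does not satisfy the predicate and would stay on the default conversion path.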

Minor changes

  • renamed checkPyPtrIsNull() -> checkPyPtrIsNotNull()
  • created new temporary objects in handleEmitInt/handleEmitFloat/handleEmitTimestamp
  • new check function checkPyObjectIsNotNull()

@tomuben tomuben force-pushed the bug/902_fix_memory_leak_emit_df branch 2 times, most recently from 54c992f to b0c1a63 Compare May 24, 2024 18:05
@tomuben tomuben changed the title #902: fixed memory leak with emit dataframe #902: fixed memory related bugs with emit dataframe May 28, 2024
     pyResult.reset(PyObject_CallMethodObjArgs(resultHandler, pySetNullMethodName.get(), pyColSetMethods[c].first.get(), NULL));
     return;
 }
 switch (colInfo[c].type) {
 case SWIGVMContainers::BOOLEAN:
-    if (pyBool.get() == Py_True) {
+    if (pyBool == Py_True) {
Collaborator

This also makes only limited sense, but we won't change it in this PR.

Collaborator Author

My understanding is that Py_True (which comes from the Python header file) is a static pointer to the (immortal) "True" object of Python.
See https://docs.python.org/3/c-api/bool.html#c.Py_True

Collaborator

The thing is, whatever pyBool is, it is either Py_True or Py_False; otherwise we would need a Python comparison instead of a C++ one. That means the comparison is meaningless, because we handle both cases the same way. We could replace this with INCREF(pyBool).

@tomuben tomuben force-pushed the bug/902_fix_memory_leak_emit_df branch from eeff40d to 2a927fb Compare May 31, 2024 15:16
tkilias previously approved these changes Jun 6, 2024
tomuben added 2 commits June 7, 2024 09:42
Needed to increase memory diff limit in test
dataframe_memory_leak.test_dataframe_set_emits
from 15KB to 20KB, because cuda container used
15.2KB.
tkilias previously approved these changes Jun 7, 2024
@tomuben tomuben merged commit 5db9319 into master Jun 7, 2024
9 of 10 checks passed
@tomuben tomuben deleted the bug/902_fix_memory_leak_emit_df branch June 7, 2024 20:29
Development

Successfully merging this pull request may close these issues.

Memory leak with emit dataframe