#902: fixed memory related bugs with emit dataframe #414

tomuben · 2024-05-23T18:40:54Z

fixes #902

Three problems were identified; the first two are memory-related:

1. Numpy object leaked

The PyObject returned from

PyArray_FROM_OTF(data.get(), NPY_OBJECT, NPY_ARRAY_IN_ARRAY)

also needs to be released (with a call to Py_XDECREF()). The current implementation decreases the reference counter only for the transposed array. Debugging showed the following reference counts:

Ref count of colArray = 1
Ref count of pyArray = 2

This means the array returned by PyArray_Transpose() is a new object, separate from the one returned by PyArray_FROM_OTF().

=> We need to decrease the reference counter for both.

2. Items returned from `PyList_GetItem()` must not be released

See documentation

...
Return value: Borrowed reference. Part of the [Stable ABI](https://docs.python.org/3/c-api/stable.html#stable)
...

Currently we assign the object returned from PyList_GetItem() to a std::unique_ptr whose destructor calls Py_XDECREF().
This can lead to undefined behavior, because the reference counter may be decreased too many times.
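
The difference between a borrowed and a new reference can be observed from Python itself. The following is a minimal sketch, not code from this PR: it assumes CPython and uses `ctypes.pythonapi` to call `PyList_GetItem()` (borrowed reference) and `PySequence_GetItem()` (new reference) on a throwaway list, then compares reference counts. All variable names are illustrative.

```python
import ctypes
import sys

api = ctypes.pythonapi  # CPython only: direct access to the C API

# Return raw addresses (c_void_p) so that ctypes itself does not adjust
# the reference count of the returned object.
api.PyList_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
api.PyList_GetItem.restype = ctypes.c_void_p
api.PySequence_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
api.PySequence_GetItem.restype = ctypes.c_void_p
api.Py_DecRef.argtypes = [ctypes.c_void_p]
api.Py_DecRef.restype = None

item = object()
items = [item]
baseline = sys.getrefcount(item)

# Borrowed reference: the count is unchanged, so nothing may be released.
borrowed = api.PyList_GetItem(items, 0)
after_borrowed = sys.getrefcount(item)

# New reference: the count grew by one, so exactly one release is required.
owned = api.PySequence_GetItem(items, 0)
after_owned = sys.getrefcount(item)
api.Py_DecRef(owned)
after_release = sys.getrefcount(item)

print(after_borrowed - baseline, after_owned - baseline, after_release - baseline)
```

On CPython this should print `0 1 0`. Releasing the borrowed pointer as well would push the count below what the list still relies on, which is the situation the std::unique_ptr created.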

3. emit with a datetime-only dataframe fails

Running emit on a dataframe that contains only datetime64[ns] columns fails with the error message:

pyodbc.DataError: ('22002', '[22002] [EXASOL][EXASolution driver]VM error: F-UDF-CL-LIB-1127: F-UDF-CL-SL-PYTHON-1002: F-UDF-CL-SL-PYTHON-1026: ExaUDFError: F-UDF-CL-SL-PYTHON-1114: Exception during run \nTEST_DTYPE_EMIT:7 run\nRuntimeError: F-UDF-CL-SL-PYTHON-1136: F-UDF-CL-SL-PYTHON-1130: PyObject is unexpectedly a null pointer\n (Session: 1800240827916484608) (-3452546) (SQLExecDirectW)')

The reason is that the default conversion to numpy expects only objects as cell items. For the case where only one column of type NPY_DATETIME is in the source dataframe, a workaround was already implemented (see here).
Solution: Convert all items in the dataframe to type object if all columns are of type NPY_DATETIME.
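
The condition in the solution can be sketched as follows. This is an illustration only: the helper name and the use of dtype strings are assumptions, and the actual change lives in the C++ extension. The explicit empty check is also an assumption; it is there because an "all of" test over zero columns is vacuously true.

```python
DATETIME_DTYPE = "datetime64[ns]"


def needs_object_conversion(column_dtypes):
    """Return True when every column is a datetime column (hypothetical helper).

    `column_dtypes` is a list of dtype names, one per dataframe column.
    """
    # all() over an empty list is True, so rule out the empty case explicitly.
    if not column_dtypes:
        return False
    return all(dtype == DATETIME_DTYPE for dtype in column_dtypes)


print(needs_object_conversion(["datetime64[ns]", "datetime64[ns]"]))
print(needs_object_conversion(["datetime64[ns]", "int64"]))
```

Only the first call reports that a conversion to object is needed; a mixed dataframe already takes the default path.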

Minor changes

renamed checkPyPtrIsNull() -> checkPyPtrIsNotNull()
created new temporary objects in handleEmitInt/handleEmitFloat/handleEmitTimestamp
new check function checkPyObjectIsNotNull()

test_container/tests/test/pandas/all/dataframe_memory_leak.py

tkilias · 2024-05-31T08:48:11Z

exaudfclient/base/python/python3/python_ext_dataframe.cc

        pyResult.reset(PyObject_CallMethodObjArgs(resultHandler, pySetNullMethodName.get(), pyColSetMethods[c].first.get(), NULL));
        return;
    }
    switch (colInfo[c].type) {
        case SWIGVMContainers::BOOLEAN:
-            if (pyBool.get() == Py_True) {
+            if (pyBool == Py_True) {


This also makes only limited sense, but we won't change it in this PR.

My understanding is that Py_True (which comes from the Python header file) is a static pointer to Python's (immortal) "True" object.
See https://docs.python.org/3/c-api/bool.html#c.Py_True

The point is that whatever pyBool is, it must be either Py_True or Py_False; otherwise we would need a Python comparison instead of a C++ one. That means the comparison is meaningless, because both cases are handled the same way. We could replace it with INCREF(pyBool).
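
To illustrate the distinction in Python terms (a sketch, not code from the PR): `pyBool == Py_True` in C++ is a pointer comparison, which corresponds to `is True`, whereas a Python-level truth test corresponds to `bool(value)` (`PyObject_IsTrue()` in the C API). The two helper names below are hypothetical.

```python
def is_true_by_identity(value):
    # Mirrors the C++ pointer comparison against the True singleton.
    return value is True


def is_true_by_truthiness(value):
    # Mirrors a Python-level truth test such as PyObject_IsTrue().
    return bool(value)


for value in (True, False, 1, 0):
    print(repr(value), is_true_by_identity(value), is_true_by_truthiness(value))
```

The two checks agree for real bool objects and differ for a value such as `1`, so the pointer comparison is only sound when the value is guaranteed to be one of the two bool singletons.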


tomuben force-pushed the bug/902_fix_memory_leak_emit_df branch 2 times, most recently from 54c992f to b0c1a63 on May 24, 2024 18:05

tomuben changed the title ~~#902: fixed memory leak with emit dataframe~~ #902: fixed memory related bugs with emit dataframe May 28, 2024

tkilias requested changes May 31, 2024

tomuben added 16 commits May 31, 2024 12:16

#902: fixed memory leak with emit dataframe

b1c239e

Test

67862d5

Several fixes:

1c93f0c

1. Fixed incorrect Python DECREF in create_dataframe()
2. Fixed multiple incorrect DECREF's in emit()

Reverted formatting

0035320

Reverted release PyList_Append()

64a3fa8

Fixed formatting

04d4bdf

Fixed formatting

f9cd794

Fixed error code

8fe96dc

Avoid copy of int/double

f148427

Tests and fix for datetime only columns

d6b1177

1. Added memory leak check tests
2. Added a test for a multi datetime column dataframe

Added more comments

7e300f0

Decreased batch count in emit_dtypes_memory_leak.py

e8f3f90

Use a single variable for max memory usage

170aa4c

Modified dateframe_memory_leak.py

0daddff

Fixed findings from review

0b5233d

Uncommented tests

2a927fb

tomuben force-pushed the bug/902_fix_memory_leak_emit_df branch from eeff40d to 2a927fb on May 31, 2024 15:16

Adjusted memory limit for test_dataframe_set_emits()

9685734

tkilias reviewed Jun 3, 2024

tomuben and others added 5 commits June 3, 2024 08:38

Fixes from review

7c931b2

Merge branch 'master' into bug/902_fix_memory_leak_emit_df

ebee2e2

Merge branch 'master' into bug/902_fix_memory_leak_emit_df

93b609d

fixed dataframe_memory_leak.py

53bc403

Merge branch 'master' into bug/902_fix_memory_leak_emit_df

96b58aa

tkilias previously approved these changes Jun 6, 2024

Merge branch 'master' into bug/902_fix_memory_leak_emit_df

5ea9ef2

tomuben added 2 commits June 7, 2024 09:42

Merge branch 'master' into bug/902_fix_memory_leak_emit_df

f7f8fb7

Increased memory limit in test

0bb3b39

Needed to increase memory diff limit in test dataframe_memory_leak.test_dataframe_set_emits from 15KB to 20KB, because cuda container used 15.2KB.
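
A memory-diff check of this kind can be sketched as follows. This is not the code from dataframe_memory_leak.py, whose measurement method is not shown here; it is a stdlib-only illustration that uses `tracemalloc`, which only sees allocations made through Python's allocators. Apart from the 20 KB limit, every name and number is an assumption.

```python
import tracemalloc

MAX_MEMORY_DIFF_KB = 20  # limit from the commit above; the rest is hypothetical


def memory_growth_kb(workload, warmup=3, rounds=50):
    """Return how much traced memory grows across `rounds` calls, in KB."""
    for _ in range(warmup):  # let caches and lazy initialisation settle
        workload()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(rounds):
        workload()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / 1024


def balanced_workload():
    # Allocates and frees everything on each call.
    data = [str(i) for i in range(1000)]
    return len(data)


leaked = []


def leaking_workload():
    # Keeps 4 KiB alive per call, which stands in for a missing release.
    leaked.append(bytearray(4096))


balanced_kb = memory_growth_kb(balanced_workload)
leaking_kb = memory_growth_kb(leaking_workload)
print(balanced_kb < MAX_MEMORY_DIFF_KB, leaking_kb < MAX_MEMORY_DIFF_KB)
```

A fixed limit with some headroom, rather than an exact zero, allows for small environment-dependent differences such as the 15.2 KB seen in the cuda container.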

tomuben dismissed tkilias’s stale review via 0bb3b39 June 7, 2024 16:22

tkilias previously approved these changes Jun 7, 2024

Fixed include file for std::all_of

36a15c0

tomuben dismissed tkilias’s stale review via 36a15c0 June 7, 2024 18:12

tkilias approved these changes Jun 7, 2024

tomuben merged commit 5db9319 into master Jun 7, 2024
9 of 10 checks passed

tomuben deleted the bug/902_fix_memory_leak_emit_df branch June 7, 2024 20:29
