-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#902: fixed memory related bugs with emit dataframe #414
Conversation
54c992f
to
b0c1a63
Compare
pyResult.reset(PyObject_CallMethodObjArgs(resultHandler, pySetNullMethodName.get(), pyColSetMethods[c].first.get(), NULL)); | ||
return; | ||
} | ||
switch (colInfo[c].type) { | ||
case SWIGVMContainers::BOOLEAN: | ||
if (pyBool.get() == Py_True) { | ||
if (pyBool == Py_True) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes also only slightly sense, but we don't change it int this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that Py_True
(which comes from the Python header file) is a static pointer to the (immortal) "True" object of Python.
See https://docs.python.org/3/c-api/bool.html#c.Py_True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is, what ever pyBool is, it is either Py_True or Py_False, otherwise we would need to use a Python comparison instead of a c++. That means, the comparison is meaningless, because we do both cases the same. We could replace this with INCREF(pyBool)
1. Fixed incorrect Python DECREF in create_dataframe() 2. Fixed multiple incorrect DECREF's in emit()
1. Added memory leak check tests 2. Added a test for a multi datetime column dataframe
eeff40d
to
2a927fb
Compare
Needed to increase memory diff limit in test dataframe_memory_leak.test_dataframe_set_emits from 15KB to 20KB, because cuda container used 15.2KB.
fixes #902
3 problems were identified, first 2 are memory related:
1. Numpy object leaked
Py-Object returned from
also needs to be deallocated (call to
Py_XDECRED()
). In current implementation, we decreased reference counter only for the transposed array. Debugging showed the reference counter:This mean the array retrieved from
PyArray_Transpose()
is a new object=> We need to decrease reference counter for both.
2. Items returned from
PyList_GetItem()
must not be releasedSee documentation
PyList_GetItem()
to a std::unique_ptr which callsPy_XDECREF()
in the destructor.3. emit with datetime only object fails
Running emit on a dataframe which contains only datetime64[ns] columns fails with error message:
Reason is that the default conversion to
numpy
expects only objects as cell items. For the case where only one column of typeNPY_DATETIME
is in the source dataframe, a workaround was already implemented (see here).Solution: Convert all items in the dataframe to type
object
if all columns are of typeNPY_DATETIME
.Minor changes
checkPyPtrIsNull()
->checkPyPtrIsNotNull()
checkPyObjectIsNotNull()