Fix: Ensure Cache Key pk is Converted to INT to Prevent Dataframe Series Null Issues #161
-> Ensure that the cache key pk (if used) is always converted to an integer.
This addresses a bug that occurs when a queryset is loaded into a dataframe. Specifically, if the queryset includes a nullable foreign key and mixes instances whose related fields are null with instances whose related fields are set, pandas does not keep the primary key (pk) column as an integer column: it assigns it the object dtype. Consequently, pk values are automatically converted to floats, because a default (NumPy-backed) pandas integer Series cannot contain None.
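The coercion can be reproduced outside the project with a minimal sketch. The `related_pks` Series and the `author:` key prefix are hypothetical, chosen only for illustration; the exact dtype seen in practice may differ depending on how the dataframe is built from the queryset.

```python
import pandas as pd

# Hypothetical related-object pks; the middle row has no related object.
related_pks = pd.Series([10, None, 12])

# The missing value forces a float dtype, so 10 becomes 10.0.
print(related_pks.dtype)  # float64

# A key built directly from the value carries the float formatting.
naive_key = f"author:{related_pks.iloc[0]}"
print(naive_key)  # author:10.0
```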
To avoid this, we must explicitly reconvert the pk column to INT before using it as a cache key.
Without this step, the dataframe currently ends up with None for every row in such cases.
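A minimal sketch of the conversion, assuming the cache key is a string built from a prefix and the pk. The `cache_key` helper and the `author` prefix are hypothetical and are not taken from this project's code.

```python
import pandas as pd


def cache_key(prefix, pk):
    """Hypothetical helper: build a cache key from an integer pk.

    Returns None when the pk is missing (None or NaN).
    """
    if pd.isna(pk):
        return None
    # int() undoes the float coercion, so 10.0 becomes 10 again.
    return f"{prefix}:{int(pk)}"


related_pks = pd.Series([10, None, 12])  # float64 because of the None
keys = [cache_key("author", pk) for pk in related_pks]
print(keys)  # ['author:10', None, 'author:12']
```

Converting the whole column with pandas' nullable integer dtype (`astype("Int64")`) is another option when missing values must stay in the column.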
[Using pandas 2.2.2]