Ensure HDFStore read gives column-major data with CoW #55743
Merged
Fixing a failing test that turned up in #55732.
This PR specifically fixes `pandas/tests/io/pytables/test_store.py::test_hdfstore_strides`, which tests that the data coming back from HDF are column-major. The "idea" of the test was to ensure the layout is preserved, i.e. when starting with a DataFrame that is column-major, you should get back one with the same layout. But in practice we don't do that: we simply always return column-major data because of `concat` making a copy by default (see #22073 (comment)), and we only test the column-major case in the test.

We have run into this issue before: in `pandas/tests/io/pytables/test_store.py::test_hdfstore_strides`, @phofl initially just skipped the test for CoW. But then in the final version of that PR, we removed that skip in favor of adding a `copy=True` inside the `concat` call in `HDFStore.read`.

However, we later changed `concat` to ignore the `copy` keyword altogether when CoW is enabled, even when specifically passing `copy=True` (done in #51464). Because of that, this test is failing again. The reason this wasn't caught in our CI is that the specific test was in the meantime moved to "single_cpu", which we don't run on the CoW CI builds.

This PR implements the option to manually copy inside `HDFStore.read` when CoW is enabled to achieve the same end result, but two other options would be:

- Make `pd.concat(..)` honor the `copy=True` option again, even in the case of CoW
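
To make the layout point concrete, here is a minimal NumPy-only sketch of what the strides check in the test is sensitive to. It is not the pandas code touched by this PR: the array shape and variable names are made up for illustration, and it only shows that a column taken from row-major data is a strided view, while an explicit copy in column-major order makes each column contiguous.

```python
import numpy as np

# Hypothetical 4-row, 2-column table of int64 values, stored row-major (C order).
row_major = np.arange(8, dtype="int64").reshape(4, 2)

# A column taken from row-major data is a strided view: consecutive
# values sit one full row (2 * 8 bytes) apart in memory.
strides_row_major = row_major[:, 0].strides

# An explicit copy in Fortran (column-major) order stores each column
# contiguously, so consecutive values are one item (8 bytes) apart.
col_major = np.array(row_major, order="F", copy=True)
strides_col_major = col_major[:, 0].strides

print(strides_row_major, strides_col_major)
```

The values are identical in both arrays; only the memory layout differs, which is why an explicit copy is needed to get a predictable layout once `concat` no longer copies.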