BUG: pd.concat dataframes with different datetime64 resolutions #53641

Merged
mroeschke merged 18 commits into pandas-dev:main on Oct 30, 2023

Charlie-XIAO
Contributor

@mroeschke mroeschke requested a review from jbrockmendel June 13, 2023 17:07
@mroeschke mroeschke added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Non-Nano (datetime64/timedelta64 with non-nanosecond resolution) labels Jun 13, 2023
@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jul 15, 2023
@seanslma

seanslma commented Oct 10, 2023

I can confirm this issue still exists in pandas 2.1.1.

This bug, together with the latest changes in pyarrow 13.0.0 (apache/arrow#38171), makes it difficult to use pd.concat and pd.merge (#55212).

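import pandas as pd

t1 = '2023-09-01'
t2 = '2023-09-01 01:00:00'
t3 = '2023-09-01 01:30:00'
# '30min' is equivalent to the older '30T' alias, which is deprecated in pandas 2.2
ds1 = pd.date_range(t1, t1, freq='30min')  # one timestamp
ds2 = pd.date_range(t2, t3, freq='30min')  # two timestamps
# df1 stores microsecond-resolution datetimes
df1 = pd.DataFrame({
    'ds': ds1.astype('datetime64[us]'),
    'y': list(range(len(ds1))),
})
# df2 stores nanosecond-resolution datetimes
df2 = pd.DataFrame({
    'ds': ds2.astype('datetime64[ns]'),
    'y': list(range(len(ds2))),
})
# Raises ValueError on pandas 2.1.1 because the two 'ds' columns differ in resolution
pd.concat([df1, df2], axis=0)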

And the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/conda-envs/api/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 393, in concat
    return op.get_result()
  File "/home/user/conda-envs/api/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 680, in get_result
    new_data = concatenate_managers(
  File "/home/user/conda-envs/api/lib/python3.9/site-packages/pandas/core/internals/concat.py", line 189, in concatenate_managers
    values = _concatenate_join_units(join_units, copy=copy)
  File "/home/user/conda-envs/api/lib/python3.9/site-packages/pandas/core/internals/concat.py", line 486, in _concatenate_join_units
    concat_values = concat_compat(to_concat, axis=1)
  File "/home/user/conda-envs/api/lib/python3.9/site-packages/pandas/core/dtypes/concat.py", line 132, in concat_compat
    return cls._concat_same_type(to_concat_eas)
  File "/home/user/conda-envs/api/lib/python3.9/site-packages/pandas/core/arrays/datetimelike.py", line 2251, in _concat_same_type
    new_obj = super()._concat_same_type(to_concat, axis)
  File "/home/user/conda-envs/api/lib/python3.9/site-packages/pandas/core/arrays/_mixins.py", line 230, in _concat_same_type
    return super()._concat_same_type(to_concat, axis=axis)
  File "arrays.pyx", line 190, in pandas._libs.arrays.NDArrayBacked._concat_same_type
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1 and the array at index 1 has size 2
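
Until a release containing the fix is installed, one possible workaround is to cast the datetime columns to a common unit before concatenating. The sketch below assumes pandas 2.0 or later, where `Series.dt.as_unit` is available; `align_datetime_units` is a hypothetical helper name for illustration, not a pandas API.

```python
import pandas as pd

# Same shape of data as the report above: one frame in microseconds, one in nanoseconds.
ds1 = pd.date_range("2023-09-01", periods=1, freq="30min")
ds2 = pd.date_range("2023-09-01 01:00:00", periods=2, freq="30min")
df1 = pd.DataFrame({"ds": ds1.astype("datetime64[us]"), "y": list(range(len(ds1)))})
df2 = pd.DataFrame({"ds": ds2.astype("datetime64[ns]"), "y": list(range(len(ds2)))})


def align_datetime_units(frames, column, unit="ns"):
    """Hypothetical helper: return copies of `frames` with `column` cast to `unit`."""
    return [frame.assign(**{column: frame[column].dt.as_unit(unit)}) for frame in frames]


# With both columns at the same resolution, concat no longer mixes units.
result = pd.concat(align_datetime_units([df1, df2], "ds"), axis=0, ignore_index=True)
print(result.dtypes)
```

Casting to the finest unit in use ("ns" here) avoids losing precision; casting down to a coarser unit would truncate sub-unit values.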

@MarcoGorelli MarcoGorelli self-requested a review October 10, 2023 06:55
@@ -858,3 +858,17 @@ def test_concat_multiindex_with_category():
    )
    expected = expected.set_index(["c1", "c2"])
    tm.assert_frame_equal(result, expected)


def test_concat_datetime64_diff_resolution():
Member

It might be simpler or more thorough to take the existing tests in this file and parametrize them over units?

@jbrockmendel
Member

Will this also close #55067, #55374?

@Charlie-XIAO
Contributor Author

Charlie-XIAO commented Oct 25, 2023

> Will this also close #55067, #55374?

@jbrockmendel Yes, I can confirm that #55067 and #55374 raise no ValueError with this PR.

Also, I tried what you suggested, i.e., parametrizing the existing tests, but found it too hard to do. For now I have written a slightly more comprehensive test than the previous version and parametrized it over different datetime resolutions. Can you check whether that suffices?

@jbrockmendel
Member

> Also, I tried what you suggested, i.e., parametrizing the existing tests, but found it too hard to do

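I suggest defining these fixtures in tests.reshape.concat.test_datetimes:

import pytest


# Each test that requests `unit` runs once per datetime64 resolution
@pytest.fixture(params=["s", "ms", "us", "ns"])
def unit(request):
    return request.param


# Second fixture with the same params, so a test can take two independent units
unit2 = unit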

Then pass these fixtures to test_concat_tz_series3 and cast first to f"M8[{unit}]" and second to f"M8[{unit2}]". This test would then hit the bug that this PR fixes.

Updating other tests in this file to use the same pattern would be optional.

@jbrockmendel
Member

A few comments, otherwise looking good

@Charlie-XIAO
Contributor Author

Done, thanks for the review @jbrockmendel!

@mroeschke mroeschke added this to the 2.2 milestone Oct 30, 2023
@mroeschke mroeschke merged commit f7bda17 into pandas-dev:main Oct 30, 2023
33 checks passed
@mroeschke
Member

Thanks @Charlie-XIAO

@lithomas1
Member

@mroeschke
Looks like this is missing a whatsnew.

Also, would you be open to backporting this to 2.1.x?
It seems quite a few people are being hit by this.

@mroeschke
Member

Ah, yeah +1 to backport this to 2.1.x

@lithomas1
Member

@meeseeksdev backport 2.1.x.


lumberbot-app bot commented Dec 3, 2023

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
git checkout 2.1.x
git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
git cherry-pick -x -m1 f7bda1734cde43adb2e867304668e56e14c17a9f
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
git commit -am 'Backport PR #53641: BUG: `pd.concat` dataframes with different datetime64 resolutions'
  4. Push to a named branch:
git push YOURFORK 2.1.x:auto-backport-of-pr-53641-on-2.1.x
  5. Create a PR against branch 2.1.x. I would have named this PR:

"Backport PR #53641 on branch 2.1.x (BUG: pd.concat dataframes with different datetime64 resolutions)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

lithomas1 pushed a commit to lithomas1/pandas that referenced this pull request Dec 4, 2023
@zacqed

zacqed commented Dec 11, 2023

@lithomas1 - This issue still exists on 2.1.4. Was the backport included in that release?

@lithomas1 lithomas1 modified the milestones: 2.1.4, 2.2 Dec 11, 2023
@lithomas1
Member

Unfortunately not. Other changes seem to be necessary in addition to this one to patch the bug for the 2.1 series.
(The tests added here fail when backported to 2.1.)

If you're curious, my attempt at the backport is in #56311.
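
Since the fix did not reach the 2.1 series, a quick way to tell whether a given installation is affected is to attempt the failing concat and see whether it raises. This is only a sketch: `concat_mixed_units_fails` is a hypothetical helper name, and it probes just the us/ns case reported in this thread.

```python
import pandas as pd


def concat_mixed_units_fails() -> bool:
    """Hypothetical probe: True if concatenating us- and ns-resolution frames raises."""
    left = pd.DataFrame(
        {"ds": pd.date_range("2023-09-01", periods=1, freq="30min").astype("datetime64[us]")}
    )
    right = pd.DataFrame(
        {"ds": pd.date_range("2023-09-01 01:00", periods=2, freq="30min").astype("datetime64[ns]")}
    )
    try:
        pd.concat([left, right], axis=0)
    except ValueError:
        # The error reported in this thread is a ValueError about mismatched dimensions.
        return True
    return False


print(pd.__version__, "affected:", concat_mixed_units_fails())
```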

Labels
Non-Nano (datetime64/timedelta64 with non-nanosecond resolution), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Stale
Successfully merging this pull request may close these issues.

BUG: pd.concat dataframes with different datetime64 types
7 participants