Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: merge should upcast to highest resolution #56310

Closed
wants to merge 5 commits into from

Conversation

lithomas1
Copy link
Member

@lithomas1 lithomas1 commented Dec 4, 2023

@lithomas1
Copy link
Member Author

@jbrockmendel

This only gets some of the cases.

I think this misses when we take everything from the left join col.
Any idea where to handle the upcasting then?

@mroeschke mroeschke added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Non-Nano datetime64/timedelta64 with non-nanosecond resolution labels Dec 4, 2023
# TODO(non-nano) Workaround for common_type not dealing
# with different resolutions
result_dtype = key_col.dtype
key_col = Index(lvals).astype(result_dtype).where(~mask_left, rvals)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass copy=False?

@jbrockmendel
Copy link
Member

I think this misses when we take everything from the left join col. Any idea where to handle the upcasting then?

Do you have an example? That'd save me some time in figuring out what code paths that case goes through

@lithomas1
Copy link
Member Author

Sorry for the slow reply.

I think my changes here only fixed the result for outer/right merges. The tests that I've added that are failing are doing so for inner/left merges.

Here's a MRE.

t1 = '2023-09-01'
t2 = '2023-09-01 01:00:00'
t3 = '2023-09-01 01:30:00'
ds1 = pd.date_range(t1, t2, freq='30T')
ds2 = pd.date_range(t1, t3, freq='30T')
df1 = pd.DataFrame({
    'ds': ds1.astype('datetime64[us]'),
    'y1': range(len(ds1)),
})
df2 = pd.DataFrame({
    'ds': ds2.astype('datetime64[ns]'),
    'y2': range(len(ds2)),
})
# Changing this to outer/right makes it work
df3 = df1.merge(df2, on=['ds'], how='left') #will only convert df2 `ds` type to df1 `ds` type
df3.dtypes

@@ -761,6 +761,7 @@ Datetimelike
- Bug in parsing datetime strings with nanosecond resolution with non-ISO8601 formats incorrectly truncating sub-microsecond components (:issue:`56051`)
- Bug in parsing datetime strings with sub-second resolution and trailing zeros incorrectly inferring second or millisecond resolution (:issue:`55737`)
- Bug in the results of :func:`to_datetime` with an floating-dtype argument with ``unit`` not matching the pointwise results of :class:`Timestamp` (:issue:`56037`)
- Fixed bug in :meth:`DataFrame.merge` not being able to join on ``datetime64`` columns of differing resolutions (:issue:`55212`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in main they are able to join just gives the wrong result reso?

@jbrockmendel
Copy link
Member

The code change looks good to me. The test change looks plausible but i haven't examined closely.

Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Feb 22, 2024
@mroeschke
Copy link
Member

Looks like this PR has gone stale and needs a rebase. Closing to clear the queue but feel free to reopen when you have time to circle back

@mroeschke mroeschke closed this Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Non-Nano datetime64/timedelta64 with non-nanosecond resolution Reshaping Concat, Merge/Join, Stack/Unstack, Explode Stale
Projects
None yet
3 participants