-
Notifications
You must be signed in to change notification settings - Fork 987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Joins do not warn user when using POSc and Date comparisons #6605
Comments
I think your issue is not that it does or doesn't join, it's that base R's warning is either bypassed or suppressed, is that right? Unfortunately in R, the units for The lack of a warning does make me wonder (not quite enough yet to look under the hood at the source) how (Similarly, I'd really like |
That's correct. Its understandable that joins on a POSc and Date column would behave differently but the warning to notify the user on the type difference, in this instance, seems to be bypassed (compared to when using those operators outside of a join context). |
My guess is that something internal "knows" that the underlying join keys are both numeric under the hood, and it just works regardless of other attributes of the numbers. (That's one way I might have approached it.) |
This question seems to be a better structure for the "issue", suggest it might be useful to either keep this issue and close 1200 as a dupe, or at least update 1200 with a better reprex (either from the SO question it references or from this issue). Further, do you (@ben-schwen) think it's a good idea and likely that having the |
Yes, we should do some prechecks on compatible types. Ofc, this will never catch everything but it should at least catch common pitfalls (similar as we already do in |
@ben-schwen , I haven't studied the Some questions:
Offhand, do you know exactly where this precheck would go in the code? |
@ben-schwen, here is one possible patch (I'm happy to provide as a PR). It works, but it is doing precisely one bespoke precheck: if both columns to join are in dt1_posc[dt2, on = .(id, date_time1 <= date_2, date_time2 >= date_1)]
# Warning in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult, :
# Attempting to join column x.date_time1 (POSIXt) with column i.date_2 (Date), they are unlikely to be compatible numbers. Suggest you convert one to the other's class.
# Warning in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult, :
# Attempting to join column x.date_time2 (POSIXt) with column i.date_1 (Date), they are unlikely to be compatible numbers. Suggest you convert one to the other's class.
# id date_time1 date_time2 infocol
# <num> <Date> <Date> <char>
# 1: 1 2018-06-01 2018-01-01 <NA>
# 2: 1 2020-11-01 2019-01-01 <NA>
# 3: 2 2010-01-01 2010-01-01 <NA> With this patch: --- a/R/bmerge.R
+++ b/R/bmerge.R
@@ -70,6 +70,14 @@ bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbos
}
stopf("Incompatible join types: %s (%s) and %s (%s). Factor columns must join to factor or character columns.", xname, xclass, iname, iclass)
}
+ timeclasses = c("Date", "POSIXt", "ITime", "times")
+ xclass_time = intersect(class(x[[xc]]), timeclasses)
+ iclass_time = intersect(class(i[[ic]]), timeclasses)
+ if (length(xclass_time) > 0L && length(iclass_time) > 0L &&
+ !identical(sort(xclass_time), sort(iclass_time))) {
+ warningf("Attempting to join column %s (%s) with column %s (%s), they are unlikely to be compatible numbers. Suggest you convert one to the other's class.",
+ xname, toString(xclass_time), iname, toString(iclass_time))
+ }
if (xclass == iclass) {
if (verbose) catf("%s has same type (%s) as %s. No coercion needed.\n", iname, xclass, xname)
next I'm making the assumption that the follow-on checks (using It could alternatively |
Looks good. I would only add
|
Issue
There is a possible silent warning/error that occurs when joining on POSc and Date columns. It is unclear if this is expected behaviour based on my readings.
The example below shows how the same tables (dt1 and dt2) either join or fail to (left) join based upon the format of the date/time columns in dt1 (either POSc or Date). Although the simple <= and >= comparisons on vectors pass a warning, there is no warning passed when performing a join, which led to confusion as to the cause of join difference.
Example
Session info
I used the latest version of data.table on CRAN (1.16.2) in the following session
The text was updated successfully, but these errors were encountered: