Column type error when combining max(na.rm = TRUE) with any(is.na()) in grouped operation #6608
Comments
Ty for the report. This is part, due to the bug fix 35 of
So when you leave out the |
there are several types of consistency to consider here
The MRE below has good warnings and verbose output, so I don't think we need to change anything (this behavior is expected and well documented).
> data.table(x=c(NA,1L))[,.(m=max(x)),by=x]
x m
<int> <int>
1: NA NA
2: 1 1
> data.table(x=c(NA,1L))[,.(m=max(x,na.rm=TRUE)),by=x]
Error in `[.data.table`(data.table(x = c(NA, 1L)), , .(m = max(x, na.rm = TRUE)), :
Column 1 of result for group 2 is type 'integer' but expecting type 'double'. Column types must be consistent for each group.
In addition: Warning message:
In max(x, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
> data.table(x=c(NA,1L))[,.(m=max(x,na.rm=TRUE)),by=x,verbose=TRUE]
Detected that j uses these columns: <none>
Finding groups using forderv ... forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=0, all1(ascArg)=1
forder.c received 2 rows and 1 columns
forderReuseSorting: opt=0, took 0.000s
0.000s elapsed (0.000s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
lapply optimization is on, j unchanged as 'list(max(x, na.rm = TRUE))'
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... Error in `[.data.table`(data.table(x = c(NA, 1L)), , .(m = max(x, na.rm = TRUE)), :
Column 1 of result for group 2 is type 'integer' but expecting type 'double'. Column types must be consistent for each group.
In addition: Warning message:
In max(x, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
> data.table(x=c(NA,1L))[,.(m=as.integer(max(x,na.rm=TRUE))),by=x]
x m
<int> <int>
1: NA NA
2: 1 1
Warning messages:
1: In max(x, na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
2: In `[.data.table`(data.table(x = c(NA, 1L)), , .(m = as.integer(max(x, :
NAs introduced by coercion to integer range
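The type mismatch in the transcript above can be reproduced in base R without data.table. A minimal sketch: for an all-NA integer group, max(..., na.rm = TRUE) has nothing left to compare and returns -Inf, which is a double, while a group with at least one non-missing integer returns an integer. The variable names are only for illustration.

```r
# Group with only NA: na.rm = TRUE leaves an empty vector, so max() warns
# and returns -Inf. The warning is suppressed here to keep the output short.
all_na <- suppressWarnings(max(NA_integer_, na.rm = TRUE))

# Group with one non-missing integer value.
has_value <- max(1L, na.rm = TRUE)

typeof(all_na)     # "double"
typeof(has_value)  # "integer"

# Coercing -Inf to integer gives NA (with a warning), which is why the
# as.integer() variant in the transcript returns NA for the first group.
coerced <- suppressWarnings(as.integer(all_na))
is.na(coerced)     # TRUE
```

Because the first group fixes the result column as double, the integer result of the second group no longer matches, which is what the error message reports.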
Description
I've encountered an error when combining max(..., na.rm = TRUE) with any(is.na(...)) in the same grouped operation. The operation fails with a type error, but works fine when na.rm = FALSE.
Expected behavior
The second operation below should work the same as the first one, just handling NAs differently via na.rm = TRUE.
Observed behavior
The operation fails with a type error suggesting column type inconsistency across groups, even though all input columns are integers.
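The inconsistency comes from the return type of max on an all-NA group rather than from the input columns. One possible workaround, sketched here under assumptions, is a small type-stable helper. The name max_int_or_na and the sample columns g and v are hypothetical and are not part of data.table or of the original report; the helper assumes integer input.

```r
# Hypothetical helper (not part of data.table): returns NA_integer_ for an
# all-NA group instead of the double -Inf that max(..., na.rm = TRUE) gives.
# Assumes x is an integer vector.
max_int_or_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0L) NA_integer_ else max(x)
}

# Usage with made-up data, run only if data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt <- data.table(g = c("a", "b", "b"), v = c(NA, 1L, 2L))
  # Every group now yields an integer for m, so the column types agree.
  res <- dt[, .(m = max_int_or_na(v), has_na = any(is.na(v))), by = g]
  print(res)
}
```

Wrapping the call in as.integer(), as in the maintainer's transcript, also works but emits two warnings per all-NA group; the helper avoids both.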
Minimal reproducible example
Create sample data
Works (when na.rm = FALSE)
Fails (when na.rm = TRUE is set)
Works (when the any(...) call is commented out)
Output of sessionInfo()