Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linear model with Float32 #260

Open
Ankur-deDev opened this issue Oct 18, 2018 · 11 comments
Open

linear model with Float32 #260

Ankur-deDev opened this issue Oct 18, 2018 · 11 comments

Comments

@Ankur-deDev
Copy link

Hi GLM,

I am trying to use the lm function with the following command:
mydata = DataFrame(X=MyDataType.(1:3), Y=MyDataType.(11:13)); lm(@formula(Y ~ 1 + X), mydata)

This works fine with MyDataType = Float64 or Int32 and Int64 but with Float32 I get the following error:

ERROR: MethodError: no method matching delbeta!(::DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}, ::Array{Float32,1})
Closest candidates are:
delbeta!(::DensePredQR{T<:Union{Float32, Float64}}, ::Array{T<:Union{Float32, Float64},1}) where T<:Union{Float32, Float64} at /home/tooler/.julia/packages/GLM/FxTmX/src/linpred.jl:76
delbeta!(::DensePredChol{T<:Union{Float32, Float64},#s14} where #s14<:LinearAlgebra.Cholesky, ::Array{T<:Union{Float32, Float64},1}) where T<:Union{Float32, Float64} at /home/tooler/.julia/packages/GLM/FxTmX/src/linpred.jl:130
delbeta!(::DensePredChol{T<:Union{Float32, Float64},#s14} where #s14<:LinearAlgebra.CholeskyPivoted, ::Array{T<:Union{Float32, Float64},1}) where T<:Union{Float32, Float64} at /home/tooler/.julia/packages/GLM/FxTmX/src/linpred.jl:135

My environment is the following:

julia version 1.0.1
[38e38edf] GLM v1.0.1

@nalimilan
Copy link
Member

The code probably needs some tweaks to be more generic. PR welcome.

@andreasnoack
Copy link
Member

@kleinschmidt Is it on purpose that the model matrix promotes to Float64 even though the data columns are Float32?

It's worth noticing that lm(Matrix{Float32},Vector{Float32}) actually works so it's not Float32 that is the problem per se. The problem is the mix of the design matrix being Float64 and the response being Float32 that gives the error.

@kleinschmidt
Copy link
Member

I don't know actually...I suspect it was just a sensible default. I think it could be overridden by specifying ModelMatrix{Matrix{Float32}} in fit. Or by doing some promoteing.

I'm not super inclined to do the deep dive in the current StatsModels code to fix this though given that (inshallah) that's going to be replaced by Terms 2.0 (JuliaStats/StatsModels.jl#71).

@kleinschmidt
Copy link
Member

It's also hard to handle this in a sufficiently general way since you might have a categorical variable that gets expanded according to a contrasts matrix; that defaults to Float64, so you'd somehow have to figure out to un-promote that to Float32 in the case when the only continuous variables are Float32s.

@kleinschmidt
Copy link
Member

Also to clarify what @andreasnoack said: this is an issue with StatsModels (which handles the formula), not with GLM.

@greimel
Copy link

greimel commented May 2, 2019

I just encountered the same issue. This problem occurs when reading data from Stata's .dta files using StatFiles.jl. Stata seems to work with lower precision.

@xgdgsc
Copy link

xgdgsc commented May 25, 2023

Glad the other lm function with no formula works with Float32. I would not put the function with formula on doc if it doesn' t work with such a basic data type. Is it a good time to fix this?

@kleinschmidt
Copy link
Member

I guess the question is, what should GLM do when it gets a y and X of heterogenous eltypes? the current situation with (IIRC) a method error is not great, and all the fixes I can think of for StatsModels will not completely solve this problem (it would still be possible to get heterogenous eltypes out, and JuliaStats/StatsModels.jl#294 by making promotion less aggressive by default may in fact make the problem worse here if, for instance, users are doing something like y ~ 1 + x where x is Int64 or something...)

I'd suggest something like a promotion (or at least an eltype check) in

function fit(::Type{LinearModel}, X::AbstractMatrix{<:Real}, y::AbstractVector{<:Real},
to ensure that teh eltype is consistent, if that's really required for GLM to work.

@kleinschmidt
Copy link
Member

Ahahaha yes here's exactly what I was talking about...statsmodels doctests picked up this (lack of Bool-to-Float64 conversion): https://github.com/JuliaStats/StatsModels.jl/actions/runs/5095234337/jobs/9159967200?pr=294#step:5:305

@andreasnoack
Copy link
Member

Looks like this might be even worse at this point

julia> df = DataFrame(x = randn(Float32, 10), y = randn(Float32, 10))
10×2 DataFrame
 Row │ x           y
     │ Float32     Float32
─────┼───────────────────────
   1-0.945676   -0.132859
   2-0.0802135   0.180285
   3-1.86329    -0.291655
   4-0.705064   -0.633368
   51.63913    -1.11767
   6-0.494249    1.19169
   7-1.91468    -0.221668
   8-1.32549    -0.136988
   90.302089   -0.940081
  100.351315   -0.532551

julia> lm(@formula(y ~ x), df)
ERROR: MethodError: no method matching delbeta!(::GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}, ::Vector{Float32})

but the underlying issue might be JuliaStats/StatsModels.jl#293

@andreasnoack
Copy link
Member

There might be a separate issue related to offsets, see #562

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants