Dealing with missing values #52

Open
tlienart opened this issue Jan 10, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@tlienart
Collaborator

tlienart commented Jan 10, 2019

Probably for a future point:

julia> X = AbstractArray{Union{Float64, Missing}, 2}(randn(5, 7))
julia> X[1, 2] = missing
julia> X[3, 5] = missing
julia> cov(X)
7×7 Array{Union{Missing, Float64},2}:
  0.323781   missing  -0.235777   0.0266937  missing   0.460899   0.345166
   missing   missing    missing    missing   missing    missing    missing
 -0.235777   missing   1.44032   -1.2644     missing   0.39682   -0.442537
  0.0266937  missing  -1.2644     1.69334    missing  -0.367602  -0.374397
   missing   missing    missing    missing   missing    missing    missing
  0.460899   missing   0.39682   -0.367602   missing   1.74075    0.614322
  0.345166   missing  -0.442537  -0.374397   missing   0.614322   2.00857 

I don't think that's ideal (this is the behaviour with both Statistics and StatsBase). See also the R covRob package, where a function to filter missing values can be provided.

It would seem pretty easy to at least implement

  • fail if there are missing values
  • omit observations with missing values (remove the corresponding rows)

And then maybe we could suggest imputation, e.g. via Impute.jl.
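
A minimal sketch of those two policies (the function name, the policy keyword, and the use of Missings.jl's disallowmissing are placeholders, not part of this package's API):

using Statistics
using Missings  # for disallowmissing

# policy = :error -> throw if any value is missing
# policy = :omit  -> drop every observation (row) that contains a missing value
function cov_handling_missing(X::AbstractMatrix; policy::Symbol = :error)
    if policy == :error
        any(ismissing, X) && throw(ArgumentError("X contains missing values"))
        return cov(disallowmissing(X))
    elseif policy == :omit
        keep = [!any(ismissing, @view X[i, :]) for i in axes(X, 1)]
        return cov(disallowmissing(X[keep, :]))
    else
        throw(ArgumentError("unsupported policy: $policy"))
    end
end

With the example matrix above, cov_handling_missing(X; policy = :omit) would drop rows 1 and 3 and compute the 7×7 covariance from the three remaining observations.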

refs

tlienart added the enhancement (New feature or request) label on Jan 10, 2019
@mateuszbaran
Owner

There are also algorithms designed specifically to deal with missing data, for example: https://arxiv.org/pdf/1201.2577.pdf .

@tlienart
Collaborator Author

OK, so that's a Lasso-type problem on a slightly modified observed covariance (eq. (1.5) in the paper). I guess that can be added once we've added a (Graphical) Lasso estimator for the covariance.

@rumela

rumela commented Sep 5, 2020

Consider exporting a shrinkage method that relies on the covariance matrix S but not on the underlying matrix of samples X (I note that analytical_nonlinear_shrinkage appears to use only S, and not X). The motivation is that stock data typically contain missing samples, so a complete matrix X cannot be constructed. Instead, pairwise covariances can be calculated to form the elements of a matrix T (though T is not guaranteed to be positive semidefinite, since its elements are computed on inconsistent data sets).

Then, consider adding the method described here: https://nhigham.com/2013/02/13/the-nearest-correlation-matrix/ (sample code in MATLAB/R/Python is already provided there). With that, T can be "converted" to a positive semidefinite matrix S, which can then be fed into analytical_nonlinear_shrinkage.
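
A rough sketch of that pipeline (pairwise_cov and psd_clip are placeholder names, and psd_clip below is simple eigenvalue clipping rather than Higham's full nearest-correlation-matrix algorithm from the linked post):

using Statistics, LinearAlgebra

# Pairwise ("available case") covariance: entry (i, j) uses only the rows in
# which both column i and column j are observed, so the result need not be
# positive semidefinite.
function pairwise_cov(X::AbstractMatrix)
    p = size(X, 2)
    T = Matrix{Float64}(undef, p, p)
    for i in 1:p, j in i:p
        ok = [!ismissing(X[r, i]) && !ismissing(X[r, j]) for r in axes(X, 1)]
        T[i, j] = T[j, i] = cov(Float64.(X[ok, i]), Float64.(X[ok, j]))
    end
    return T
end

# Crude PSD repair: symmetrise and clip negative eigenvalues. Higham's
# alternating-projections method additionally restores the unit diagonal
# when the input is a correlation matrix.
function psd_clip(T::AbstractMatrix; tol = 1e-12)
    S = Symmetric((T + T') / 2)
    vals, vecs = eigen(S)
    return Symmetric(vecs * Diagonal(max.(vals, tol)) * vecs')
end

Note that if some pair of columns shares fewer than two observed rows, the corresponding cov call returns NaN, so a real implementation would need a fallback for that case.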

@mateuszbaran
Owner

This looks like a good approach; I could review and merge a pull request that adds it. I don't personally need this functionality at the moment, so I'm not going to work on it myself.
