Dealing with missing values #52

Open
tlienart opened this issue Jan 10, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@tlienart
Collaborator

tlienart commented Jan 10, 2019

Probably for a future point:

julia> X = AbstractArray{Union{Float64, Missing}, 2}(randn(5, 7))
julia> X[1, 2] = missing
julia> X[3, 5] = missing
julia> cov(X)
7×7 Array{Union{Missing, Float64},2}:
  0.323781   missing  -0.235777   0.0266937  missing   0.460899   0.345166
   missing   missing    missing    missing   missing    missing    missing
 -0.235777   missing   1.44032   -1.2644     missing   0.39682   -0.442537
  0.0266937  missing  -1.2644     1.69334    missing  -0.367602  -0.374397
   missing   missing    missing    missing   missing    missing    missing
  0.460899   missing   0.39682   -0.367602   missing   1.74075    0.614322
  0.345166   missing  -0.442537  -0.374397   missing   0.614322   2.00857 

I don't think that's ideal (this is the behaviour with both Statistics and StatsBase). See also the R covRob package, where a function to filter missing values can be provided.

It would seem pretty easy to at least implement

  • fail if there are missing values
  • omit observations with missing values (remove the corresponding rows)

And then maybe we could suggest imputation, e.g. via Impute.jl.
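
A minimal sketch of those two policies (the function name, the policy keyword, and the use of Missings.jl's disallowmissing are placeholders, not part of this package's API):

using Statistics
using Missings  # for disallowmissing

# policy = :error -> throw if any value is missing
# policy = :omit  -> drop every observation (row) that contains a missing value
function cov_handling_missing(X::AbstractMatrix; policy::Symbol = :error)
    if policy == :error
        any(ismissing, X) && throw(ArgumentError("X contains missing values"))
        return cov(disallowmissing(X))
    elseif policy == :omit
        keep = [!any(ismissing, @view X[i, :]) for i in axes(X, 1)]
        return cov(disallowmissing(X[keep, :]))
    else
        throw(ArgumentError("unsupported policy: $policy"))
    end
end

With the example matrix above, cov_handling_missing(X; policy = :omit) would drop rows 1 and 3 and compute the 7×7 covariance from the three remaining observations.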

refs

tlienart added the enhancement (New feature or request) label on Jan 10, 2019
@mateuszbaran
Owner

There are also algorithms designed specifically to deal with missing data, for example: https://arxiv.org/pdf/1201.2577.pdf .

@tlienart
Collaborator Author

OK, so that's a Lasso-type problem on a slightly modified observed covariance (eq. (1.5) in the paper). I guess that can be added once we've added a (Graphical) Lasso estimator for the covariance.

@rumela

rumela commented Sep 5, 2020

Consider exporting a shrinkage method that relies on the covariance matrix S but not on the underlying matrix of samples X (I note that analytical_nonlinear_shrinkage appears to use only S, and not X). The motivation is that stock data typically contain missing samples, so a complete matrix X cannot be constructed. Instead, pairwise covariances can be calculated to form the elements of a matrix T (though T is not guaranteed to be positive semidefinite, since its elements are computed on inconsistent data sets).

Then, consider adding the method described here: https://nhigham.com/2013/02/13/the-nearest-correlation-matrix/ (sample code in MATLAB/R/Python is already provided there). With that, T can be "converted" to a positive semidefinite matrix S, which can then be fed into analytical_nonlinear_shrinkage.
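
A rough sketch of that pipeline (pairwise_cov and psd_clip are placeholder names, and psd_clip below is simple eigenvalue clipping rather than Higham's full nearest-correlation-matrix algorithm from the linked post):

using Statistics, LinearAlgebra

# Pairwise ("available case") covariance: entry (i, j) uses only the rows in
# which both column i and column j are observed, so the result need not be
# positive semidefinite.
function pairwise_cov(X::AbstractMatrix)
    p = size(X, 2)
    T = Matrix{Float64}(undef, p, p)
    for i in 1:p, j in i:p
        ok = [!ismissing(X[r, i]) && !ismissing(X[r, j]) for r in axes(X, 1)]
        T[i, j] = T[j, i] = cov(Float64.(X[ok, i]), Float64.(X[ok, j]))
    end
    return T
end

# Crude PSD repair: symmetrise and clip negative eigenvalues. Higham's
# alternating-projections method additionally restores the unit diagonal
# when the input is a correlation matrix.
function psd_clip(T::AbstractMatrix; tol = 1e-12)
    S = Symmetric((T + T') / 2)
    vals, vecs = eigen(S)
    return Symmetric(vecs * Diagonal(max.(vals, tol)) * vecs')
end

Note that if some pair of columns shares fewer than two observed rows, the corresponding cov call returns NaN, so a real implementation would need a fallback for that case.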

@mateuszbaran
Owner

This looks like a good approach; I could review and merge a pull request that adds it. I don't personally need this functionality at the moment, so I'm not going to work on it myself.
