
Possible changes and extensions #90

Open
timholy opened this issue Nov 20, 2023 · 3 comments

timholy (Contributor) commented Nov 20, 2023

I'd like to gauge receptivity to, and get your thoughts on, a couple of possible extensions and changes:

  • Would you be open to switching the return type for some methods to something that encodes both the matrix and its factorization? (For example, PDMats does this.) The factorization is already computed for some shrinkage methods and it seems a shame to throw all that work away, especially if the only thing you're computing the covariance matrix for is Mahalanobis distance calculations for which the factorized matrix is much more efficient.
  • For very high dimensional applications, I'm interested in "spiked" covariance models in which some eigenvalues are large and the remaining eigenvalues are assumed to be equal to one another. This allows one to efficiently represent the matrix in Woodbury form as an isotropic (or more generally, diagonal) matrix + a low-rank correction. There are also nonlinear shrinkage methods for this case that I'd be happy to contribute. But it would require a new WoodburyCovariance(rank::Int) <: CovarianceEstimator which I'd be happy to add if you're willing to take both WoodburyMatrices.jl and probably TSVD.jl on as dependencies. Alternatively, we could make them package extensions, but we'd need to export the WoodburyCovariance name from the main package. That's a bit ugly if, for example, you ever want to use Aqua (you'll fail the "no undefined exports" test, though you can disable it).
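
To make the second idea concrete, here is a rough sketch of what such an estimator could look like. The WoodburyCovariance name, the StatsBase.CovarianceEstimator supertype, and the trailing-eigenvalue estimate below are illustrative placeholders, not a final design:

```julia
using LinearAlgebra, Statistics, StatsBase
using WoodburyMatrices  # SymWoodbury(A, B, D) represents A + B*D*B'
using TSVD              # tsvd(X, k) computes the k leading singular triplets

# Hypothetical estimator type; the field name and constructor are placeholders.
struct WoodburyCovariance <: StatsBase.CovarianceEstimator
    rank::Int
end

function Statistics.cov(est::WoodburyCovariance, X::AbstractMatrix{<:Real})
    # Observations in rows, variables in columns (the StatsBase convention).
    n, p = size(X)
    Xc = X .- mean(X; dims=1)
    _, s, V = tsvd(Xc, est.rank)          # leading singular values/vectors of the centered data
    λ = s .^ 2 ./ (n - 1)                 # top eigenvalues of the sample covariance
    # Model the remaining eigenvalues as a common level σ², estimated here from the
    # residual trace of the sample covariance (a naive estimate, just for illustration).
    σ² = (sum(abs2, Xc) / (n - 1) - sum(λ)) / (p - est.rank)
    # Isotropic part plus a low-rank correction, stored in Woodbury form so that
    # products and solves stay cheap in high dimensions.
    return SymWoodbury(Diagonal(fill(σ², p)), V, Diagonal(λ .- σ²))
end
```

With this shape, Σ = cov(WoodburyCovariance(10), X) would return an object for which Σ * v and Σ \ v can exploit the Woodbury structure instead of forming a dense p×p matrix.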
mateuszbaran (Owner) commented
Sure, I'm definitely open to doing some changes. Currently I don't have enough time to work on new features myself but I can do review and maintenance.

  • Would you be open to switching the return type for some methods to something that encodes both the matrix and its factorization? (For example, PDMats does this.) The factorization is already computed for some shrinkage methods and it seems a shame to throw all that work away, especially if the only thing you're computing the covariance matrix for is Mahalanobis distance calculations for which the factorized matrix is much more efficient.

Sure, maybe the best solution would be adding a parameter that tells whether to return the covariance matrix or its factorization.

  • For very high dimensional applications, I'm interested in "spiked" covariance models in which some eigenvalues are large and the remaining eigenvalues are assumed to be equal to one another. This allows one to efficiently represent the matrix in Woodbury form as an isotropic (or more generally, diagonal) matrix + a low-rank correction. There are also nonlinear shrinkage methods for this case that I'd be happy to contribute. But it would require a new WoodburyCovariance(rank::Int) <: CovarianceEstimator which I'd be happy to add if you're willing to take both WoodburyMatrices.jl and probably TSVD.jl on as dependencies. Alternatively, we could make them package extensions, but we'd need to export the WoodburyCovariance name from the main package. That's a bit ugly if, for example, you ever want to use Aqua (you'll fail the "no undefined exports" test, though you can disable it).

This looks like an interesting extension and it would be great to have it implemented here 👍.
Both WoodburyMatrices.jl and TSVD.jl look fairly lightweight, so if they don't increase loading time too much, I would be fine with adding them as dependencies.

mateuszbaran added the enhancement (New feature or request) label on Nov 20, 2023
timholy (Contributor, Author) commented Nov 20, 2023

adding a parameter that tells whether to return the covariance matrix or its factorization

With PDMats you return both: https://github.com/JuliaStats/PDMats.jl/blob/e0cad7caf7d955d5a14fe830f3684026a9652ad7/src/pdmat.jl#L4-L9. Note that AbstractPDMat{T} <: AbstractMatrix{T}, so the returned object acts like "the matrix" for purposes like C[i, j] and C * v, but for C \ v it uses the factorization instead. Essentially, PDMats is a way of caching the matrix together with its factorization, so you get good performance even if you use \ many times and in many different places.
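
To make that concrete, a minimal usage sketch (values arbitrary):

```julia
using LinearAlgebra, PDMats

C = [2.0 0.5; 0.5 1.0]
P = PDMat(C)             # stores the matrix together with its Cholesky factorization

P[1, 2]                  # indexes like the plain matrix: 0.5
P * [1.0, 2.0]           # ordinary matrix-vector product
P \ [1.0, 2.0]           # solve reuses the cached Cholesky factor, no refactorization
invquad(P, [1.0, 2.0])   # x' * inv(C) * x, i.e. the squared Mahalanobis form
```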

I would be fine with adding them as dependencies.

Great! I'll get cracking.

mateuszbaran (Owner) commented
OK, cool. Returning both is also fine.
