
Possible changes and extensions #90

Open
timholy opened this issue Nov 20, 2023 · 3 comments

timholy (Contributor) commented Nov 20, 2023

I'd like to gauge receptivity to, and get your thoughts on, a couple of possible extensions and changes:

  • Would you be open to switching the return type for some methods to something that encodes both the matrix and its factorization? (For example, PDMats does this.) The factorization is already computed for some shrinkage methods and it seems a shame to throw all that work away, especially if the only thing you're computing the covariance matrix for is Mahalanobis distance calculations for which the factorized matrix is much more efficient.
  • For very high dimensional applications, I'm interested in "spiked" covariance models in which some eigenvalues are large and the remaining eigenvalues are assumed to be equal to one another. This allows one to efficiently represent the matrix in Woodbury form as an isotropic (or more generally, diagonal) matrix + a low-rank correction. There are also nonlinear shrinkage methods for this case that I'd be happy to contribute. But it would require a new WoodburyCovariance(rank::Int) <: CovarianceEstimator which I'd be happy to add if you're willing to take both WoodburyMatrices.jl and probably TSVD.jl on as dependencies. Alternatively, we could make them package extensions, but we'd need to export the WoodburyCovariance name from the main package. That's a bit ugly if, for example, you ever want to use Aqua (you'll fail the "no undefined exports" test, though you can disable it).
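
To make the second idea concrete, here is a rough sketch of what such an estimator could look like. The WoodburyCovariance name, the StatsBase.CovarianceEstimator supertype, and the trailing-eigenvalue estimate below are illustrative placeholders, not a final design:

```julia
using LinearAlgebra, Statistics, StatsBase
using WoodburyMatrices  # SymWoodbury(A, B, D) represents A + B*D*B'
using TSVD              # tsvd(X, k) computes the k leading singular triplets

# Hypothetical estimator type; the field name and constructor are placeholders.
struct WoodburyCovariance <: StatsBase.CovarianceEstimator
    rank::Int
end

function Statistics.cov(est::WoodburyCovariance, X::AbstractMatrix{<:Real})
    # Observations in rows, variables in columns (the StatsBase convention).
    n, p = size(X)
    Xc = X .- mean(X; dims=1)
    _, s, V = tsvd(Xc, est.rank)          # leading singular values/vectors of the centered data
    λ = s .^ 2 ./ (n - 1)                 # top eigenvalues of the sample covariance
    # Model the remaining eigenvalues as a common level σ², estimated here from the
    # residual trace of the sample covariance (a naive estimate, just for illustration).
    σ² = (sum(abs2, Xc) / (n - 1) - sum(λ)) / (p - est.rank)
    # Isotropic part plus a low-rank correction, stored in Woodbury form so that
    # products and solves stay cheap in high dimensions.
    return SymWoodbury(Diagonal(fill(σ², p)), V, Diagonal(λ .- σ²))
end
```

With this shape, Σ = cov(WoodburyCovariance(10), X) would return an object for which Σ * v and Σ \ v can exploit the Woodbury structure instead of forming a dense p×p matrix.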
mateuszbaran (Owner) commented
Sure, I'm definitely open to doing some changes. Currently I don't have enough time to work on new features myself but I can do review and maintenance.

  • Would you be open to switching the return type for some methods to something that encodes both the matrix and its factorization? (For example, PDMats does this.) The factorization is already computed for some shrinkage methods and it seems a shame to throw all that work away, especially if the only thing you're computing the covariance matrix for is Mahalanobis distance calculations for which the factorized matrix is much more efficient.

Sure, maybe the best solution would be adding a parameter that tells whether to return the covariance matrix or its factorization.

  • For very high dimensional applications, I'm interested in "spiked" covariance models in which some eigenvalues are large and the remaining eigenvalues are assumed to be equal to one another. This allows one to efficiently represent the matrix in Woodbury form as an isotropic (or more generally, diagonal) matrix + a low-rank correction. There are also nonlinear shrinkage methods for this case that I'd be happy to contribute. But it would require a new WoodburyCovariance(rank::Int) <: CovarianceEstimator which I'd be happy to add if you're willing to take both WoodburyMatrices.jl and probably TSVD.jl on as dependencies. Alternatively, we could make them package extensions, but we'd need to export the WoodburyCovariance name from the main package. That's a bit ugly if, for example, you ever want to use Aqua (you'll fail the "no undefined exports" test, though you can disable it).

This looks like an interesting extension and it would be great to have it implemented here 👍.
Both WoodburyMatrices.jl and TSVD.jl look fairly lightweight, so if they don't increase loading time too much, I would be fine with adding them as dependencies.

mateuszbaran added the enhancement (New feature or request) label on Nov 20, 2023
timholy (Contributor, Author) commented Nov 20, 2023

adding a parameter that tells whether to return the covariance matrix or its factorization

With PDMats you return both: https://github.com/JuliaStats/PDMats.jl/blob/e0cad7caf7d955d5a14fe830f3684026a9652ad7/src/pdmat.jl#L4-L9. Note that AbstractPDMat{T} <: AbstractMatrix{T}, so the returned object acts like "the matrix" for purposes like C[i, j] and C * v, but for C \ v it uses the factorization instead. Essentially, PDMats is a way of caching the matrix together with its factorization, so you get good performance even if you use \ many times and in many different places.
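
To make that concrete, a minimal usage sketch (values arbitrary):

```julia
using LinearAlgebra, PDMats

C = [2.0 0.5; 0.5 1.0]
P = PDMat(C)             # stores the matrix together with its Cholesky factorization

P[1, 2]                  # indexes like the plain matrix: 0.5
P * [1.0, 2.0]           # ordinary matrix-vector product
P \ [1.0, 2.0]           # solve reuses the cached Cholesky factor, no refactorization
invquad(P, [1.0, 2.0])   # x' * inv(C) * x, i.e. the squared Mahalanobis form
```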

I would be fine with adding them as dependencies.

Great! I'll get cracking.

mateuszbaran (Owner) commented
OK, cool. Returning both is also fine.
