API discussion #17

tlienart · 2018-12-12T00:12:23Z

A few comments/questions

1. The structs Simple and Uncorrected are a bit useless. Would it make sense to have something closer to the original cov call, such as:

cov(X, corrected=false) # so basically re-export `cov`
cov(X, method=LedoitWolf(0.5))

2. In the literature there is a fair amount of work on estimating the covariance, and also on estimating the inverse covariance (precision). Do you think that should be included, and if so, with what kind of API? Something like prec(X, ...) or prec(::Method, X)? (By the way, I've added a few other methods to #8, "Other covariance estimators".)

3. The package should guide users who know they want a robust covariance estimator but don't know which one to use. Perhaps we should have a table of some sort indicating what each estimator is (or isn't) good for?

4. We should track the space/time complexity of the various functions and try to reduce it as much as possible: these estimators are particularly useful for very high-dimensional matrices, which is where Julia's performance advantage will really shine.
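On the precision question: a precision estimator cannot simply be inv(cov(X)) in the high-dimensional setting, because with n observations the sample covariance has rank at most n - 1 and is singular whenever p >= n. The sketch below is only an illustration under stated assumptions: prec_shrunk is a hypothetical helper, and rho is a fixed, user-chosen intensity rather than an optimally estimated one.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not part of any package): shrink the sample covariance
# towards a scaled identity with a fixed intensity rho in (0, 1], then invert.
function prec_shrunk(X::AbstractMatrix, rho::Real)
    S = cov(X; dims=1)                  # p×p sample covariance, rows are observations
    p = size(S, 1)
    F = (tr(S) / p) * I                 # scaled-identity target (mean variance times I)
    Sigma = (1 - rho) * S + rho * F     # eigenvalues >= rho * tr(S) / p > 0
    return inv(Symmetric(Sigma))
end

n, p = 10, 50                           # fewer observations than variables
X = randn(n, p)
S = cov(X; dims=1)
rank(S)                                 # at most n - 1 = 9, so inv(S) is not usable
P = prec_shrunk(X, 0.3)                 # well-defined p×p precision estimate
```

Any estimator that returns a positive definite matrix could take the place of the fixed-rho shrinkage here, which is one argument for giving prec the same method-based API as cov.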


mateuszbaran · 2018-12-12T11:34:08Z

The current interface is inspired by the one from Interpolations.jl. I agree that method would be better as a second argument, but Simple and Uncorrected aren't really useless: they make the API internally consistent, since the method fully describes the estimator. This way you can write an algorithm that uses cov and easily pass any estimation method to it.
We should also take a look at the weighted covariance from StatsBase.jl, cov(X, w::AbstractWeights, vardim=1; mean=nothing, corrected=false), and think about adding the mean and w arguments (see also #10, "Supporting passing in weights").
Adding prec definitely fits this package. It could have the same (or very similar) API as cov.
Definitely, it can be added to the documentation. Unfortunately I don't know what the state of the art is. I've only recently started working with statistics for relatively high-dimensional functional data and discovered that standard covariance estimators are a poor choice there.
Yes, that's a good idea. We should also have some benchmarks. I'll work on that soon.
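To make the "method fully describes the estimator" point concrete, here is a minimal sketch of dispatch on the estimator as the second positional argument. The struct names mirror the ones discussed here, but their semantics (Simple uses the n - 1 normalisation, Uncorrected uses n) are assumptions, and mahalanobis is a hypothetical generic algorithm, not package code.

```julia
using LinearAlgebra, Statistics

abstract type CovarianceEstimator end

# Assumed semantics: Simple = corrected (n - 1), Uncorrected = divide by n.
struct Simple <: CovarianceEstimator end
struct Uncorrected <: CovarianceEstimator end

# Observations are rows. Dispatch on the second argument picks the estimator.
Statistics.cov(X::AbstractMatrix, ::Simple) = cov(X; dims=1, corrected=true)
Statistics.cov(X::AbstractMatrix, ::Uncorrected) = cov(X; dims=1, corrected=false)

# A generic algorithm written once; any estimator can be passed in.
function mahalanobis(X::AbstractMatrix, x::AbstractVector, method::CovarianceEstimator)
    mu = vec(mean(X; dims=1))
    d = x .- mu
    return sqrt(dot(d, cov(X, method) \ d))
end

X = randn(200, 3)
x = randn(3)
mahalanobis(X, x, Simple())
mahalanobis(X, x, Uncorrected())
```

Keyword arguments do not take part in Julia's method dispatch, so the positional cov(X, method) form is what lets a new estimator plug in by adding a single method, whereas cov(X, method=...) would need a manual branch on the keyword's value.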

tlienart · 2018-12-12T23:35:32Z

Ok, on the API front I'll just follow your lead; if you think it's worth experimenting with a method= keyword, go for it. Re Simple and Uncorrected, it would then be more consistent to have a single struct Simple with a field corrected (false/true) and a unified call with method=Simple(corrected=true):

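abstract type CovarianceEstimator end  # defined in the package; repeated so the snippet runs on its own

struct Simple <: CovarianceEstimator
    corrected::Bool
end
# A single keyword constructor covers Simple() and Simple(corrected=true);
# Simple(true) and Simple(false) use the default positional constructor.
Simple(; corrected::Bool=false) = Simple(corrected)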

Ok for (2), (3) and (4). For (3), I think what we can do for now is keep track of this in the docstrings of the methods and update it over time as the methods are improved. I'll try adding a few of those.
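As an illustration of recording this information in docstrings, a minimal sketch for a hypothetical sample_cov function; the "use when" guidance and the complexity line are the kind of entries points (3) and (4) ask for, and the wording is only a suggestion.

```julia
using Statistics

"""
    sample_cov(X)

Sample covariance of an `n × p` data matrix `X` (rows are observations).

Use when: `n` is much larger than `p`; for `p` close to or above `n`,
consider a shrinkage estimator instead.

Complexity: O(n p^2) time, O(p^2) memory for the result.
"""
sample_cov(X::AbstractMatrix) = cov(X; dims=1)

C = sample_cov(randn(5, 3))
```

The complexity line follows from the cost of forming the p×p cross-product of the centred data; keeping it next to the signature makes it easy to update when an implementation changes.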

tlienart · 2018-12-14T11:10:04Z

Another note: I'm currently looking at http://strimmerlab.org/publications/journals/shrinkcov2005.pdf. The nice thing is that it lists six common targets for linear-shrinkage estimators (Table 2, p. 11). Target B corresponds to our chen target and target F to the ledoitwolf one.

They derive the optimal shrinkage intensity for all of these targets using Ledoit and Wolf's theorem, which is quite neat. (I should check numerically that their formula for the optimal intensity matches what we've computed; if it does, theirs is quite a bit simpler.)
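For reference when checking those formulas numerically, the general expression follows from minimising the expected squared error of the shrunk estimate. Using the convention S* = λT + (1 - λ)S, with u_i the entries of the unbiased estimate S, t_i the entries of the target T, θ_i the true entries, and i running over all matrix entries, setting dR/dλ = 0 gives:

```latex
\begin{aligned}
R(\lambda) &= \sum_{i} \mathbb{E}\big[(\lambda t_i + (1-\lambda)\,u_i - \theta_i)^2\big], \\
\lambda^\star &= \frac{\sum_{i} \big[\operatorname{Var}(u_i) - \operatorname{Cov}(u_i, t_i) - \operatorname{Bias}(u_i)\,\mathbb{E}(t_i - u_i)\big]}{\sum_{i} \mathbb{E}\big[(t_i - u_i)^2\big]}
\end{aligned}
```

Since S is unbiased, the Bias term vanishes. In practice the expectations are replaced by sample estimates and the resulting intensity is truncated to [0, 1]; the per-target expressions in Table 2 are specialisations of this formula.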

Anyway, this may suggest a more generic API, for instance:

- linear shrinkage of the form (1-rho)S + rho F for some target F, with rho computed via LW as in the Schäfer-Strimmer paper or via RBLW as in Chen et al.;
- nonlinear (SVD-based) shrinkage of the form U D U', where U holds the eigenvectors of S and D is an appropriately modified diagonal matrix of eigenvalues;
- other estimators.
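The two families above can be sketched with a few small functions. This is a minimal illustration, not the package implementation: target_B, target_F, linear_shrinkage and spectral_shrinkage are hypothetical names, rho is supplied by the caller (here it is the weight on the target F, the convention used by Ledoit-Wolf, Schäfer-Strimmer and Chen et al.), and the eigenvalue map f is a placeholder, not the Ledoit-Wolf nonlinear rule.

```julia
using LinearAlgebra, Statistics

# Target B (scaled identity, the chen target): mean variance times I.
target_B(S::AbstractMatrix) = Matrix(Diagonal(fill(tr(S) / size(S, 1), size(S, 1))))

# Target F (constant correlation, the ledoitwolf target): keep the variances and
# replace every correlation by the average off-diagonal correlation rbar.
function target_F(S::AbstractMatrix)
    p = size(S, 1)
    s = sqrt.(diag(S))
    R = S ./ (s * s')                          # sample correlation matrix
    rbar = (sum(R) - tr(R)) / (p * (p - 1))    # mean of the off-diagonal entries
    F = rbar .* (s * s')
    F[diagind(F)] .= diag(S)
    return F
end

# Linear shrinkage; rho is the weight on the target F.
linear_shrinkage(S, F, rho) = (1 - rho) .* S .+ rho .* F

# Spectral shrinkage U D U': keep the eigenvectors of S, map the eigenvalues with f.
function spectral_shrinkage(S::AbstractMatrix, f)
    E = eigen(Symmetric(S))
    return E.vectors * Diagonal(f.(E.values)) * E.vectors'
end

X = randn(100, 5)
S = cov(X; dims=1)
SB = linear_shrinkage(S, target_B(S), 0.2)
SF = linear_shrinkage(S, target_F(S), 0.2)
```

With this split, an estimator only has to provide a target and a rule for rho (or for the eigenvalues), which is the kind of separation that avoids one named struct per paper and target.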

Otherwise we'll end up with estimators named SchafferStrimmerA or SchafferStrimmer(target=:A), which is quite awkward, especially given that they basically just use LedoitWolf. (For reference, my draft branch for this, sticking to the current API: https://github.com/mateuszbaran/CovarianceEstimation.jl/tree/tl-chqbc)

mateuszbaran · 2018-12-14T15:08:55Z

Merging Simple and Uncorrected into one estimator is a good idea. I'm even thinking now that we could extend it with weights and a mean and use cov from StatsBase.jl.
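For the weights and mean extension, the arithmetic is small enough to sketch directly. StatsBase.jl already provides a weighted cov, so this hand-written weighted_cov (a hypothetical name, with illustrative keyword names w and mu) only shows what a merged estimator would compute in the uncorrected case; the corrected, weighted case depends on the kind of weights and is left out.

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch: uncorrected covariance of the rows of X with optional
# observation weights w and an optional known mean mu.
function weighted_cov(X::AbstractMatrix; w=nothing, mu=nothing)
    n = size(X, 1)
    wv = w === nothing ? ones(n) : collect(float.(w))
    m = mu === nothing ? vec(sum(wv .* X; dims=1)) ./ sum(wv) : mu  # weighted mean
    Xc = X .- m'                                # centre each row
    return (Xc' * (wv .* Xc)) ./ sum(wv)        # sum_i w_i (x_i - m)(x_i - m)' / sum(w)
end

X = randn(50, 3)
C1 = weighted_cov(X)                            # unit weights
C2 = cov(X; dims=1, corrected=false)            # should agree with C1
```

With integer weights this behaves like frequency weights: giving the first observation weight 2 is the same as duplicating that row.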

The six targets and optimal shrinking coefficients in that paper look really neat. Good work!

I like the idea of a more generic API. I've started working on it here: https://github.com/mateuszbaran/CovarianceEstimation.jl/commits/new-api-test . It makes things like cov(X, SchafferStrimmerA) possible, which is quite compact. I've also reduced code duplication. There is still some work to do on separating the calculation of shrinkage targets and optimal coefficients. What do you think?

tlienart · 2018-12-15T08:29:17Z

Maybe targets could just be symbols instead of empty structs? On top of that, you'd want to allow synonyms for the targets so that users can refer to them in different ways (I think we should take some inspiration from DiffEq and the way they list the bazillion ODE solvers that are out there).

Maybe something like this?

const Shrinkage = Union{Symbol, Real}

struct LinearShrinkage  <: CovarianceEstimator
    shrinkage::Shrinkage
    target::Symbol
    # constructors with checks
end

and then maybe have acceptable targets in a big Dictionary allowing for synonyms like

const targets = Dict{String, Symbol}(
    "ledoitwolf" => :constant_correlation,
    "constant-correlation" => :constant_correlation,
    # ...
    )

with relevant

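function lw_optimalshrinkage(target, args...)
    target == :constant_correlation && return lw_optimalshrinkage_constant_correlation(args...)
    # target == :... && return ...  (one line per supported target)
    throw(ArgumentError("unknown shrinkage target: $target"))
end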

and then eventually

cov(X, method = LinearShrinkage()) # defaults to the Ledoit-Wolf target with optimal shrinkage
cov(X, method = LinearShrinkage(target="constant-correlation", shrinkage=0.5))

and variants?

Either way I think this can mature a bit as we go, it doesn't prevent us from coding the methods and refactoring later :)

mateuszbaran · 2018-12-15T20:21:14Z

Actually, DiffEq uses a ton of empty structs: https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/algorithms.jl . It's primarily for polymorphic dispatch. In my proposal you can easily add a new target/shrinkage estimation method by adding a new method to targetandshrinkage. We could probably use Val{Symbol} as well, but I think it's less commonly used.
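A minimal sketch of the empty-struct pattern for shrinkage targets. The names below (ShrinkageTarget, ScaledIdentity, DiagonalOfS, target_matrix, shrink) are hypothetical and do not reproduce the targetandshrinkage signature from the new-api-test branch; the point is only that generic code is written once against the abstract type, and a new target needs a new struct plus one method.

```julia
using LinearAlgebra, Statistics

abstract type ShrinkageTarget end

# Empty structs carry no data; they exist only so methods can dispatch on them.
struct ScaledIdentity <: ShrinkageTarget end   # mean variance times I
struct DiagonalOfS <: ShrinkageTarget end      # keep only the variances

target_matrix(::ScaledIdentity, S) = Matrix(Diagonal(fill(tr(S) / size(S, 1), size(S, 1))))
target_matrix(::DiagonalOfS, S) = Matrix(Diagonal(diag(S)))

# Generic shrinkage written once; rho is the weight on the target.
function shrink(X::AbstractMatrix, t::ShrinkageTarget, rho::Real)
    S = cov(X; dims=1)
    return (1 - rho) .* S .+ rho .* target_matrix(t, S)
end

X = randn(100, 4)
shrink(X, ScaledIdentity(), 0.1)
shrink(X, DiagonalOfS(), 0.1)
```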

Here:

struct LinearShrinkage  <: CovarianceEstimator
    shrinkage::Shrinkage
    target::Symbol
end

It's not a good idea to keep shrinkage non-concretely typed. It can cause worse performance: https://docs.julialang.org/en/v1/manual/performance-tips/index.html#Avoid-fields-with-abstract-type-1 .
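One way to keep the flexibility of accepting either a number or a symbol while keeping the field concrete is a type parameter. A minimal sketch: the struct names are illustrative and :optimal is a hypothetical marker for "estimate the intensity".

```julia
abstract type CovarianceEstimator end

# Abstract field type: code reading m.shrinkage cannot infer its type.
struct LinearShrinkageAbstract <: CovarianceEstimator
    shrinkage::Union{Symbol, Real}
    target::Symbol
end

# Parametric field: every concrete instance has a concrete field type.
struct LinearShrinkage{S<:Union{Symbol, Real}} <: CovarianceEstimator
    shrinkage::S
    target::Symbol
end

m1 = LinearShrinkage(0.5, :constant_correlation)       # LinearShrinkage{Float64}
m2 = LinearShrinkage(:optimal, :constant_correlation)  # LinearShrinkage{Symbol}
```

The parametric version also allows dispatch on the parameter, e.g. one method for LinearShrinkage{Symbol} that estimates the intensity and one for LinearShrinkage{<:Real} that uses the given value.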

And I don't really see the benefit of referring to targets by Strings. We can use a Dict{Symbol, Symbol} or global constants for synonyms.

I'd really like to use polymorphic dispatch for targetandshrinkage so empty structs and Val{Symbol} are the only choices that will work (AFAIK).
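For comparison, the Val{Symbol} alternative combined with a Dict{Symbol, Symbol} of synonyms might look like the sketch below. The symbol names and TARGET_SYNONYMS are hypothetical; :chen maps to the scaled-identity target mentioned earlier.

```julia
using LinearAlgebra, Statistics

# Lift a symbol into the type domain with Val so that methods can dispatch on it.
target_matrix(::Val{:scaled_identity}, S) = Matrix(Diagonal(fill(tr(S) / size(S, 1), size(S, 1))))
target_matrix(::Val{:diagonal}, S) = Matrix(Diagonal(diag(S)))

# Synonyms resolved to one canonical symbol per target.
const TARGET_SYNONYMS = Dict{Symbol, Symbol}(
    :chen => :scaled_identity,
    :scaled_identity => :scaled_identity,
    :diagonal => :diagonal,
)

target_matrix(name::Symbol, S) = target_matrix(Val(TARGET_SYNONYMS[name]), S)

S = cov(randn(100, 4); dims=1)
target_matrix(:chen, S)
```

Val only lets the compiler resolve the method statically when the symbol is a compile-time constant; with a runtime lookup like this one the call is dispatched dynamically, which should be harmless for something done once per estimate.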

I agree API design shouldn't be rushed. Modifying it will be much less work than actually implementing algorithms :).

tlienart · 2018-12-16T00:37:09Z

Ok cool, that makes sense to me. I'm not familiar with Val{Symbol}.

Re the performance issue, I don't think it matters here, given that we're not doing operations on CovarianceEstimator objects: they're created once, and it's the internals of the cov call that should be optimised. But I agree with the spirit, so fine by me!

I'll merge your new-api-test branch into my current one and try to implement the methods there and add tests.


mateuszbaran · 2018-12-16T13:48:09Z

Great! Apart from #18 I'm going to work on nonlinear Ledoit-Wolf estimator soon so we'll see how this API works in practice.

You are right, this is a minor performance issue. Anyway I think it's better to make the type concrete here than to worry where exactly it's going to be optimized. Functions in Julia tend to have non-concretely typed arguments so this type instability could propagate quite far. BTW, Val types are described here: https://docs.julialang.org/en/v1/manual/types/#%22Value-types%22-1 .

tlienart closed this as completed Dec 14, 2018

tlienart reopened this Dec 14, 2018

mateuszbaran added a commit that referenced this issue Dec 14, 2018

WIP: an API redesign (discussed in #17).

434a73c

tlienart added a commit that referenced this issue Dec 16, 2018

big PR following API discussion in #17 and implementation of the common linear shrinkage targets with lw shrinkage

48dcb98

tlienart mentioned this issue Dec 16, 2018

Implementation of methods with API modifications #18

Merged

tlienart mentioned this issue Dec 16, 2018

Fix readme given API changes #20

Closed

tlienart added the api label Dec 25, 2018
