Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of methods with API modifications #18

Merged
merged 6 commits into from
Dec 16, 2018
Merged

Conversation

tlienart
Copy link
Collaborator

@tlienart tlienart commented Dec 16, 2018

Hello,

I'm sorry for the big PR ... I hope you won't mind the changes too much!

Here's a brief summary:

  • API
    • I kept your suggestions from new-api-test (see linearshrinkage.jl)
    • I removed the explicit structures for LedoitWolf, OAS etc which now didn't seem so useful anymore
    • I implemented the common targets from the paper as well as the ledoit-wolf optimal shrinkage for each as per the paper, I also added the oas and rblw shrinkage method for target DiagonalCommonVariance
    • I changed your targetandshrinkage for a linear_shrinkage function which does the same thing as what you were doing but adds the shrinkage
    • I reverted to the use of :auto for LedoitWolf shrinkage intensity, :lw gives the same thing, it allows to have one target and three shrinkage intensity (:lw, :rblw: and :oas) I think that it actually makes more sense than :optimal

Now things look like

cov(X, LinearShrinkageEstimator(DiagonalCommonVariance(), :rblw))

(we can introduce tricks to make this less verbose later on as per #17 )

  • Docstrings
    • a few adds and typo fixes
  • Utils
    • safe01clamp works even if x is NaN (in the pathological case where S=F it's possible though unlikely that we get a 0.0/0.0, since S=F shrinkage doesn't matter --> returning 0.0, that's what you use to do, I think)
    • linshrink is the same as the old shrink but uses the safe01clamp
  • File naming
    • simplecov corresponds to your old basicmethods, I thought that simple made more sense since the estimator is called Simple
    • linearshrinkage has all the commmon targets + shrinkage, I thought this made sense and it's only 200 loc so I thought it wasn't too bad to put everything in one place

Stuff that may not be obvious:

  • sum_var_sij corresponds to the recurring numerator in http://strimmerlab.org/publications/journals/shrinkcov2005.pdf table 2
  • sum_fij corresponds to the sum of the fij for i \neq j also in that table
  • observe that the denominator in all the expressions of lambda^star can always be written as sum((S-F).^2)
  • you had tr(X*X) via dot(X, transpose(X)) but actually in our case X is symmetric and so this is just sum(X.^2)

The tests are essentially the same in terms of what they test, just the few modifications here and there to match the API changes.

Let me know what you think! and apologies again for the big changes in one shot...

Note: assuming you're happy with the changes, I'll fix the README.

@mateuszbaran mateuszbaran mentioned this pull request Dec 16, 2018
@mateuszbaran
Copy link
Owner

Very good PR! I don't mind it's big. A few comments:

  1. I'd prefer using dispatch instead of manually testing for if \lambda isa Symbol but it's a minor change that can be done separately. Instead of
function linear_shrinkage(::CommonCovariance, Xc::AbstractMatrix,
                              S::AbstractMatrix, λ::Shrinkage, n::Int, p::Int)

we'd have

function linear_shrinkage(::CommonCovariance, Xc::AbstractMatrix,
                          S::AbstractMatrix, λ::Real, n::Int, p::Int)

with most of the current code (without if \lambda isa Symbol part) and

function linear_shrinkage(::CommonCovariance, Xc::AbstractMatrix,
                          S::AbstractMatrix, ::Union{Val{:auto}, Val{:lw}}, n::Int, p::Int)

that calls the previous function with appropriately calculated \lambda.

  1. I'm not quite sure why we would need that safe clamp function. I mean, if linshrink is given a NaN then there is likely a problem with a function that called linshrink that we shouldn't hide.

Good work :)

@tlienart
Copy link
Collaborator Author

Yeah I thought about 1, I'll do that.

Re (2) I agree with you, probably not great practice. I think the NaN case is very unlikely, +-large number is much more likely but that gets clamped so it's fine.

Do you think it makes sense to have the clamping in linshrink or would you rather have a clamp before every linshrink?

@mateuszbaran
Copy link
Owner

I think it's OK to have clamp in linshrink. One could make a similar argument as in the NaN case but I think it's a very minor issue.

@codecov-io
Copy link

codecov-io commented Dec 16, 2018

Codecov Report

Merging #18 into master will decrease coverage by 34.73%.
The diff coverage is 56.48%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master      #18       +/-   ##
===========================================
- Coverage   92.39%   57.65%   -34.74%     
===========================================
  Files           4        3        -1     
  Lines          92      111       +19     
===========================================
- Hits           85       64       -21     
- Misses          7       47       +40
Impacted Files Coverage Δ
src/simplecov.jl 100% <100%> (ø)
src/utils.jl 100% <100%> (ø) ⬆️
src/linearshrinkage.jl 53.92% <53.92%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08f8745...c5cc923. Read the comment docs.

@tlienart
Copy link
Collaborator Author

ok, I've put it out now and think it's ok actually, give me one more commit before merging, I'm putting the target into functions to help maintainability and reduce code duplication

@tlienart
Copy link
Collaborator Author

ok done, feel free to merge if you're happy with the changes.

One side note that I forgot to add, I started using \lambda instead of \rho for shrinkage as I thought it was clearer to distinguish with the p (number of features).

@mateuszbaran mateuszbaran merged commit 694825f into master Dec 16, 2018
@tlienart tlienart deleted the tl-wip branch December 18, 2018 23:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants