updated vignette
ksidorczuk committed Apr 1, 2020
1 parent 148d3d9 commit f28acdc
Showing 1 changed file with 15 additions and 7 deletions.
22 changes: 15 additions & 7 deletions vignettes/overview.Rmd
@@ -248,21 +248,29 @@ $$

However, we have an important restriction: $n_{1,\cdot} = n_{1,1} + n_{1,0}$ and
$n_{\cdot, 1} = n_{1,1} + n_{0,1}$ are known and fixed, as they describe the number
-of ,,ones" for target and feature respectively.
+of "ones" for the target and the feature, respectively.

-This might look very complicated but this restrictions in fact simplifies
+This might look complicated, but this restriction in fact simplifies
our computation significantly, since the table is then determined by $n_{1,1}$.

Observe that $n_{1,1}$ lies in the range $[0, \min(n_{\cdot, 1}, n_{1, \cdot})]$.
Hence we obtain the probability of a given contingency table as a conditional distribution,
-as impose restrictions on two parameters $n_{\cdot, 1} $ and $n_{1, \cdot}$
+given the two fixed parameters $n_{\cdot, 1}$ and $n_{1, \cdot}$.
We can compute IG for each possible value of $n_{1,1}$ and thus obtain the
distribution of Information Gain under the hypothesis that the target and the
feature are independent.

-Having exact distribution lets us perform permutation test much quicker as we
-no longer need to perform any replications. Furthermore, by using
-exact test we will get precise values of tails which was not guaranteed with
-random permutations.
+The distributions are calculated by the `distr_crit` function.
+To speed up time-consuming computations when dealing with a very large
+number of features, the number of contingency tables for which IG is
+calculated can be limited with the `iter_limit` parameter. By default,
+IG is calculated for 200 contingency tables, so the resulting
+distribution of Information Gain is approximate.
+
+Having the exact distribution, or an approximate one when the number of
+calculated contingency tables is limited, lets us perform the permutation
+test much faster, as we no longer need to perform any replications.
+Furthermore, the exact test gives precise tail values, which random
+permutations do not guarantee.

# References
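
The enumeration that the vignette text in this diff describes can be illustrated with a minimal Python sketch. It is not the package's `distr_crit` implementation (the package is written in R), and every function name below (`info_gain`, `exact_ig_distribution`, `exact_p_value`, `permutation_p_value`) is hypothetical. The sketch assumes IG is the mutual information of the 2x2 table in bits; the log base rescales all IG values equally, so it does not change the p-value. It uses the standard result that, under independence with both margins fixed, $n_{1,1}$ follows a hypergeometric distribution. The loop starts at $\max(0, n_{1,\cdot} + n_{\cdot,1} - n)$, the smallest feasible value, which coincides with the lower end 0 of the stated range whenever $n_{1,\cdot} + n_{\cdot,1} \le n$. It computes the full exact distribution (no `iter_limit`-style truncation) and compares the exact tail probability with a random-permutation estimate.

```python
from fractions import Fraction
from math import comb, log2
import random

# Tolerance for comparing IG values: tables with mathematically equal IG can
# differ in the last bits of their floating-point sums.
TOL = 1e-12


def info_gain(n11, n1_, n_1, n):
    """Information Gain (mutual information, in bits) of a 2x2 table.

    n1_ = number of target ones, n_1 = number of feature ones, n = total.
    With these fixed, the whole table is determined by n11.
    """
    cells = {
        (1, 1): n11,
        (1, 0): n1_ - n11,
        (0, 1): n_1 - n11,
        (0, 0): n - n1_ - n_1 + n11,
    }
    row = {1: n1_, 0: n - n1_}  # target margins
    col = {1: n_1, 0: n - n_1}  # feature margins
    ig = 0.0
    for (i, j), nij in cells.items():
        if nij > 0:  # empty cells contribute 0 (0 * log 0 := 0)
            ig += nij / n * log2(nij * n / (row[i] * col[j]))
    return ig


def exact_ig_distribution(n1_, n_1, n):
    """Exact null distribution of IG as (ig, probability) pairs.

    Under independence with both margins fixed, n11 is hypergeometric.
    """
    lo = max(0, n1_ + n_1 - n)  # smallest feasible n11
    hi = min(n1_, n_1)          # largest feasible n11
    total = comb(n, n_1)
    return [
        (info_gain(k, n1_, n_1, n),
         Fraction(comb(n1_, k) * comb(n - n1_, n_1 - k), total))
        for k in range(lo, hi + 1)
    ]


def table_counts(target, feature):
    """Return (n11, n1_, n_1, n) for two 0/1 vectors of equal length."""
    n11 = sum(t * f for t, f in zip(target, feature))
    return n11, sum(target), sum(feature), len(target)


def exact_p_value(target, feature):
    """P(IG >= observed IG) under independence, as an exact fraction."""
    n11, n1_, n_1, n = table_counts(target, feature)
    ig_obs = info_gain(n11, n1_, n_1, n)
    tail = (p for ig, p in exact_ig_distribution(n1_, n_1, n) if ig >= ig_obs - TOL)
    return sum(tail, Fraction(0))


def permutation_p_value(target, feature, n_perm=1000, seed=0):
    """Monte Carlo estimate of the same p-value by shuffling the feature."""
    rng = random.Random(seed)
    n11, n1_, n_1, n = table_counts(target, feature)
    ig_obs = info_gain(n11, n1_, n_1, n)
    shuffled = list(feature)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        k = sum(t * f for t, f in zip(target, shuffled))
        if info_gain(k, n1_, n_1, n) >= ig_obs - TOL:
            hits += 1
    return hits / n_perm


if __name__ == "__main__":
    target = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    feature = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(exact_p_value(target, feature))        # 4/21
    print(permutation_p_value(target, feature))  # random estimate, varies with seed
```

For the example vectors, $n_{1,1} = 3$ with both margins equal to 4 out of $n = 10$. The tables with $n_{1,1} \in \{0, 3, 4\}$ have IG at least as large as the observed one, so the exact p-value is $(15 + 24 + 1)/210 = 4/21$. The permutation estimate only approximates this value and changes with the seed and the number of permutations.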
