Skip to content

Latest commit

 

History

History
201 lines (140 loc) · 3.67 KB

likelihood_how-to.md

File metadata and controls

201 lines (140 loc) · 3.67 KB

Notes on likelihood

Here we document how to work with likelihoods.

First let’s read in the data we generated from the multi-species competitive model.

x <- readRDS('lvMultiSim.Rds')
plot(sort(x, decreasing = TRUE), log = 'y', 
     xlab = 'Species rank', ylab = 'Abundance')

We’re going to fit the log-series distribution to the data. Here we make a function to compute the log-series.

lseries <- function(b, n) {
    1 / log(1 / (1 - exp(-b))) * exp(-b * n) / n
}

hist(x, probability = TRUE)

points(1:50, lseries(0.1, 1:50), col = 'red', type = 'b')
points(1:50, lseries(0.5, 1:50), col = 'orange', type = 'b')
points(1:50, lseries(0.01, 1:50), col = 'blue', type = 'b')

A likelihood is simply the probability of observing the data. Because these probabilities are typically very small, we most often work in log transformed probability, or log likelihood.

sum(log(lseries(0.001, x)))
## [1] -226.1041
sum(log(lseries(0.01, x)))
## [1] -207.8483
sum(log(lseries(0.5, x)))
## [1] -418.8434

## quartz_off_screen 
##                 2

Fitting the model to the data precisely means finding the parameter value that maximizes the likelihood function.

llLSeries <- function(b, n) {
    sum(log(lseries(b, n)))
}


optimize(llLSeries, interval = c(0, 10), n = x, maximum = TRUE)
## $maximum
## [1] 0.02533946
## 
## $objective
## [1] -204.2378

We can did all that likelihood by hand, but we can also use the pika package

# devtools::install_github('ajrominger/pika')

library(pika)
s <- sad(x, 'lseries')
s
## species abundance distribution modeled by "lseries" with parameter 
## [1] 0.02535964
## includes data
logLik(s)
## 'log Lik.' -204.2378 (df=1)

The pika package let’s us do nicer plots, and calculate a z-statistic, which is a goodness of fit measure.

plot(s, ptype = 'rad')

logLikZ(s)
## [1] 3.331813
pchisq(3.33, df = 1, lower.tail = FALSE)
## [1] 0.06802687

Model comparison necessitates that we penalize models for their complexity (i.e. number of parameters). We do this with AIC

ls <- sad(x, 'lseries')
ln <- sad(x, 'plnorm')
nb <- sad(x, 'tnegb')

AIC(ls)
## [1] 410.4756
AIC(ln)
## [1] 397.503
AIC(nb)
## [1] 394.2482
plot(ls, ptype = 'rad', log = 'y', main = 'logseries')

plot(ln, ptype = 'rad', log = 'y', main = 'lognormal')
## Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
## collapsing to unique 'x' values

plot(nb, ptype = 'rad', log = 'y', main = 'negbinom')

logLikZ(ls)
## [1] 3.331813
logLikZ(nb)
## [1] 2.101964e-08

Let’s see if output from a neutral model looks any different

library(roleR)

p <- untbParams(1000, 100000, 200, 0.01, 0.1, 'oceanic_island', 
                50000, 50000)
neutMod <- roleModel(p)
neutMod <- iterModel(neutMod)
neutModFin <- getFinalState(neutMod)
y <- getSumStats(neutModFin, list(abund = rawAbundance))
y <- y$abund$abund
y <- y[y > 0]

untbNB <- sad(y, 'tnegb')
plot(untbNB, ptype = 'rad', log = 'y')

logLikZ(untbNB)
## [1] 0.000393621