Skip to content

Latest commit

 

History

History
2608 lines (2291 loc) · 107 KB

Chrotomyini_compgenus_analyses_05022022.md

File metadata and controls

2608 lines (2291 loc) · 107 KB
title author date output
Chrotomys analysis compilation - genera
S.M. Smith
5/2/2022
html_document
keep_md
true

Load up Chrotomyini trabecular bone architecture (TBA) data and standardize variables:

d <- read.csv(file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\05062022 Philippine Murids segmentation parameters and morphological data - TBA data total BoneJ (full).csv", header = T)

d <- d[d$tribe=="chroto",c(1:2, 4:23)]

d <- 
  d %>% 
  mutate(bvtv = as.numeric(bvtv))

d <- 
  d %>%
  mutate(mass_s = rethinking::standardize(log10(mass_g)),
         elev_s = rethinking::standardize(elev), 
         bvtv_s = rethinking::standardize(bvtv),
         tbth_s = rethinking::standardize(tbth),
         tbsp_s = rethinking::standardize(tbsp),
         conn_s = rethinking::standardize(conn),
         cond_s = rethinking::standardize(m_connd),
         cond_s2 = rethinking::standardize(connd), 
         da_s = rethinking::standardize(da))

# remove C. gonzalesi and R. isarogensis, singletons:
d <- 
  d %>% 
  filter(taxon!="Chrotomys_gonzalesi") %>% 
  filter(taxon!="Rhynchomys_isarogensis")

# Make categorical vars into factors
d <- 
  d %>%
  mutate(loco = factor(loco), 
         hab_simp = factor(hab_simp),
         genus = factor(genus))

# Specify colors for plots:
cols = c("#86acca","#ab685b", "#3370a3", "#1f314c","#5a9fa8")

Load in phylogeny: REMEMBER: A <- ape::vcv.phylo(phylo), add corr = T if your tree is NOT SCALED TO 1.

ch.tre <- read.nexus(file = "G:\\My Drive\\Philippine rodents\\Chrotomys\\analysis\\SMS_PRUNED_and_COLLAPSED_03292022_OTUsrenamed_Rowsey_PhBgMCC_LzChrotomyini.nex")

ch <- ape::vcv.phylo(ch.tre, corr = T)

d <- 
  d %>% 
  mutate(phylo = taxon)

#######################################################################

Comparisons of several models for each trabecular bone metric

#######################################################################

I'm going to put together and compare a few different models for each trabecular bone metric, with a few combinations of potential model components. This is for the purpose of not only comparing fit (i.e., how good each model is at predicting out-of-sample), but also to understand how the addition of predictors changes the models' estimates. I'll also include some assessments of the influence of individual specimens on the model fit.

######################################################################################

Bone Volume Fraction (BV.TV), and some general methodological considerations

######################################################################################

For comparison: a model that doesn't include genus as a predictor:

# BV.TV by mass
ch.36 <- 
  brm(data = d, 
      family = gaussian,
      bvtv_s ~ 1+mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 5,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.36")
print(ch.36)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: bvtv_s ~ 1 + mass_s 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.02      0.10    -0.22     0.18 1.00     3283     2968
## mass_s        0.56      0.10     0.36     0.76 1.00     3677     2967
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.83      0.08     0.70     0.99 1.00     4107     2968
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

A scatterplot of the above model, with genera color-coded (the same color code to be used for the remainder of this document):

range(d$mass_s)
## [1] -1.866104  1.451091
nd <- tibble(mass_s = seq(from = -2, to = 1.75, length.out = 67))

fitted(ch.36,
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  # plot
  ggplot(aes(x = mass_s)) +
  geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              stat = "identity",
              alpha = 1/5, size = 1, color = "black") +
  geom_point(data = d, aes(y = bvtv_s, color = genus), size = 4)+
  scale_color_manual(values = cols)+
  xlim(min(nd), max(nd))+
  labs(x = "Mass in grams (standardized)",
       y = "bone volume fraction (standardized)") 

From this plot, you can really see that genus is closely linked to body mass. You can also see that there is definitely some correlation between mass and BV.TV but the relationship has a lot of error. You can also detect this from the model parameters: mass_s (the influence of mass, a.k.a slope, listed under "Population-level effects") = 0.56, with CIs 0.36-0.76, and sigma = 0.83.

Now a series of models that estimate a separate intercept (= estimate of mean BV.TV) for each genus, which here is a proxy for substrate use. I printed each model for quick inspection, and at the end of all the models, there is a plot including all four of them.

# By genus only
ch.24 <- 
  brm(data = d, 
      family = gaussian,
      bvtv_s ~ 0 + genus,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.24")
print(ch.24)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: bvtv_s ~ 0 + genus 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys           0.18      0.19    -0.21     0.56 1.00     5779     2647
## genusArchboldomys    -0.82      0.32    -1.43    -0.19 1.00     6098     3360
## genusChrotomys        0.69      0.19     0.33     1.06 1.00     6144     2987
## genusRhynchomys      -0.20      0.34    -0.85     0.45 1.00     5480     2936
## genusSoricomys       -0.63      0.19    -1.01    -0.25 1.00     5510     2876
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.84      0.08     0.70     1.00 1.00     5015     2956
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# By genus and mass
ch.25 <- 
  brm(data = d, 
      family = gaussian,
      bvtv_s ~ 0 + genus + mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.25")
print(ch.25)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: bvtv_s ~ 0 + genus + mass_s 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys           0.06      0.17    -0.28     0.42 1.00     3220     2224
## genusArchboldomys    -0.26      0.33    -0.90     0.37 1.00     2355     2392
## genusChrotomys       -0.22      0.30    -0.78     0.38 1.00     1679     2052
## genusRhynchomys      -1.32      0.42    -2.10    -0.50 1.00     1809     2618
## genusSoricomys        0.70      0.40    -0.10     1.45 1.00     1512     2107
## mass_s                1.00      0.27     0.46     1.52 1.00     1389     1970
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.74      0.07     0.62     0.90 1.00     3381     2588
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# By genus and phylogeny/intraspecific variance, no mass
ch.44 <- 
  brm(data = d, 
      family = gaussian,
      bvtv_s ~ 0 + genus + (1|gr(phylo, cov = ch)) + (1|taxon),
      control = list(adapt_delta = 0.89), #inserted to decrease the number of divergent transitions here
      prior = c(
        prior(normal(0, 1), class = b, coef = genusApomys),
        prior(normal(0, 1), class = b, coef = genusArchboldomys),
        prior(normal(0, 1), class = b, coef = genusChrotomys),
        prior(normal(0, 1), class = b, coef = genusRhynchomys),
        prior(normal(0, 1), class = b, coef = genusSoricomys),
        prior(normal(0, 1), class = sd),
        prior(exponential(1), class = sigma)
        ),
      data2 = list(ch = ch),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.44")
print(ch.44)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: bvtv_s ~ 0 + genus + (1 | gr(phylo, cov = ch)) + (1 | taxon) 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~phylo (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.62      0.36     0.04     1.37 1.00     1133     1770
## 
## ~taxon (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.40      0.27     0.01     1.00 1.00     1019     1542
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys           0.13      0.54    -1.00     1.18 1.00     3425     2716
## genusArchboldomys    -0.55      0.65    -1.78     0.77 1.00     4065     3026
## genusChrotomys        0.56      0.55    -0.59     1.64 1.00     3648     2616
## genusRhynchomys      -0.13      0.65    -1.41     1.15 1.00     4472     2900
## genusSoricomys       -0.48      0.55    -1.52     0.63 1.00     3431     2942
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.70      0.07     0.58     0.85 1.00     5518     3021
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# By genus, mass, and phylogeny/intraspecific variance
ch.26 <-
  brm(data = d, 
      family = gaussian,
      bvtv_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
      control = list(adapt_delta = 0.85), #inserted to decrease the number of divergent transitions here
      prior = c(
        prior(normal(0, 1), class = b, coef = genusApomys),
        prior(normal(0, 1), class = b, coef = genusArchboldomys),
        prior(normal(0, 1), class = b, coef = genusChrotomys),
        prior(normal(0, 1), class = b, coef = genusRhynchomys),
        prior(normal(0, 1), class = b, coef = genusSoricomys),
        prior(normal(0, 1), class = b, coef = mass_s),
        prior(normal(0, 1), class = sd),
        prior(exponential(1), class = sigma)
        ),
      data2 = list(ch = ch),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.26")
print(ch.26)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: bvtv_s ~ 0 + genus + mass_s + (1 | gr(phylo, cov = ch)) + (1 | taxon) 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~phylo (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.50      0.32     0.03     1.19 1.00     1273     2007
## 
## ~taxon (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.36      0.24     0.02     0.91 1.00     1039     1655
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys           0.07      0.48    -0.92     1.04 1.00     3331     2470
## genusArchboldomys    -0.27      0.61    -1.44     0.98 1.00     3997     2763
## genusChrotomys       -0.02      0.55    -1.12     1.07 1.00     3260     2484
## genusRhynchomys      -0.78      0.65    -2.00     0.54 1.00     3363     2894
## genusSoricomys        0.35      0.61    -0.88     1.57 1.00     3543     2774
## mass_s                0.78      0.31     0.16     1.41 1.00     3657     3065
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.67      0.07     0.56     0.82 1.00     4037     2400
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

A four-part plot including all of the models:

ch.24_halfeye <- ch.24 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))+
  labs(x = "BV.TV", y = "genus")+
  ggtitle(label = "BV.TV by genus only")

ch.25_halfeye <- ch.25 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "BV.TV by genus/mass")+
  labs(x = "BV.TV", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.44_halfeye <- ch.44 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "BVTV by genus/phylo/intrasp")+
  labs(x = "BV.TV", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.26_halfeye <- ch.26 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "BV.TV by genus/mass/phylo/intrasp.")+
  labs(x = "BV.TV", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.24_halfeye/ch.44_halfeye|ch.25_halfeye/ch.26_halfeye

Takeaways from this set of plots:

  1. Accounting for mass in estimation of BV.TV makes the relative estimates of BV.TV across genera change pretty dramatically. For its body size, Soricomys has relatively high BV.TV, and Rhynchomys has low BV.TV for its body size. Chrotomys, Archboldomys, and Apomys have estimated BV.TV values that are pretty close to the mean (represented by the vertical dotted line). One way to tell the significance of the differences between genera is to subtract the posterior probability distributions from one another.The difference between distributions is "significant" if the distribution of differences between them does not overlap 0. I'm not really a fan of statistical significance but it's good to have some definition of that I guess.

Here are the difference distributions between pairs of genera for the model including only genus and mass as predictors (upper right in the preceding plot):

ch.25 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  compare_levels(.value, by = .variable) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ungroup() %>%
  mutate(loco = reorder(.variable, .value)) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle(label = "Between-genus BV.TV difference distributions")+
  labs(x = "difference", y = "comparison")

From this comparison, it looks like Rhynchomys has significantly lower bone volume fraction than pretty much everything it is compared to, with the possible exception of Archboldomys (6th row down) - the distribution of the Rhynchomys-Archboldomys difference overlaps 0 slightly. It also appears that Soricomys has significantly higher BV.TV than Archboldomys.

This brings me to my second takeaway from the 4-part plot.

  1. The genus/mass model doesn't include the effects of phylogenetic correlation or intraspecific variance. When we estimate BV.TV (with genus and mass as predictors) with a phylogenetic multilevel model, and consider intraspecific variance as well (lower right in the 4-part plot above), the difference distributions look like this:
ch.26 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  compare_levels(.value, by = .variable) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ungroup() %>%
  mutate(loco = reorder(.variable, .value)) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle(label = "Between-genus BV.TV difference distributions")+
  labs(x = "difference", y = "comparison")

The relative location of each difference distribution hasn't changed dramatically (e.g., note location of Rhynchomys comparisons relative to the mean), but the probability density is much more spread out. This is telling us that, because our specimens share some features as a result of phylogenetic structure, the model is less sure that there are significant differences between genera that are related specifically to their generic properties and not to phylogenetic correlation.

So how strong is the influence of phylogeny on BV.TV? The model outputs a value that is called "sd_phylo__Intercept", which, as I understand it, is an estimate of the amount of variance in the data that is related to phylogenetic correlation structure. We can use that to calculate the more familiar lambda, a.k.a. phylogenetic signal.

(A reminder for me of what phylogenetic signal is: it is an intra-class correlation coefficient (ICC), which tells you how much things that are in groups together resemble each other, quantitatively speaking. 0 means that the individuals don't show any similarity that can be attributed to their relatedness (no phylogenetic structure). 1 Means that the trait you're looking at evolves exactly as expected under Brownian Motion evolution - that is, the phylogenetic structure underlying the data set explains ALL of the variation you see among the individual measurements. This is different from considering intraspecific variation as a potential source of covariance. Including the (1|taxon) term accounts for group-level similarity that is related to taxon (species) ONLY, irrespective of broader phylogenetic structure (it does not refer to the phylogenetic covariance matrix in its calculations). I feel like the more times I try and explain this to myself, the better I will understand it. I've always found PCM concepts somewhat opaque and I'm really trying to be clear about them so that I can mayyybe more precisely explain what my results mean. Thanks for sticking with me.)

Here's a calculation of lambda for ch.26, the model that includes genus, mass, phylogeny, and intraspecific variance:

hyp <- "sd_phylo__Intercept^2 / (sd_phylo__Intercept^2 + sd_taxon__Intercept^2 + sigma^2) = 0"
h.bvtv <- hypothesis(ch.26, hyp, class = NULL)
h.bvtv
## Hypothesis Tests for class :
##                 Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (sd_phylo__Interc... = 0     0.29      0.23        0     0.74         NA
##   Post.Prob Star
## 1        NA    *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.

The estimate of lambda here is 0.29. I would say that's medium-low? But why should we trust a point estimate here (especially when the upper 95% CI is 0.74)? We can look at lambda as a probability distribution instead. This will show us where the majority of the probability density actually falls.

ph.bvtv.pl <- ggplot() +
  geom_density(aes(x = h.bvtv$samples$H1), fill = "red", alpha = 0.5) +
  theme_bw() +
  xlim(0,1) +
  labs(y = "density", x = "lambda: bone volume fraction")
ph.bvtv.pl

I had never thought of lambda as a distribution before this and I think it makes these results both easier and more difficult to understand. From this density plot, we can tell that there's a density peak around, say, 0.03. I would call that a low value for lambda. HOWEVER - just because that is a local peak does not mean that it represents the majority of the probability density. There is a huge amount of probability density (area under the curve) between lambda = 0.125 and ~0.56. So it makes sense that the point estimate of lambda is in the middle-ish of the values between the peak (~0.03) and the point where the density drops off (~0.56).

What the hell does this mean about the actual influence of phylogeny on BV.TV? I'm still trying to figure that out. Phylogenetic structure in these data clearly explains some of the variation in BV.TV across genera - which, yeah, of course it does, you're using genus as a proxy for substrate-use groups (e.g., Chrotomys = varying degrees of digging, Rhynchomys = bipedal hopping, etc). But after accounting for mass and group-level effects (phylo/intraspecific var) in our model, there are still differences between genera. They aren't necessarily "significant", according to the difference distributions shown above, but there are definitely signals that could be related to organismal ecology.

Ok departing from the discussion of the plots above: I'm gonna do some model comparison to see which model is best at predicting out-of-sample (which is the "best fit" for the data). I'm going to use two methods: the Widely Applicable Information Criterion (WAIC) and Leave-One-Out cross-validation (LOO).

ch.36 <-  add_criterion(ch.36, c("waic", "loo")) # mass only
ch.24 <-  add_criterion(ch.24, c("waic", "loo")) # genus only 
ch.25 <-  add_criterion(ch.25, c("waic", "loo")) # genus and mass
ch.44 <-  add_criterion(ch.44, c("waic", "loo")) # genus and phylogeny
ch.26 <-  add_criterion(ch.26, c("waic", "loo")) # genus mass phylogeny

loo(ch.36) # mass only : 0 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -83.9 4.9
## p_loo         2.7 0.5
## looic       167.9 9.8
## ------
## Monte Carlo SE of elpd_loo is 0.0.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
loo(ch.24) # genus only : 0 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate  SE
## elpd_loo    -85.2 4.8
## p_loo         4.6 0.7
## looic       170.4 9.6
## ------
## Monte Carlo SE of elpd_loo is 0.0.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
loo(ch.25) # genus and mass : 0 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -78.0  5.3
## p_loo         5.6  0.9
## looic       155.9 10.7
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
loo(ch.44) # genus and phylogeny : 1 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -76.8  6.4
## p_loo        10.5  2.0
## looic       153.6 12.9
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     66    98.5%   431       
##  (0.5, 0.7]   (ok)        1     1.5%   1970      
##    (0.7, 1]   (bad)       0     0.0%   <NA>      
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
loo(ch.26) # genus mass phylogeny : 1 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -74.6  6.6
## p_loo        10.8  2.0
## looic       149.1 13.3
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     66    98.5%   1034      
##  (0.5, 0.7]   (ok)        1     1.5%   310       
##    (0.7, 1]   (bad)       0     0.0%   <NA>      
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.

This analysis (LOO) can tell you, among other things, if there are specimens that are having a strong effect on the model fit. It does this by leaving one specimen out at a time (hence the name) and seeing how well the model can predict the value for that left-out specimen when it is not a part of the model. Specimens that are highly influential have a high Pareto K value. In two of the models here (ch.44 and ch.26, the phylogenetic models), one specimen has a relatively high Pareto K. Let's see if it's the same specimen in both.

tibble(k   = ch.44$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 190968 Soricomys leonardocoi, K = 0.59
## # A tibble: 67 x 3
##        k   row specimen                    
##    <dbl> <int> <chr>                       
##  1 0.591    60 190968 Soricomys_leonardocoi
##  2 0.468    19 193523 Archboldomys_maximus 
##  3 0.425    14 216435 Apomys_sierrae       
##  4 0.417    62 188313 Soricomys_montanus   
##  5 0.417    22 193526 Archboldomys_maximus 
##  6 0.414    16 216462 Apomys_sierrae       
##  7 0.413    51 167307 Soricomys_kalinga    
##  8 0.410    26 221833 Chrotomys_mindorensis
##  9 0.375    52 167309 Soricomys_kalinga    
## 10 0.367    38 188348 Chrotomys_whiteheadi 
## # ... with 57 more rows
tibble(k   = ch.26$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 216462 Apomys sierrae, K = 0.69
## # A tibble: 67 x 3
##        k   row specimen                    
##    <dbl> <int> <chr>                       
##  1 0.693    16 216462 Apomys_sierrae       
##  2 0.495    19 193523 Archboldomys_maximus 
##  3 0.423    62 188313 Soricomys_montanus   
##  4 0.418    26 221833 Chrotomys_mindorensis
##  5 0.417     3 218312 Apomys_banahao       
##  6 0.416    22 193526 Archboldomys_maximus 
##  7 0.409    43 193744 Chrotomys_whiteheadi 
##  8 0.406    60 190968 Soricomys_leonardocoi
##  9 0.364    14 216435 Apomys_sierrae       
## 10 0.357    20 193524 Archboldomys_maximus 
## # ... with 57 more rows

Why are these specimens potentially having such a strong effect?

  1. FMNH 216462, Apomys sierrae: This animal has an anomalously low bone volume fraction compared to all other individuals in its species (17% bone versus all other specimens 29-37% bone). It has the shortest head and body length at 135 mm, compared to ~140-150 for all other specimens. It is also unusual in that it only has five lumbar vertebrae rather than 6. I still analyzed the third lumbar position for it, but it may represent a veeery slightly different functional position in the lumbar column. However - looking at the specimen just now in Dragonfly, all the vertebrae kind of look like that. So It might just be a weirdo.

  2. FMNH 190968, Soricomys leonardocoi: This animal also has anomalously low bone volume fraction for its species (14% bone as opposed to all others 19-22% bone). However, this is in an analysis (ch.44) that does not account for body mass. 190968 weighs 35 grams, and the average mass of the species is 32 grams, so it's slightly heavier than average and has a lower BV.TV. In the phylogenetic model that has mass as a predictor (ch.26), its Pareto K is below the "good" threshold of 0.5 (K = 0.41) - it doesn't have a super strong effect on that model fit. So incorporating mass apparently allows the model to not be surprised about this specimen's BV.TV. I'm not sure I have an adequate explanation for how that is happening.

Should I remove these guys, at least the Apomys? I don't know if that's necessary. It seems like it represents true variation in the population, and I'm loath do get rid of it because of that. I could try using a Student distribution for the posterior probability distribution, which would result in a model that is less surprised by outliers, but I haven't decided if I think the problem is extreme enough for that to be necessary.

Now I'm going to compare across models to see which is the best at predicting BV.TV, and by how much. This won't give any information about the causality of the variables involved, but it can tell how big of a difference the inclusion or exclusion of a variable makes in the predictive accuracy of the model.

bvtv_comp <- loo_compare(ch.36, ch.24, ch.25, ch.44, ch.26, criterion = "waic")

# Convert to WAIC differences among models, including standard error term.
cbind(waic_diff = bvtv_comp[, 1] * -2,
      se        = bvtv_comp[, 2] * 2)
##       waic_diff       se
## ch.26  0.000000 0.000000
## ch.44  4.577944 3.682324
## ch.25  7.343953 6.397010
## ch.36 19.396496 9.877691
## ch.24 21.905571 9.678130
bvtv_comp[, 7:8] %>% 
  data.frame() %>% 
  rownames_to_column("model_name") %>% 
  mutate(model_name = fct_reorder(model_name, waic, .desc = T)) %>% 
  ggplot(aes(x = waic, y = model_name, 
             xmin = waic - se_waic, 
             xmax = waic + se_waic)) +
  geom_pointrange(shape = 21) +
  labs(title = "WAIC comparison across BV.TV estimation models",
       x = "WAIC", y = "model") +
  theme(axis.ticks.y = element_blank())

So in this comparison, the WAIC (= Widely Applicable Information Criterion) indicates that ch.26, the phylogenetic model (+ intraspecific variance) that includes genus and mass, is the best predictive model, but not by much - its standard error (horizontal line on the plot) overlaps that of ch.44 (phylo only) and ch.25 (genus and mass).

Based on what I know about trabecular bone morphology, and what I've seen so far re: correlation of body mass with TBA metrics, it doesn't make sense to exclude mass from the model. Body mass has an effect on skeletal structure to varying degrees, depending on the metric of interest. so even though the phylogenetic model that does not include mass as a predictor (ch.44) doesn't perform that much worse than the model including mass (ch.26), I'm not going to extensively consider models that don't take mass into account. I will, however, compare across metrics to see the difference in the influence of mass on the TBA metrics that I'm examining. This is part of the overall goal of the study: to figure out a little more about what the various trabecular bone metrics are being influenced by/are associated with, in addition to the hypothesized biomechanical function of trabecular bone variation.

I think I'll keep doing comparisons with the no-phylogeny type models for now (ch.25 and ch.36, above) because I think that comparison is relevant to understanding HOW phylogenetic structure affects results. I know people always are saying, don't include both phylogenetic and non-phylogenetic analyses in your results, but I think it's useful. Thoughts on this point would be appreciated.

So, I'm going to try and explain some of my thoughts about what these results mean biomechanically and ecologically. But first I need to look at the other trabecular bone metrics here.

######################################

Trabecular Thickness (Tb.Th)

######################################

A model without genus, mass as the only predictor:

# Tb.Th by mass
ch.40 <- 
  brm(data = d, 
      family = gaussian,
      tbth_s ~ 1+mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.40")
print(ch.40)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: tbth_s ~ 1 + mass_s 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.01      0.05    -0.10     0.08 1.00     3830     2767
## mass_s        0.93      0.04     0.84     1.01 1.00     4168     2809
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.37      0.03     0.31     0.44 1.00     3707     2787
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

A scatterplot of the above model, with genera color-coded:

ch.40.pl <- fitted(ch.40,
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  # plot
  ggplot(aes(x = mass_s)) +
  geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              stat = "identity",
              alpha = 1/5, size = 1, color = "black") +
  geom_point(data = d, aes(y = tbth_s, color = genus), size = 4)+
  scale_color_manual(values = cols)+
  labs(x = "Mass in grams (standardized)",
       y = "trabecular thickness (standardized)") 
ch.40.pl

Daaang look at that correlation! It's not all that unexpected based on what I've seen in the literature - Tb.Th is VERY closely associated with mass, such that it is almost entirely explained by it. Note the mass_s value in the model: 0.93, with sigma = 0.37. mass_s in the BV.TV model was 0.56, with sigma = 0.83. The relationship with Tb.Th is much stronger. It's possible that there is essentially no information other than body size information in this metric! Let's look closer.

# By genus only
ch.50 <- 
  brm(data = d, 
      family = gaussian,
      tbth_s ~ 0 + genus,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.50")
print(ch.50)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: tbth_s ~ 0 + genus 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys           0.02      0.12    -0.22     0.25 1.00     5777     2753
## genusArchboldomys    -0.79      0.20    -1.18    -0.38 1.00     5610     2896
## genusChrotomys        1.01      0.11     0.78     1.22 1.00     5427     2858
## genusRhynchomys       0.75      0.20     0.37     1.14 1.00     4999     3067
## genusSoricomys       -1.16      0.12    -1.39    -0.91 1.00     5335     3059
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.49      0.05     0.41     0.59 1.00     4411     3210
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# By genus and mass
ch.51 <- 
  brm(data = d, 
      family = gaussian,
      tbth_s ~ 0 + genus + mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.51")
print(ch.51)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: tbth_s ~ 0 + genus + mass_s 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys          -0.12      0.08    -0.28     0.03 1.00     3396     2855
## genusArchboldomys    -0.07      0.16    -0.39     0.24 1.00     2016     2182
## genusChrotomys       -0.10      0.15    -0.39     0.20 1.00     1210     1966
## genusRhynchomys      -0.64      0.21    -1.04    -0.23 1.00     1370     2205
## genusSoricomys        0.46      0.20     0.06     0.85 1.00     1188     1893
## mass_s                1.19      0.14     0.91     1.46 1.00     1121     1722
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.33      0.03     0.28     0.40 1.00     2932     2529
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# By genus, mass, and phylogeny/intraspecific variance
ch.27 <-
  brm(data = d, 
      family = gaussian,
      tbth_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
      control = list(adapt_delta = 0.98), #inserted to decrease the number of divergent transitions here
      prior = c(
        prior(normal(0, 1), class = b, coef = genusApomys),
        prior(normal(0, 1), class = b, coef = genusArchboldomys),
        prior(normal(0, 1), class = b, coef = genusChrotomys),
        prior(normal(0, 1), class = b, coef = genusRhynchomys),
        prior(normal(0, 1), class = b, coef = genusSoricomys),
        prior(normal(0, 1), class = b, coef = mass_s),
        prior(normal(0, 1), class = sd),
        prior(exponential(1), class = sigma)
        ),
      data2 = list(ch = ch),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.27")
print(ch.27)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: tbth_s ~ 0 + genus + mass_s + (1 | gr(phylo, cov = ch)) + (1 | taxon) 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~phylo (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.34      0.21     0.02     0.84 1.00     1149     1668
## 
## ~taxon (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.21      0.14     0.01     0.55 1.00     1121     1738
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys          -0.09      0.35    -0.81     0.65 1.00     2781     2444
## genusArchboldomys    -0.19      0.43    -1.07     0.64 1.00     3964     2764
## genusChrotomys        0.12      0.36    -0.60     0.83 1.00     3035     2325
## genusRhynchomys      -0.27      0.45    -1.13     0.68 1.00     3218     2694
## genusSoricomys        0.10      0.41    -0.73     0.89 1.00     3026     2737
## mass_s                0.93      0.16     0.61     1.24 1.00     3301     2569
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.28      0.03     0.23     0.34 1.00     4174     2920
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

A four part plot like the one I did for BV.TV, but instead of the phylogenetic model with no mass, I reproduced the model with mass only (no genus):

ch.50_halfeye <- ch.50 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))+
  labs(x = "Tb.Th", y = "genus")+
  ggtitle(label = "Tb.Th by genus only")

ch.51_halfeye <- ch.51 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Tb.Th by genus/mass")+
  labs(x = "Tb.Th", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.27_halfeye <- ch.27 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Tb.Th by genus/mass/phylo/intrasp.")+
  labs(x = "Tb.Th", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.40.pl.nokey <- ch.40.pl +
  theme(legend.position = "none")
ch.50_halfeye/ch.40.pl.nokey|ch.51_halfeye/ch.27_halfeye

From this set of plots, I think we can say that Tb.Th variance is almost totally explained by body mass and phylogenetic covariance structure. The change in estimated difference among genera between the genus/mass model and the phylogenetic model is dramatic. Here's comparison plots. I truncated the genus names so that the plots are a little easier to fit next to each other (they just have the "omys" part removed):

diff.tbth.ch.51 <- ch.51 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  compare_levels(.value, by = .variable) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  mutate(.variable = str_replace_all(.variable, "omys", "")) %>% 
  ungroup() %>%
  mutate(loco = reorder(.variable, .value)) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle(label = "Tb.Th difference, genus+mass")+
  labs(x = "difference", y = "comparison")+
  theme(plot.title = element_text(size = 9))+
  scale_x_continuous(limits = c(-3, 3))

diff.tbth.ch.27 <- ch.27 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  compare_levels(.value, by = .variable) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  mutate(.variable = str_replace_all(.variable, "omys", "")) %>% 
  ungroup() %>%
  mutate(loco = reorder(.variable, .value)) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle(label = "Tb.Th difference, genus+mass+phylo")+
  labs(x = "difference", y = "comparison")+
  theme(plot.title = element_text(size = 9))+
  scale_x_continuous(limits = c(-3, 3))

diff.tbth.ch.51|diff.tbth.ch.27
## Warning: Removed 8 rows containing missing values (stat_slabinterval).

Also - note the width of the distributions in the plot on the right (I made the scales the same so you can see the difference).

In light of this dramatic difference, let's look at lambda for this one:

hyp <- "sd_phylo__Intercept^2 / (sd_phylo__Intercept^2 + sd_taxon__Intercept^2 + sigma^2) = 0"
h.tbth <- hypothesis(ch.27, hyp, class = NULL)
h.tbth
## Hypothesis Tests for class :
##                 Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (sd_phylo__Interc... = 0     0.43      0.27        0     0.87         NA
##   Post.Prob Star
## 1        NA    *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.

Oooookay so the estimate of lambda is 0.43, with an upper CI of 0.87. The plot:

ph.tbth.pl <- ggplot() +
  geom_density(aes(x = h.tbth$samples$H1), fill = "orange", alpha = 0.5) +
  theme_bw() +
  xlim(0,1) +
  labs(y = "density", x = "lambda: trabecular thickness")
ph.tbth.pl

I would say in this one that most of the probability density is towards the higher end of the range, suggesting that there's a pretty big influence of phylogeny on this particular metric. Compare with the one for BV.TV:

ph.bvtv.pl/ph.tbth.pl

So on the whole, there's likely to be more of a phylogenetic signal in Tb.Th than BV.TV. There is that peak on the far left (at around 0.03), similar in location to the one in the BV.TV lambda, but the peak is lower and so represents a smaller proportion of the total probaility density.

Model comparison and check for outliers:

ch.40 <- add_criterion(ch.40, c("loo", "waic"))
ch.50 <- add_criterion(ch.50, c("loo", "waic"))
ch.51 <- add_criterion(ch.51, c("loo", "waic"))
ch.27 <- add_criterion(ch.27, c("loo", "waic"))

loo(ch.40) # mass only : 0 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -29.6  5.4
## p_loo         2.8  0.6
## looic        59.1 10.8
## ------
## Monte Carlo SE of elpd_loo is 0.0.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
loo(ch.50) # genus only : 0 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -49.9  8.0
## p_loo         5.6  1.3
## looic        99.9 16.0
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
loo(ch.51) # mass and genus : 1 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -24.1  5.2
## p_loo         7.0  1.3
## looic        48.3 10.4
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     66    98.5%   608       
##  (0.5, 0.7]   (ok)        1     1.5%   741       
##    (0.7, 1]   (bad)       0     0.0%   <NA>      
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
loo(ch.27) # mass genus and phylo: 1 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -16.3  5.7
## p_loo        11.2  1.8
## looic        32.6 11.4
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     66    98.5%   710       
##  (0.5, 0.7]   (ok)        1     1.5%   983       
##    (0.7, 1]   (bad)       0     0.0%   <NA>      
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.

Once again the models with multiple predictors (ch.51 and ch.27) have 1 specimen each with excessive influence. Who are they?

tibble(k   = ch.51$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 193526 Archboldomys maximus, K = 0.52
## # A tibble: 67 x 3
##        k   row specimen                   
##    <dbl> <int> <chr>                      
##  1 0.516    22 193526 Archboldomys_maximus
##  2 0.436    46 189834 Rhynchomys_labo     
##  3 0.379    37 198736 Chrotomys_silaceus  
##  4 0.369    44 189832 Rhynchomys_labo     
##  5 0.354    35 193731 Chrotomys_silaceus  
##  6 0.314    49 189839 Rhynchomys_labo     
##  7 0.311    24 214320 Archboldomys_maximus
##  8 0.308    34 193726 Chrotomys_silaceus  
##  9 0.301    20 193524 Archboldomys_maximus
## 10 0.300    14 216435 Apomys_sierrae      
## # ... with 57 more rows
tibble(k   = ch.27$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 216435 Apomys sierrae, K = 0.52
## # A tibble: 67 x 3
##        k   row specimen                    
##    <dbl> <int> <chr>                       
##  1 0.516    14 216435 Apomys_sierrae       
##  2 0.494     9 236310 Apomys_datae         
##  3 0.488    46 189834 Rhynchomys_labo      
##  4 0.479    32 193720 Chrotomys_silaceus   
##  5 0.463    43 193744 Chrotomys_whiteheadi 
##  6 0.458    22 193526 Archboldomys_maximus 
##  7 0.390     4 218313 Apomys_banahao       
##  8 0.374    31 193708 Chrotomys_silaceus   
##  9 0.338    26 221833 Chrotomys_mindorensis
## 10 0.332    18 216464 Apomys_sierrae       
## # ... with 57 more rows

Ok so. Since the cutoff for being considered "ok" in these is 0.50, and both of them have K = 0.52ish, I think they are probably not doing anything crazy to the model. Certainly nothing as dramatic as the Apomys sierrae specimen was doing to the BV.TV model (for which K = 0.69).

Compare models directly:

tbth_comp <- loo_compare(ch.40, ch.50, ch.51, ch.27, criterion = "waic") 

# Convert to WAIC differences among models, including standard error term.
cbind(waic_diff = tbth_comp[, 1] * -2,
      se        = tbth_comp[, 2] * 2)
##       waic_diff       se
## ch.27   0.00000  0.00000
## ch.51  16.18253  7.05953
## ch.40  27.29115 11.13251
## ch.50  67.88670 16.17353
tbth_comp[, 7:8] %>% 
  data.frame() %>% 
  rownames_to_column("model_name") %>% 
  mutate(model_name = fct_reorder(model_name, waic, .desc = T)) %>% 
  ggplot(aes(x = waic, y = model_name, 
             xmin = waic - se_waic, 
             xmax = waic + se_waic)) +
  geom_pointrange(shape = 21) +
  labs(title = "WAIC comparison across Tb.Th estimation models",
       x = "WAIC", y = "model") +
  theme(axis.ticks.y = element_blank())

So: genus only (ch.50) performed the worst; mass only (ch.40) and genus+mass (ch.51) performed ok but not as well as the phylogenetic model (there is plenty of overlap in the WAIC standard error among them though). The phylogenetic model (+ intraspecific variance) with genus and mass as predictors will be the best at predicting out-of-sample. This is the same ranking of models that we found in BV.TV.

What does that mean? Does it mean anything? Think about it.

Conclusion from this section: Tb.Th is maybe not giving any information about organismal ecology (substrate use) here. Instead, most of the variance is related to phylogenetic covariance structure, intraspecific variance, and body mass.

####################################

Trabecular Spacing (Tb.Sp)

####################################

A model without genus, mass as the only predictor:

# Tb.Sp by mass
ch.52 <- 
  brm(data = d, 
      family = gaussian,
      tbsp_s ~ 1+mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.52")
print(ch.52)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: tbsp_s ~ 1 + mass_s 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.01      0.11    -0.21     0.23 1.00     4658     2689
## mass_s        0.38      0.11     0.16     0.61 1.00     4347     2686
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.95      0.08     0.80     1.13 1.00     4339     2840
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Ha ha, sigma = 0.95. Guess what this plot is gonna look like... not a line, that's for sure. Plot:

ch.52.pl <- fitted(ch.52,
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  # plot
  ggplot(aes(x = mass_s)) +
  geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              stat = "identity",
              alpha = 1/5, size = 1, color = "black") +
  geom_point(data = d, aes(y = tbsp_s, color = genus), size = 4)+
  scale_color_manual(values = cols)+
  labs(x = "Mass in grams (standardized)",
       y = "trabecular spacing (standardized)") 
ch.52.pl

Right off the bat we can tell that there is a lot more variance in Tb.Sp than in either of the other metrics we've looked at so far - but there still seems to be some correlation with size. I'm pretty sure that the Archboldomys and Soricomys way up at the top there have caused trouble in previous sets of analyses - they have portions of the vertebral centrum that are basically empty, which means that the "spacing" in that portion of the structure is the same size as the entire inner diameter of the centrum. The Apomys way down at the bottom is a known weirdo also (maybe a really old individual). These three will probably have lots of influence in these models. We'll get to that below.

# By genus only
ch.53 <- 
  brm(data = d, 
      family = gaussian,
      tbsp_s ~ 0 + genus,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.53")
print(ch.53)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: tbsp_s ~ 0 + genus 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys          -0.18      0.22    -0.63     0.26 1.00     6040     2777
## genusArchboldomys    -0.15      0.37    -0.86     0.56 1.00     6304     3009
## genusChrotomys        0.36      0.21    -0.04     0.77 1.00     4726     3062
## genusRhynchomys       0.87      0.37     0.14     1.61 1.00     5363     2955
## genusSoricomys       -0.44      0.22    -0.85    -0.01 1.00     5985     2878
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.95      0.09     0.79     1.13 1.00     4968     3142
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# By genus and mass
ch.54 <- 
  brm(data = d, 
      family = gaussian,
      tbsp_s ~ 0 + genus + mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.54")
print(ch.54)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: tbsp_s ~ 0 + genus + mass_s 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys          -0.22      0.22    -0.67     0.21 1.00     3358     2279
## genusArchboldomys     0.00      0.39    -0.77     0.75 1.00     2813     2222
## genusChrotomys        0.14      0.34    -0.51     0.82 1.00     1976     2501
## genusRhynchomys       0.60      0.48    -0.35     1.54 1.00     2305     2675
## genusSoricomys       -0.12      0.45    -1.02     0.75 1.00     1981     2479
## mass_s                0.25      0.30    -0.35     0.83 1.00     1756     2036
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.95      0.09     0.80     1.13 1.00     2949     2310
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# By genus, mass, and phylogeny/intraspecific variance
ch.29 <-
  brm(data = d, 
      family = gaussian,
      tbsp_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
      control = list(adapt_delta = 0.98), #inserted to decrease the number of divergent transitions here
      prior = c(
        prior(normal(0, 1), class = b, coef = genusApomys),
        prior(normal(0, 1), class = b, coef = genusArchboldomys),
        prior(normal(0, 1), class = b, coef = genusChrotomys),
        prior(normal(0, 1), class = b, coef = genusRhynchomys),
        prior(normal(0, 1), class = b, coef = genusSoricomys),
        prior(normal(0, 1), class = b, coef = mass_s),
        prior(normal(0, 1), class = sd),
        prior(exponential(1), class = sigma)
        ),
      data2 = list(ch = ch),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.29")

print(ch.29)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: tbsp_s ~ 0 + genus + mass_s + (1 | gr(phylo, cov = ch)) + (1 | taxon) 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~phylo (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.40      0.30     0.01     1.11 1.00     1483     1883
## 
## ~taxon (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.33      0.23     0.01     0.89 1.00     1317     1919
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys          -0.18      0.44    -1.04     0.76 1.00     3202     2639
## genusArchboldomys     0.01      0.57    -1.13     1.14 1.00     3800     2759
## genusChrotomys        0.08      0.53    -1.01     1.09 1.00     2862     2420
## genusRhynchomys       0.44      0.66    -0.92     1.70 1.00     3420     2801
## genusSoricomys       -0.08      0.59    -1.29     1.09 1.00     3180     2935
## mass_s                0.28      0.34    -0.40     0.93 1.00     3085     2674
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.91      0.09     0.76     1.11 1.00     4414     2601
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Four part plot:

ch.53_halfeye <- ch.53 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))+
  labs(x = "Tb.Sp", y = "genus")+
  ggtitle(label = "Tb.Sp by genus only")

ch.54_halfeye <- ch.54 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Tb.Sp by genus/mass")+
  labs(x = "Tb.Sp", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.29_halfeye <- ch.29 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Tb.Sp by genus/mass/phylo/intrasp.")+
  labs(x = "Tb.Sp", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.52.pl.nokey <- ch.52.pl +
  theme(legend.position = "none")
ch.53_halfeye/ch.52.pl.nokey|ch.54_halfeye/ch.29_halfeye

From this set of plots, it looks like there is maybe something interesting going on with Rhynchomys. Even though there is almost full overlap between a lot of the probability distributions (even before considering phylogenetic covariance), the mean estimate for Rhynchomys departs from the global mean more than any of the other genera. This might also be driven by the large RANGE of values for Tb.Sp, which is especially common in Rhynchomys and Archboldomys. Sometimes they have areas where the "trabecular spacing" is equal to the entire internal diameter of the vertebral centrum because there are no trabeculae at all. I have a feeling that there will be a lot of outliers driving the patterns in this one.

Comparison plots:

diff.tbth.ch.54 <- ch.54 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  compare_levels(.value, by = .variable) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  mutate(.variable = str_replace_all(.variable, "omys", "")) %>% 
  ungroup() %>%
  mutate(loco = reorder(.variable, .value)) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle(label = "Tb.Sp difference, genus+mass")+
  labs(x = "difference", y = "comparison")+
  theme(plot.title = element_text(size = 9))+
  scale_x_continuous(limits = c(-3, 3))

diff.tbth.ch.29 <- ch.29 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  compare_levels(.value, by = .variable) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  mutate(.variable = str_replace_all(.variable, "omys", "")) %>% 
  ungroup() %>%
  mutate(loco = reorder(.variable, .value)) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle(label = "Tb.Sp difference, genus+mass+phylo")+
  labs(x = "difference", y = "comparison")+
  theme(plot.title = element_text(size = 9))+
  scale_x_continuous(limits = c(-3, 3))

diff.tbth.ch.54|diff.tbth.ch.29
## Warning: Removed 17 rows containing missing values (stat_slabinterval).
## Warning: Removed 56 rows containing missing values (stat_slabinterval).

Sure enough, the ones involving Rhynchomys have the greatest mean departure from 0, but none of the differences look particularly significant. Gonna try to figure out what's driving this.

ch.52 <- add_criterion(ch.52, c("loo", "waic"))
ch.53 <- add_criterion(ch.53, c("loo", "waic"))
ch.54 <- add_criterion(ch.54, c("loo", "waic"))
ch.29 <- add_criterion(ch.29, c("loo", "waic"))

loo(ch.52) # mass only : 0 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -92.9  7.2
## p_loo         3.1  0.9
## looic       185.8 14.4
## ------
## Monte Carlo SE of elpd_loo is 0.0.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
loo(ch.53) # genus only : 2 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -94.6  7.3
## p_loo         6.4  1.9
## looic       189.2 14.5
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     65    97.0%   2156      
##  (0.5, 0.7]   (ok)        2     3.0%   294       
##    (0.7, 1]   (bad)       0     0.0%   <NA>      
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
loo(ch.54) # mass and genus : 1 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -94.9  7.1
## p_loo         6.5  1.8
## looic       189.7 14.2
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     66    98.5%   812       
##  (0.5, 0.7]   (ok)        1     1.5%   313       
##    (0.7, 1]   (bad)       0     0.0%   <NA>      
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
loo(ch.29) # mass genus and phylo: 1 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -94.2  7.2
## p_loo         9.8  2.2
## looic       188.4 14.5
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     66    98.5%   812       
##  (0.5, 0.7]   (ok)        1     1.5%   379       
##    (0.7, 1]   (bad)       0     0.0%   <NA>      
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.

Ok so three of them have outliers! Are they all three gonna be different? Is it gonna be that troublemaking Apomys again?

tibble(k   = ch.53$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k))
## # A tibble: 67 x 3
##        k   row specimen                    
##    <dbl> <int> <chr>                       
##  1 0.636    22 193526 Archboldomys_maximus 
##  2 0.592    60 190968 Soricomys_leonardocoi
##  3 0.413    47 189835 Rhynchomys_labo      
##  4 0.366    14 216435 Apomys_sierrae       
##  5 0.361    49 189839 Rhynchomys_labo      
##  6 0.340    48 189836 Rhynchomys_labo      
##  7 0.300    46 189834 Rhynchomys_labo      
##  8 0.289    62 188313 Soricomys_montanus   
##  9 0.227    23 193943 Archboldomys_maximus 
## 10 0.223    26 221833 Chrotomys_mindorensis
## # ... with 57 more rows
# 193526 Archboldomys maximus, K = 0.64 AND
# 190968 Soricomys leonardocoi, K = 0.59

tibble(k   = ch.54$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 193526 Archboldomys maximus, K = 0.52
## # A tibble: 67 x 3
##        k   row specimen                    
##    <dbl> <int> <chr>                       
##  1 0.523    22 193526 Archboldomys_maximus 
##  2 0.425    60 190968 Soricomys_leonardocoi
##  3 0.409    21 193525 Archboldomys_maximus 
##  4 0.368    46 189834 Rhynchomys_labo      
##  5 0.344    41 188354 Chrotomys_whiteheadi 
##  6 0.336    40 188352 Chrotomys_whiteheadi 
##  7 0.316    14 216435 Apomys_sierrae       
##  8 0.315    48 189836 Rhynchomys_labo      
##  9 0.287    23 193943 Archboldomys_maximus 
## 10 0.285    25 221831 Chrotomys_mindorensis
## # ... with 57 more rows
tibble(k   = ch.29$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 193526 Archboldomys maximus, K = 0.55
## # A tibble: 67 x 3
##        k   row specimen                    
##    <dbl> <int> <chr>                       
##  1 0.549    22 193526 Archboldomys_maximus 
##  2 0.432    62 188313 Soricomys_montanus   
##  3 0.424    60 190968 Soricomys_leonardocoi
##  4 0.397    53 175554 Soricomys_kalinga    
##  5 0.388     4 218313 Apomys_banahao       
##  6 0.384    14 216435 Apomys_sierrae       
##  7 0.383    26 221833 Chrotomys_mindorensis
##  8 0.380    19 193523 Archboldomys_maximus 
##  9 0.369    47 189835 Rhynchomys_labo      
## 10 0.364    50 167305 Soricomys_kalinga    
## # ... with 57 more rows

Some thoughts:

  1. 193526 Archboldomys maximus is the most influential specimen in all three versions of the model (actually also in all four of the models - I checked ch.52 also), but it doesn't have an excessively high Pareto K, especially in the models that include mass as a predictor. In looking at the specimen's TBA results, it has the lowest BV.TV value of any specimen in the sample (12% bone). It is not the smallest specimen of its species. The only other specimens that comes close to having such a low BV.TV are 190968 (Soricomys leonardocoi) and 188313 (S. montanus), both of which have 14% bone. I think the Archboldomys is the most outstanding partially because it differs dramatically from ALL its conspecifics (Tb.Sp = 0.047 versus all others 0.028 or less), whereas among Soricomys, there is a greater spread of values within each species.
  2. Is this a result of some allometric effect where you lose trabeculae as you get really small? This is a suspicion I've had for a while. It's probably worth doing some allometric scaling investigations here - I haven't been paying attention to that particularly. According to Doube et al. 2011, larger animals have relatively larger trabeculae that are relatively further apart, and that there are fewer of them per unit volume. Their analysis includes shrews down to 3g in mass, so that definitely encompasses the size range of Soricomys and Archboldomys. I'll have to look more closely at their results though, because I don't know how densely they have sampled the 3g-30g range of body masses. Having looked at so many very small species with almost totally empty centra, I am eager to know if my suspicion is founded in biological fact.

Model comparisons:

tbsp_comp <- loo_compare(ch.52, ch.53, ch.54, ch.29, criterion = "waic") %>% 
  print(simplify = F)
##       elpd_diff se_diff elpd_waic se_elpd_waic p_waic se_p_waic waic  se_waic
## ch.52   0.0       0.0   -92.9       7.2          3.1    0.9     185.8  14.4  
## ch.29  -1.1       2.4   -94.0       7.2          9.6    2.1     188.0  14.4  
## ch.53  -1.5       1.9   -94.4       7.2          6.3    1.8     188.8  14.4  
## ch.54  -1.9       1.7   -94.8       7.1          6.4    1.7     189.5  14.2
# Convert to WAIC differences among models, including standard error term.
cbind(waic_diff = tbsp_comp[, 1] * -2,
      se        = tbsp_comp[, 2] * 2)
##       waic_diff       se
## ch.52  0.000000 0.000000
## ch.29  2.203877 4.795751
## ch.53  3.045581 3.825730
## ch.54  3.734829 3.493159
tbsp_comp[, 7:8] %>% 
  data.frame() %>% 
  rownames_to_column("model_name") %>% 
  mutate(model_name = fct_reorder(model_name, waic, .desc = T)) %>% 
  ggplot(aes(x = waic, y = model_name, 
             xmin = waic - se_waic, 
             xmax = waic + se_waic)) +
  geom_pointrange(shape = 21) +
  labs(title = "WAIC comparison across Tb.Sp estimation models",
       x = "WAIC", y = "model") +
  theme(axis.ticks.y = element_blank())

They are all the same, for all intents and purposes. Does this mean that the addition of predictors/information in the different models gives you NO additional capacity to predict Tb.Sp? Would this change if I removed the most extreme outliers? It seems like that's maybe worth a shot. This is the only metric so far where I feel like removing the outliers is maybe justified, but I also think that they may indicate something biologically relevant.

Let's look at some more TBA metrics before I go excluding anything.

######################################

Connectivity Density (Conn.D)####

######################################

A model without genus, mass as the only predictor:

# Conn.D by mass
ch.55 <- 
  brm(data = d, 
      family = gaussian,
      cond_s ~ 1+mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.55")
print(ch.55)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: cond_s ~ 1 + mass_s 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.00      0.09    -0.18     0.18 1.00     3308     3057
## mass_s       -0.68      0.09    -0.85    -0.49 1.00     3736     2763
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.76      0.07     0.64     0.90 1.00     3406     3039
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Pretty large error, and a strong-ish inverse correlation with mass. Plot:

ch.55.pl <- fitted(ch.55,
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  # plot
  ggplot(aes(x = mass_s)) +
  geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              stat = "identity",
              alpha = 1/5, size = 1, color = "black") +
  geom_point(data = d, aes(y = cond_s, color = genus), size = 4)+
  scale_color_manual(values = cols)+
  labs(x = "Mass in grams (standardized)",
       y = "connectivity density (standardized)")+
  geom_text_repel(data = d %>% filter(specno %in% c("193526", "193523", "190968", "216435")),  
                  aes(y = cond_s, label = specno), 
                  size = 3,)
ch.55.pl

In comparison to the Tb.Sp model, it has less error overall, but there is still a lot of spread around the mean estimate. One Archboldomys and one Apomys are very noticeably above the line this time. I went ahead and labeled them - note that the super high Archboldomys (193523) is not the same specimen that has the super high trabecular spacing (193526, also labeled). This makes sense - larger spaces means lower overall density. S. leonardocoi (190968) makes another appearance pretty far away from its congeners, but it does have one other specimen down there with it, so maybe it's not unreasonably low.

That Apomys sierrae (216435) is absolutely abnormal, based on its detachment from all the rest of the Apomys. Also it's HUGE (105g, all others in the species are 81-95g). I had previously flagged it just from how weird it looked qualitatively, and LRH and I discussed taking a look at it to see if it's really old or something. We have yet to do that. I'm gonna run these analyses with it included and see how much trouble it causes.

Model permutations:

# By genus only

ch.56 <- 
  brm(data = d, 
      family = gaussian,
      cond_s ~ 0 + genus,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.56")
print(ch.56)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: cond_s ~ 0 + genus 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys          -0.18      0.18    -0.54     0.18 1.00     5216     2949
## genusArchboldomys     0.17      0.30    -0.42     0.74 1.00     6659     3452
## genusChrotomys       -0.61      0.18    -0.96    -0.26 1.00     5200     2853
## genusRhynchomys      -0.52      0.31    -1.14     0.10 1.00     5864     3579
## genusSoricomys        1.00      0.18     0.63     1.36 1.00     5273     3050
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.79      0.07     0.67     0.95 1.00     4761     2806
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

What the hell is Soricomys doing??? (...again)

# By genus and mass
ch.57 <- 
  brm(data = d, 
      family = gaussian,
      cond_s ~ 0 + genus + mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.57")
print(ch.57)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: cond_s ~ 0 + genus + mass_s 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys          -0.11      0.19    -0.46     0.25 1.00     3504     2687
## genusArchboldomys    -0.20      0.34    -0.87     0.46 1.00     2779     2544
## genusChrotomys       -0.02      0.29    -0.60     0.54 1.00     1663     2317
## genusRhynchomys       0.18      0.41    -0.64     0.98 1.00     1868     2489
## genusSoricomys        0.14      0.39    -0.62     0.91 1.00     1494     2003
## mass_s               -0.64      0.27    -1.16    -0.11 1.00     1339     1724
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.77      0.07     0.64     0.91 1.00     3094     2445
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# By genus, mass, and phylogeny/intraspecific variance
ch.58 <-
  brm(data = d, 
      family = gaussian,
      cond_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
      control = list(adapt_delta = 0.98), #inserted to decrease the number of divergent transitions here
      prior = c(
        prior(normal(0, 1), class = b, coef = genusApomys),
        prior(normal(0, 1), class = b, coef = genusArchboldomys),
        prior(normal(0, 1), class = b, coef = genusChrotomys),
        prior(normal(0, 1), class = b, coef = genusRhynchomys),
        prior(normal(0, 1), class = b, coef = genusSoricomys),
        prior(normal(0, 1), class = b, coef = mass_s),
        prior(normal(0, 1), class = sd),
        prior(exponential(1), class = sigma)
        ),
      data2 = list(ch = ch),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.58")

print(ch.58)
## Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta
## above 0.98 may help. See http://mc-stan.org/misc/warnings.html#divergent-
## transitions-after-warmup
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: cond_s ~ 0 + genus + mass_s + (1 | gr(phylo, cov = ch)) + (1 | taxon) 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~phylo (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.31      0.23     0.01     0.86 1.00     1639     2236
## 
## ~taxon (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.20      0.16     0.01     0.59 1.00     1606     2084
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys          -0.09      0.36    -0.80     0.67 1.00     2857     2397
## genusArchboldomys    -0.17      0.48    -1.10     0.85 1.00     3743     2712
## genusChrotomys       -0.01      0.43    -0.88     0.89 1.00     2479     2733
## genusRhynchomys       0.17      0.53    -0.90     1.22 1.00     3142     2831
## genusSoricomys        0.12      0.50    -0.91     1.09 1.00     2754     3042
## mass_s               -0.66      0.29    -1.24    -0.07 1.00     2576     2818
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.76      0.07     0.64     0.91 1.00     5514     2400
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

4-part plot:

ch.56_halfeye <- ch.56 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))+
  labs(x = "Conn.D", y = "genus")+
  ggtitle(label = "Conn.D by genus only")

ch.57_halfeye <- ch.57 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Conn.D by genus/mass")+
  labs(x = "Conn.D", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.58_halfeye <- ch.58 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Conn.D by genus/mass/phylo/intrasp.")+
  labs(x = "Conn.D", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.55.pl.nokey <- ch.55.pl +
  theme(legend.position = "none")
ch.56_halfeye/ch.55.pl.nokey|ch.57_halfeye/ch.58_halfeye

Apart from the weird Apomys, it seems like the variance in Conn.D decreases with increasing body size. Is that just because we're getting a larger sample of bone in the larger animals, so the differences come out in the wash? I seem to remember a paper where they did TBA analyses on pieces of trabecular bone that gradually decrease in size relative to the whole bone, and they found a point at which the estimate becomes unreliable because it reaches the scale where trabecular bone appears heterogeneous. This might be the one where they established the "cutoff" for connectivity, saying you can't reasonably analyze a piece of trabecular bone unless it has AT LEAST X many trabeculae in there.

This again suggests to me that I need to come up with some way of quantifying the heterogeneity of trabecular structure in these animals. I found a paper that did it with pixel intensity on representative slices. Amson's global compactness measure gets at this to some degree because it measures slice by slice, but I'm not interested in taking an average there because it loses all of the spatial resolution, which is what I'm interested in.

I'm mostly certain that there will be no significant differences if I subtract the probability distributions. I think the next step here is to consider the effect of A. sierrae 216435.

Model comparison:

ch.55 <- add_criterion(ch.55, c("loo", "waic"))
ch.56 <- add_criterion(ch.56, c("loo", "waic"))
ch.57 <- add_criterion(ch.57, c("loo", "waic"))
ch.58 <- add_criterion(ch.58, c("loo", "waic"))

loo(ch.55) # mass only : 0 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -78.1  8.9
## p_loo         3.9  1.5
## looic       156.2 17.9
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
loo(ch.56) # genus only : 1 Pareto K > 0.5
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -82.5  8.1
## p_loo         6.2  1.8
## looic       165.0 16.2
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     66    98.5%   1043      
##  (0.5, 0.7]   (ok)        1     1.5%   405       
##    (0.7, 1]   (bad)       0     0.0%   <NA>      
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
loo(ch.57) # mass and genus : 1 Pareto K > 0.7
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -81.5 10.2
## p_loo         8.0  3.1
## looic       163.0 20.4
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     66    98.5%   732       
##  (0.5, 0.7]   (ok)        0     0.0%   <NA>      
##    (0.7, 1]   (bad)       1     1.5%   14        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
loo(ch.58) # mass genus and phylo: 1 Pareto K > 0.7
## 
## Computed from 4000 by 67 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -82.0  9.9
## p_loo        10.2  3.3
## looic       163.9 19.8
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     66    98.5%   1054      
##  (0.5, 0.7]   (ok)        0     0.0%   <NA>      
##    (0.7, 1]   (bad)       1     1.5%   83        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
# All right who is it? I'm sure I already know. 
tibble(k   = ch.57$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 216435 Apomys sierrae, K = 0.92, to the surprise of nobody.
## # A tibble: 67 x 3
##        k   row specimen                    
##    <dbl> <int> <chr>                       
##  1 0.927    14 216435 Apomys_sierrae       
##  2 0.498    19 193523 Archboldomys_maximus 
##  3 0.364    22 193526 Archboldomys_maximus 
##  4 0.321    58 190964 Soricomys_leonardocoi
##  5 0.310    32 193720 Chrotomys_silaceus   
##  6 0.246    23 193943 Archboldomys_maximus 
##  7 0.239    51 167307 Soricomys_kalinga    
##  8 0.237    62 188313 Soricomys_montanus   
##  9 0.234    49 189839 Rhynchomys_labo      
## 10 0.234    24 214320 Archboldomys_maximus 
## # ... with 57 more rows
tibble(k   = ch.58$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 216435 Apomys sierrae, K = 0.77.
## # A tibble: 67 x 3
##        k   row specimen                    
##    <dbl> <int> <chr>                       
##  1 0.775    14 216435 Apomys_sierrae       
##  2 0.476    19 193523 Archboldomys_maximus 
##  3 0.420    58 190964 Soricomys_leonardocoi
##  4 0.377    53 175554 Soricomys_kalinga    
##  5 0.363    64 193515 Soricomys_montanus   
##  6 0.350    25 221831 Chrotomys_mindorensis
##  7 0.328    46 189834 Rhynchomys_labo      
##  8 0.309    48 189836 Rhynchomys_labo      
##  9 0.309    35 193731 Chrotomys_silaceus   
## 10 0.308    62 188313 Soricomys_montanus   
## # ... with 57 more rows

This suggests to me that removing that outlier is pretty much essential to understanding the pattern overall. ##########################################

Conn.D without A. sierrae 216435

##########################################

dc <- d[d$specno!="216435",]

ch.59 <- 
  brm(data = dc, 
      family = gaussian,
      cond_s ~ 1+mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.59")
print(ch.59)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: cond_s ~ 1 + mass_s 
##    Data: dc (Number of observations: 66) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.05      0.08    -0.21     0.12 1.00     3884     3271
## mass_s       -0.70      0.08    -0.87    -0.53 1.00     3697     2790
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.68      0.06     0.57     0.81 1.00     3703     3014
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
ch.59.pl <- fitted(ch.59,
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  # plot
  ggplot(aes(x = mass_s)) +
  geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              stat = "identity",
              alpha = 1/5, size = 1, color = "black") +
  geom_point(data = dc, aes(y = cond_s, color = genus), size = 4)+
  scale_color_manual(values = cols)+
  labs(x = "Mass in grams (standardized)",
       y = "connectivity density (standardized)")+
  geom_text_repel(data = d %>% filter(specno %in% c("193525", "193523", "190968", "190977")),  
                  aes(y = cond_s, label = specno), 
                  size = 3,)
ch.59.pl

How's it look?

ch.59 <- add_criterion(ch.59, c("loo", "waic"))
loo(ch.59)
## 
## Computed from 4000 by 66 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo    -69.3  6.8
## p_loo         3.3  0.8
## looic       138.6 13.6
## ------
## Monte Carlo SE of elpd_loo is 0.0.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
tibble(k   = ch.59$criteria$loo$diagnostics$pareto_k,
       row = 1:66,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 216435 Apomys sierrae, K = 0.77.
## # A tibble: 66 x 3
##        k   row specimen                    
##    <dbl> <int> <chr>                       
##  1 0.322    61 190977 Soricomys_leonardocoi
##  2 0.315    21 193525 Archboldomys_maximus 
##  3 0.293    52 167309 Soricomys_kalinga    
##  4 0.221    57 190962 Soricomys_leonardocoi
##  5 0.216    45 189833 Rhynchomys_labo      
##  6 0.173    50 167305 Soricomys_kalinga    
##  7 0.162    59 190965 Soricomys_leonardocoi
##  8 0.157    34 193726 Chrotomys_silaceus   
##  9 0.155     8 236309 Apomys_datae         
## 10 0.124    66 193519 Soricomys_montanus   
## # ... with 56 more rows

Much better. However, this makes it so that if I want to compare directly across the models I have to re-run all of them without that specimen. Let's see how that affects the results.

# By genus only

ch.60 <- 
  brm(data = dc, 
      family = gaussian,
      cond_s ~ 0 + genus,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.60")
print(ch.60)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: cond_s ~ 0 + genus 
##    Data: dc (Number of observations: 66) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys          -0.33      0.17    -0.66     0.02 1.00     5013     3089
## genusArchboldomys     0.17      0.30    -0.41     0.76 1.00     5637     2921
## genusChrotomys       -0.61      0.17    -0.93    -0.28 1.00     6219     3484
## genusRhynchomys      -0.53      0.29    -1.11     0.04 1.00     6368     3035
## genusSoricomys        1.01      0.17     0.68     1.34 1.00     5799     2904
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.73      0.07     0.61     0.87 1.00     5587     3003
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# By genus and mass
ch.61 <- 
  brm(data = dc, 
      family = gaussian,
      cond_s ~ 0 + genus + mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.57")
print(ch.61)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: cond_s ~ 0 + genus + mass_s 
##    Data: d (Number of observations: 67) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys          -0.11      0.19    -0.46     0.25 1.00     3504     2687
## genusArchboldomys    -0.20      0.34    -0.87     0.46 1.00     2779     2544
## genusChrotomys       -0.02      0.29    -0.60     0.54 1.00     1663     2317
## genusRhynchomys       0.18      0.41    -0.64     0.98 1.00     1868     2489
## genusSoricomys        0.14      0.39    -0.62     0.91 1.00     1494     2003
## mass_s               -0.64      0.27    -1.16    -0.11 1.00     1339     1724
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.77      0.07     0.64     0.91 1.00     3094     2445
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# By genus, mass, and phylogeny/intraspecific variance
ch.62 <-
  brm(data = dc, 
      family = gaussian,
      cond_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
      control = list(adapt_delta = 0.98), #inserted to decrease the number of divergent transitions here
      prior = c(
        prior(normal(0, 1), class = b, coef = genusApomys),
        prior(normal(0, 1), class = b, coef = genusArchboldomys),
        prior(normal(0, 1), class = b, coef = genusChrotomys),
        prior(normal(0, 1), class = b, coef = genusRhynchomys),
        prior(normal(0, 1), class = b, coef = genusSoricomys),
        prior(normal(0, 1), class = b, coef = mass_s),
        prior(normal(0, 1), class = sd),
        prior(exponential(1), class = sigma)
        ),
      data2 = list(ch = ch),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.62")

print(ch.62)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: cond_s ~ 0 + genus + mass_s + (1 | gr(phylo, cov = ch)) + (1 | taxon) 
##    Data: dc (Number of observations: 66) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~phylo (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.29      0.23     0.01     0.87 1.00     1539     2242
## 
## ~taxon (Number of levels: 11) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.18      0.15     0.01     0.58 1.00     1880     1777
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genusApomys          -0.24      0.34    -0.91     0.48 1.00     2240     2031
## genusArchboldomys    -0.29      0.46    -1.20     0.67 1.00     2916     2212
## genusChrotomys        0.15      0.40    -0.69     0.94 1.00     2913     2568
## genusRhynchomys       0.35      0.52    -0.71     1.33 1.00     3362     2636
## genusSoricomys       -0.11      0.50    -1.09     0.88 1.00     2969     2867
## mass_s               -0.84      0.28    -1.39    -0.30 1.00     3000     2782
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.66      0.06     0.55     0.79 1.00     5837     3001
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Four-part plot:

ch.60_halfeye <- ch.60 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))+
  labs(x = "Conn.D", y = "genus")+
  ggtitle(label = "Conn.D by genus only")

ch.61_halfeye <- ch.61 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Conn.D by genus/mass")+
  labs(x = "Conn.D", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.62_halfeye <- ch.62 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Conn.D by genus/mass/phylo")+
  labs(x = "Conn.D", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.59.pl.nokey <- ch.59.pl +
  theme(legend.position = "none")
ch.60_halfeye/ch.59.pl.nokey|ch.61_halfeye/ch.62_halfeye

Well. It looks like Rhynchomys is maybe the interesting guy here. A higher Conn.D than anyone else (but not significantly so). Could this have something to do with the variety of forces on the veretbral column of ricochetal animals? Look into this. Since it seems like degree of anisotropy is not very reliable when you get down to this small of a bone sample, connectivity density could help in understanding the overall structure.