Chrotomyini_compgenus_analyses_05022022.Rmd

---
title: "Chrotomys analysis compilation - genera"
author: "S.M. Smith"
date: "5/2/2022"
output: 
  html_document: 
    keep_md: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
pacman::p_load(install = F, "ape", "bayesplot","BiocManager", "brms", "broom", "dagitty", "devtools", "flextable", "ggdark", "ggmcmc", "ggrepel", "gtools", "lattice","loo", "patchwork", "rcartocolor", "Rcpp", "remotes", "rstan", "StanHeaders", "statebins", "tidybayes", "viridis", "viridisLite", "pacman")
```

Load up Chrotomyini trabecular bone architecture (TBA) data and standardize variables:
```{r}
d <- read.csv(file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\05062022 Philippine Murids segmentation parameters and morphological data - TBA data total BoneJ (full).csv", header = T)

d <- d[d$tribe=="chroto",c(1:2, 4:23)]

d <- 
  d %>% 
  mutate(bvtv = as.numeric(bvtv))

d <- 
  d %>%
  mutate(mass_s = rethinking::standardize(log10(mass_g)),
         elev_s = rethinking::standardize(elev), 
         bvtv_s = rethinking::standardize(bvtv),
         tbth_s = rethinking::standardize(tbth),
         tbsp_s = rethinking::standardize(tbsp),
         conn_s = rethinking::standardize(conn),
         cond_s = rethinking::standardize(m_connd),
         cond_s2 = rethinking::standardize(connd), 
         da_s = rethinking::standardize(da))

# remove C. gonzalesi and R. isarogensis, singletons:
d <- 
  d %>% 
  filter(taxon!="Chrotomys_gonzalesi") %>% 
  filter(taxon!="Rhynchomys_isarogensis")

# Make categorical vars into factors
d <- 
  d %>%
  mutate(loco = factor(loco), 
         hab_simp = factor(hab_simp),
         genus = factor(genus))

# Specify colors for plots:
cols = c("#86acca","#ab685b", "#3370a3", "#1f314c","#5a9fa8")
```

Load in phylogeny: 
REMEMBER: A <- ape::vcv.phylo(phylo), add corr = T if your tree is NOT SCALED TO 1. 
```{r}
ch.tre <- read.nexus(file = "G:\\My Drive\\Philippine rodents\\Chrotomys\\analysis\\SMS_PRUNED_and_COLLAPSED_03292022_OTUsrenamed_Rowsey_PhBgMCC_LzChrotomyini.nex")

ch <- ape::vcv.phylo(ch.tre, corr = T)

d <- 
  d %>% 
  mutate(phylo = taxon)
```

#######################################################################
#### Comparisons of several models for each trabecular bone metric ####
#######################################################################

I'm going to put together and compare a few different models for each trabecular bone metric, with a few combinations of potential model components. This is for the purpose of not only comparing fit (i.e., how good each model is at predicting out-of-sample), but also to understand how the addition of predictors changes the models' estimates. I'll also include some assessments of the influence of individual specimens on the model fit.

######################################################################################
#### Bone Volume Fraction (BV.TV), and some general methodological considerations ####
######################################################################################

For comparison: a model that doesn't include genus as a predictor:

```{r}
# BV.TV by mass
ch.36 <- 
  brm(data = d, 
      family = gaussian,
      bvtv_s ~ 1+mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 5,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.36")
print(ch.36)
```

A scatterplot of the above model, with genera color-coded (the same color code to be used for the remainder of this document):
```{r}
range(d$mass_s)
nd <- tibble(mass_s = seq(from = -2, to = 1.75, length.out = 67))

fitted(ch.36,
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  # plot
  ggplot(aes(x = mass_s)) +
  geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              stat = "identity",
              alpha = 1/5, size = 1, color = "black") +
  geom_point(data = d, aes(y = bvtv_s, color = genus), size = 4)+
  scale_color_manual(values = cols)+
  xlim(min(nd), max(nd))+
  labs(x = "Mass in grams (standardized)",
       y = "bone volume fraction (standardized)") 
```

From this plot, you can really see that genus is closely linked to body mass. You can also see that there is definitely some correlation between mass and BV.TV but the relationship has a lot of error. You can also detect this from the model parameters: mass_s (the influence of mass, a.k.a slope, listed under "Population-level effects") = 0.56, with CIs 0.36-0.76, and sigma = 0.83. 

Now a series of models that estimate a separate intercept (= estimate of mean BV.TV) for each genus, which here is a proxy for substrate use. I printed each model for quick inspection, and at the end of all the models, there is a plot including all four of them.

```{r}
# By genus only
ch.24 <- 
  brm(data = d, 
      family = gaussian,
      bvtv_s ~ 0 + genus,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.24")
print(ch.24)
```


```{r}
# By genus and mass
ch.25 <- 
  brm(data = d, 
      family = gaussian,
      bvtv_s ~ 0 + genus + mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.25")
print(ch.25)
```


```{r}
# By genus and phylogeny/intraspecific variance, no mass
ch.44 <- 
  brm(data = d, 
      family = gaussian,
      bvtv_s ~ 0 + genus + (1|gr(phylo, cov = ch)) + (1|taxon),
      control = list(adapt_delta = 0.89), #inserted to decrease the number of divergent transitions here
      prior = c(
        prior(normal(0, 1), class = b, coef = genusApomys),
        prior(normal(0, 1), class = b, coef = genusArchboldomys),
        prior(normal(0, 1), class = b, coef = genusChrotomys),
        prior(normal(0, 1), class = b, coef = genusRhynchomys),
        prior(normal(0, 1), class = b, coef = genusSoricomys),
        prior(normal(0, 1), class = sd),
        prior(exponential(1), class = sigma)
        ),
      data2 = list(ch = ch),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.44")
print(ch.44)
```


```{r}
# By genus, mass, and phylogeny/intraspecific variance
ch.26 <-
  brm(data = d, 
      family = gaussian,
      bvtv_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
      control = list(adapt_delta = 0.85), #inserted to decrease the number of divergent transitions here
      prior = c(
        prior(normal(0, 1), class = b, coef = genusApomys),
        prior(normal(0, 1), class = b, coef = genusArchboldomys),
        prior(normal(0, 1), class = b, coef = genusChrotomys),
        prior(normal(0, 1), class = b, coef = genusRhynchomys),
        prior(normal(0, 1), class = b, coef = genusSoricomys),
        prior(normal(0, 1), class = b, coef = mass_s),
        prior(normal(0, 1), class = sd),
        prior(exponential(1), class = sigma)
        ),
      data2 = list(ch = ch),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.26")
print(ch.26)
```

A four-part plot including all of the models:

```{r}
ch.24_halfeye <- ch.24 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))+
  labs(x = "BV.TV", y = "genus")+
  ggtitle(label = "BV.TV by genus only")

ch.25_halfeye <- ch.25 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "BV.TV by genus/mass")+
  labs(x = "BV.TV", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.44_halfeye <- ch.44 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "BVTV by genus/phylo/intrasp")+
  labs(x = "BV.TV", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.26_halfeye <- ch.26 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "BV.TV by genus/mass/phylo/intrasp.")+
  labs(x = "BV.TV", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.24_halfeye/ch.44_halfeye|ch.25_halfeye/ch.26_halfeye
```

Takeaways from this set of plots:

1) Accounting for mass in estimation of BV.TV makes the relative estimates of BV.TV across genera change pretty dramatically. For its body size, Soricomys has relatively high BV.TV, and Rhynchomys has low BV.TV for its body size. Chrotomys, Archboldomys, and Apomys have estimated BV.TV values that are pretty close to the mean (represented by the vertical dotted line). One way to tell the significance of the differences between genera is to subtract the posterior probability distributions from one another.The difference between distributions is "significant" if the distribution of differences between them does not overlap 0. I'm not really a fan of statistical significance but it's good to have some definition of that I guess.

Here are the difference distributions between pairs of genera for the model including only genus and mass as predictors (upper right in the preceding plot):
```{r}
ch.25 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  compare_levels(.value, by = .variable) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ungroup() %>%
  mutate(loco = reorder(.variable, .value)) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle(label = "Between-genus BV.TV difference distributions")+
  labs(x = "difference", y = "comparison")
```

From this comparison, it looks like Rhynchomys has significantly lower bone volume fraction than pretty much everything it is compared to, with the possible exception of Archboldomys (6th row down) - the distribution of the Rhynchomys-Archboldomys difference overlaps 0 slightly. It also appears that Soricomys has significantly higher BV.TV than Archboldomys.

This brings me to my second takeaway from the 4-part plot. 

2) The genus/mass model doesn't include the effects of phylogenetic correlation or intraspecific variance. When we estimate BV.TV (with genus and mass as predictors) with a phylogenetic multilevel model, and consider intraspecific variance as well (lower right in the 4-part plot above), the difference distributions look like this:

```{r}
ch.26 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  compare_levels(.value, by = .variable) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ungroup() %>%
  mutate(loco = reorder(.variable, .value)) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle(label = "Between-genus BV.TV difference distributions")+
  labs(x = "difference", y = "comparison")
```

The relative location of each difference distribution hasn't changed dramatically (e.g., note location of Rhynchomys comparisons relative to the mean), but the probability density is much more spread out. This is telling us that, because our specimens share some features as a result of phylogenetic structure, the model is less sure that there are significant differences between genera that are related specifically to their generic properties and not to phylogenetic correlation.

So how strong is the influence of phylogeny on BV.TV? The model outputs a value that is called "sd_phylo__Intercept", which, as I understand it, is an estimate of the amount of variance in the data that is related to phylogenetic correlation structure. We can use that to calculate the more familiar lambda, a.k.a. phylogenetic signal. 

(A reminder for me of what phylogenetic signal is: it is an intra-class correlation coefficient (ICC), which tells you how much things that are in groups together resemble each other, quantitatively speaking. 0 means that the individuals don't show any similarity that can be attributed to their relatedness (no phylogenetic structure). 1 Means that the trait you're looking at evolves exactly as expected under Brownian Motion evolution - that is, the phylogenetic structure underlying the data set explains ALL of the variation you see among the individual measurements. This is different from considering intraspecific variation as a potential source of covariance. Including the (1|taxon) term accounts for group-level similarity that is related to taxon (species) ONLY, irrespective of broader phylogenetic structure (it does not refer to the phylogenetic covariance matrix in its calculations). I feel like the more times I try and explain this to myself, the better I will understand it. I've always found PCM concepts somewhat opaque and I'm really trying to be clear about them so that I can mayyybe more precisely explain what my results mean. Thanks for sticking with me.)

Here's a calculation of lambda for ch.26, the model that includes genus, mass, phylogeny, and intraspecific variance:
```{r}
hyp <- "sd_phylo__Intercept^2 / (sd_phylo__Intercept^2 + sd_taxon__Intercept^2 + sigma^2) = 0"
h.bvtv <- hypothesis(ch.26, hyp, class = NULL)
h.bvtv
```

The estimate of lambda here is 0.29. I would say that's medium-low? But why should we trust a point estimate here (especially when the upper 95% CI is 0.74)? We can look at lambda as a probability distribution instead. This will show us where the majority of the probability density actually falls.

```{r}
ph.bvtv.pl <- ggplot() +
  geom_density(aes(x = h.bvtv$samples$H1), fill = "red", alpha = 0.5) +
  theme_bw() +
  xlim(0,1) +
  labs(y = "density", x = "lambda: bone volume fraction")
ph.bvtv.pl
```

I had never thought of lambda as a distribution before this and I think it makes these results both easier and more difficult to understand. From this density plot, we can tell that there's a density peak around, say, 0.03. I would call that a low value for lambda. HOWEVER - just because that is a local peak does not mean that it represents the majority of the probability density. There is a huge amount of probability density (area under the curve) between lambda = 0.125 and ~0.56. So it makes sense that the point estimate of lambda is in the middle-ish of the values between the peak (~0.03) and the point where the density drops off (~0.56).

What the hell does this mean about the actual influence of phylogeny on BV.TV? I'm still trying to figure that out. Phylogenetic structure in these data clearly explains some of the variation in BV.TV across genera - which, yeah, of course it does, you're using genus as a proxy for substrate-use groups (e.g., Chrotomys = varying degrees of digging, Rhynchomys = bipedal hopping, etc). But after accounting for mass and group-level effects (phylo/intraspecific var) in our model, there are still differences between genera. They aren't necessarily "significant", according to the difference distributions shown above, but there are definitely signals that could be related to organismal ecology.

Ok departing from the discussion of the plots above: I'm gonna do some model comparison to see which model is best at predicting out-of-sample (which is the "best fit" for the data). I'm going to use two methods: the Widely Applicable Information Criterion (WAIC) and Leave-One-Out cross-validation (LOO).

```{r}
ch.36 <-  add_criterion(ch.36, c("waic", "loo")) # mass only
ch.24 <-  add_criterion(ch.24, c("waic", "loo")) # genus only 
ch.25 <-  add_criterion(ch.25, c("waic", "loo")) # genus and mass
ch.44 <-  add_criterion(ch.44, c("waic", "loo")) # genus and phylogeny
ch.26 <-  add_criterion(ch.26, c("waic", "loo")) # genus mass phylogeny

loo(ch.36) # mass only : 0 Pareto K > 0.5
loo(ch.24) # genus only : 0 Pareto K > 0.5
loo(ch.25) # genus and mass : 0 Pareto K > 0.5
loo(ch.44) # genus and phylogeny : 1 Pareto K > 0.5
loo(ch.26) # genus mass phylogeny : 1 Pareto K > 0.5
```

This analysis (LOO) can tell you, among other things, if there are specimens that are having a strong effect on the model fit. It does this by leaving one specimen out at a time (hence the name) and seeing how well the model can predict the value for that left-out specimen when it is not a part of the model. Specimens that are highly influential have a high Pareto K value. In two of the models here (ch.44 and ch.26, the phylogenetic models), one specimen has a relatively high Pareto K. Let's see if it's the same specimen in both.

```{r}
tibble(k   = ch.44$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 190968 Soricomys leonardocoi, K = 0.59

tibble(k   = ch.26$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 216462 Apomys sierrae, K = 0.69
```

Why are these specimens potentially having such a strong effect?

1) FMNH 216462, Apomys sierrae: This animal has an anomalously low bone volume fraction compared to all other individuals in its species (17% bone versus all other specimens 29-37% bone). It has the shortest head and body length at 135 mm, compared to ~140-150 for all other specimens. It is also unusual in that it only has five lumbar vertebrae rather than 6. I still analyzed the third lumbar position for it, but it may represent a veeery slightly different functional position in the lumbar column. However - looking at the specimen just now in Dragonfly, all the vertebrae kind of look like that. So It might just be a weirdo.

2) FMNH 190968, Soricomys leonardocoi: This animal also has anomalously low bone volume fraction for its species (14% bone as opposed to all others 19-22% bone). However, this is in an analysis (ch.44) that does not account for body mass. 190968 weighs 35 grams, and the average mass of the species is 32 grams, so it's slightly heavier than average and has a lower BV.TV. In the phylogenetic model that has mass as a predictor (ch.26), its Pareto K is below the "good" threshold of 0.5 (K = 0.41) - it doesn't have a super strong effect on that model fit. So incorporating mass apparently allows the model to not be surprised about this specimen's BV.TV. I'm not sure I have an adequate explanation for how that is happening.

Should I remove these guys, at least the Apomys? I don't know if that's necessary. It seems like it represents true variation in the population, and I'm loath do get rid of it because of that. I could try using a Student distribution for the posterior probability distribution, which would result in a model that is less surprised by outliers, but I haven't decided if I think the problem is extreme enough for that to be necessary.

Now I'm going to compare across models to see which is the best at predicting BV.TV, and by how much. This won't give any information about the causality of the variables involved, but it can tell how big of a difference the inclusion or exclusion of a variable makes in the predictive accuracy of the model.

```{r}
bvtv_comp <- loo_compare(ch.36, ch.24, ch.25, ch.44, ch.26, criterion = "waic")

# Convert to WAIC differences among models, including standard error term.
cbind(waic_diff = bvtv_comp[, 1] * -2,
      se        = bvtv_comp[, 2] * 2)

bvtv_comp[, 7:8] %>% 
  data.frame() %>% 
  rownames_to_column("model_name") %>% 
  mutate(model_name = fct_reorder(model_name, waic, .desc = T)) %>% 
  ggplot(aes(x = waic, y = model_name, 
             xmin = waic - se_waic, 
             xmax = waic + se_waic)) +
  geom_pointrange(shape = 21) +
  labs(title = "WAIC comparison across BV.TV estimation models",
       x = "WAIC", y = "model") +
  theme(axis.ticks.y = element_blank())
```

So in this comparison, the WAIC (= Widely Applicable Information Criterion) indicates that ch.26, the phylogenetic model (+ intraspecific variance) that includes genus and mass, is the best predictive model, but not by much - its standard error (horizontal line on the plot) overlaps that of ch.44 (phylo only) and ch.25 (genus and mass). 

Based on what I know about trabecular bone morphology, and what I've seen so far re: correlation of body mass with TBA metrics, it doesn't make sense to exclude mass from the model. Body mass has an effect on skeletal structure to varying degrees, depending on the metric of interest. so even though the phylogenetic model that does not include mass as a predictor (ch.44) doesn't perform that much worse than the model including mass (ch.26), I'm not going to extensively consider models that don't take mass into account. I will, however, compare across metrics to see the difference in the influence of mass on the TBA metrics that I'm examining. This is part of the overall goal of the study: to figure out a little more about what the various trabecular bone metrics are being influenced by/are associated with, in addition to the hypothesized biomechanical function of trabecular bone variation.

I think I'll keep doing comparisons with the no-phylogeny type models for now (ch.25 and ch.36, above) because I think that comparison is relevant to understanding HOW phylogenetic structure affects results. I know people always are saying, don't include both phylogenetic and non-phylogenetic analyses in your results, but I think it's useful. Thoughts on this point would be appreciated. 

So, I'm going to try and explain some of my thoughts about what these results mean biomechanically and ecologically. But first I need to look at the other trabecular bone metrics here. 

######################################
#### Trabecular Thickness (Tb.Th) ####
######################################

A model without genus, mass as the only predictor:

```{r}
# Tb.Th by mass
ch.40 <- 
  brm(data = d, 
      family = gaussian,
      tbth_s ~ 1+mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.40")
print(ch.40)
```

A scatterplot of the above model, with genera color-coded:
```{r}
ch.40.pl <- fitted(ch.40,
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  # plot
  ggplot(aes(x = mass_s)) +
  geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              stat = "identity",
              alpha = 1/5, size = 1, color = "black") +
  geom_point(data = d, aes(y = tbth_s, color = genus), size = 4)+
  scale_color_manual(values = cols)+
  labs(x = "Mass in grams (standardized)",
       y = "trabecular thickness (standardized)") 
ch.40.pl
```


Daaang look at that correlation! It's not all that unexpected based on what I've seen in the literature - Tb.Th is VERY closely associated with mass, such that it is almost entirely explained by it. Note the mass_s value in the model: 0.93, with sigma = 0.37. mass_s in the BV.TV model was 0.56, with sigma = 0.83. The relationship with Tb.Th is much stronger. It's possible that there is essentially no information other than body size information in this metric! Let's look closer.

```{r}
# By genus only
ch.50 <- 
  brm(data = d, 
      family = gaussian,
      tbth_s ~ 0 + genus,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.50")
print(ch.50)
```

```{r}
# By genus and mass
ch.51 <- 
  brm(data = d, 
      family = gaussian,
      tbth_s ~ 0 + genus + mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.51")
print(ch.51)
```

```{r}
# By genus, mass, and phylogeny/intraspecific variance
ch.27 <-
  brm(data = d, 
      family = gaussian,
      tbth_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
      control = list(adapt_delta = 0.98), #inserted to decrease the number of divergent transitions here
      prior = c(
        prior(normal(0, 1), class = b, coef = genusApomys),
        prior(normal(0, 1), class = b, coef = genusArchboldomys),
        prior(normal(0, 1), class = b, coef = genusChrotomys),
        prior(normal(0, 1), class = b, coef = genusRhynchomys),
        prior(normal(0, 1), class = b, coef = genusSoricomys),
        prior(normal(0, 1), class = b, coef = mass_s),
        prior(normal(0, 1), class = sd),
        prior(exponential(1), class = sigma)
        ),
      data2 = list(ch = ch),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.27")
print(ch.27)
```

A four part plot like the one I did for BV.TV, but instead of the phylogenetic model with no mass, I reproduced the model with mass only (no genus):
```{r}
ch.50_halfeye <- ch.50 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))+
  labs(x = "Tb.Th", y = "genus")+
  ggtitle(label = "Tb.Th by genus only")

ch.51_halfeye <- ch.51 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Tb.Th by genus/mass")+
  labs(x = "Tb.Th", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.27_halfeye <- ch.27 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Tb.Th by genus/mass/phylo/intrasp.")+
  labs(x = "Tb.Th", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.40.pl.nokey <- ch.40.pl +
  theme(legend.position = "none")
ch.50_halfeye/ch.40.pl.nokey|ch.51_halfeye/ch.27_halfeye
```

From this set of plots, I think we can say that Tb.Th variance is almost totally explained by body mass and phylogenetic covariance structure. The change in estimated difference among genera between the genus/mass model and the phylogenetic model is dramatic. Here's comparison plots. I truncated the genus names so that the plots are a little easier to fit next to each other (they just have the "omys" part removed):
```{r}
diff.tbth.ch.51 <- ch.51 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  compare_levels(.value, by = .variable) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  mutate(.variable = str_replace_all(.variable, "omys", "")) %>% 
  ungroup() %>%
  mutate(loco = reorder(.variable, .value)) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle(label = "Tb.Th difference, genus+mass")+
  labs(x = "difference", y = "comparison")+
  theme(plot.title = element_text(size = 9))+
  scale_x_continuous(limits = c(-3, 3))

diff.tbth.ch.27 <- ch.27 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  compare_levels(.value, by = .variable) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  mutate(.variable = str_replace_all(.variable, "omys", "")) %>% 
  ungroup() %>%
  mutate(loco = reorder(.variable, .value)) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle(label = "Tb.Th difference, genus+mass+phylo")+
  labs(x = "difference", y = "comparison")+
  theme(plot.title = element_text(size = 9))+
  scale_x_continuous(limits = c(-3, 3))

diff.tbth.ch.51|diff.tbth.ch.27
```

Also - note the width of the distributions in the plot on the right (I made the scales the same so you can see the difference).

In light of this dramatic difference, let's look at lambda for this one:
```{r}
hyp <- "sd_phylo__Intercept^2 / (sd_phylo__Intercept^2 + sd_taxon__Intercept^2 + sigma^2) = 0"
h.tbth <- hypothesis(ch.27, hyp, class = NULL)
h.tbth
```

Oooookay so the estimate of lambda is 0.43, with an upper CI of 0.87. The plot:
```{r}
ph.tbth.pl <- ggplot() +
  geom_density(aes(x = h.tbth$samples$H1), fill = "orange", alpha = 0.5) +
  theme_bw() +
  xlim(0,1) +
  labs(y = "density", x = "lambda: trabecular thickness")
ph.tbth.pl
```

I would say in this one that most of the probability density is towards the higher end of the range, suggesting that there's a pretty big influence of phylogeny on this particular metric. Compare with the one for BV.TV:
```{r}
ph.bvtv.pl/ph.tbth.pl
```

So on the whole, there's likely to be more of a phylogenetic signal in Tb.Th than BV.TV. There is that peak on the far left (at around 0.03), similar in location to the one in the BV.TV lambda, but the peak is lower and so represents a smaller proportion of the total probaility density.

Model comparison and check for outliers:
```{r}
ch.40 <- add_criterion(ch.40, c("loo", "waic"))
ch.50 <- add_criterion(ch.50, c("loo", "waic"))
ch.51 <- add_criterion(ch.51, c("loo", "waic"))
ch.27 <- add_criterion(ch.27, c("loo", "waic"))

loo(ch.40) # mass only : 0 Pareto K > 0.5
loo(ch.50) # genus only : 0 Pareto K > 0.5
loo(ch.51) # mass and genus : 1 Pareto K > 0.5
loo(ch.27) # mass genus and phylo: 1 Pareto K > 0.5
```

Once again the models with multiple predictors (ch.51 and ch.27) have 1 specimen each with excessive influence. Who are they?
```{r}
tibble(k   = ch.51$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 193526 Archboldomys maximus, K = 0.52
tibble(k   = ch.27$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 216435 Apomys sierrae, K = 0.52
```

Ok so. Since the cutoff for being considered "ok" in these is 0.50, and both of them have K = 0.52ish, I think they are probably not doing anything crazy to the model. Certainly nothing as dramatic as the Apomys sierrae specimen was doing to the BV.TV model (for which K = 0.69).

Compare models directly:
```{r}
tbth_comp <- loo_compare(ch.40, ch.50, ch.51, ch.27, criterion = "waic") 

# Convert to WAIC differences among models, including standard error term.
cbind(waic_diff = tbth_comp[, 1] * -2,
      se        = tbth_comp[, 2] * 2)

tbth_comp[, 7:8] %>% 
  data.frame() %>% 
  rownames_to_column("model_name") %>% 
  mutate(model_name = fct_reorder(model_name, waic, .desc = T)) %>% 
  ggplot(aes(x = waic, y = model_name, 
             xmin = waic - se_waic, 
             xmax = waic + se_waic)) +
  geom_pointrange(shape = 21) +
  labs(title = "WAIC comparison across Tb.Th estimation models",
       x = "WAIC", y = "model") +
  theme(axis.ticks.y = element_blank())
```

So: genus only (ch.50) performed the worst; mass only (ch.40) and genus+mass (ch.51) performed ok but not as well as the phylogenetic model (there is plenty of overlap in the WAIC standard error among them though). The phylogenetic model (+ intraspecific variance) with genus and mass as predictors will be the best at predicting out-of-sample. This is the same ranking of models that we found in BV.TV.

What does that mean? Does it mean anything? Think about it.

Conclusion from this section: Tb.Th is maybe not giving any information about organismal ecology (substrate use) here. Instead, most of the variance is related to phylogenetic covariance structure, intraspecific variance, and body mass. 

####################################
#### Trabecular Spacing (Tb.Sp) ####
####################################

A model without genus, mass as the only predictor:
```{r}
# Tb.Sp by mass
ch.52 <- 
  brm(data = d, 
      family = gaussian,
      tbsp_s ~ 1+mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.52")
print(ch.52)
```

Ha ha, sigma = 0.95. Guess what this plot is gonna look like... not a line, that's for sure.
Plot:
```{r}
ch.52.pl <- fitted(ch.52,
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  # plot
  ggplot(aes(x = mass_s)) +
  geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              stat = "identity",
              alpha = 1/5, size = 1, color = "black") +
  geom_point(data = d, aes(y = tbsp_s, color = genus), size = 4)+
  scale_color_manual(values = cols)+
  labs(x = "Mass in grams (standardized)",
       y = "trabecular spacing (standardized)") 
ch.52.pl
```

Right off the bat we can tell that there is a lot more variance in Tb.Sp than in either of the other metrics we've looked at so far - but there still seems to be some correlation with size. I'm pretty sure that the Archboldomys and Soricomys way up at the top there have caused trouble in previous sets of analyses - they have portions of the vertebral centrum that are basically empty, which means that the "spacing" in that portion of the structure is the same size as the entire inner diameter of the centrum. The Apomys way down at the bottom is a known weirdo also (maybe a really old individual). These three will probably have lots of influence in these models. We'll get to that below.

```{r}
# By genus only
ch.53 <- 
  brm(data = d, 
      family = gaussian,
      tbsp_s ~ 0 + genus,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.53")
print(ch.53)
```

```{r}
# By genus and mass
ch.54 <- 
  brm(data = d, 
      family = gaussian,
      tbsp_s ~ 0 + genus + mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.54")
print(ch.54)
```

```{r}
# By genus, mass, and phylogeny/intraspecific variance
ch.29 <-
  brm(data = d, 
      family = gaussian,
      tbsp_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
      control = list(adapt_delta = 0.98), #inserted to decrease the number of divergent transitions here
      prior = c(
        prior(normal(0, 1), class = b, coef = genusApomys),
        prior(normal(0, 1), class = b, coef = genusArchboldomys),
        prior(normal(0, 1), class = b, coef = genusChrotomys),
        prior(normal(0, 1), class = b, coef = genusRhynchomys),
        prior(normal(0, 1), class = b, coef = genusSoricomys),
        prior(normal(0, 1), class = b, coef = mass_s),
        prior(normal(0, 1), class = sd),
        prior(exponential(1), class = sigma)
        ),
      data2 = list(ch = ch),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.29")

print(ch.29)
```

Four part plot:
```{r}
ch.53_halfeye <- ch.53 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))+
  labs(x = "Tb.Sp", y = "genus")+
  ggtitle(label = "Tb.Sp by genus only")

ch.54_halfeye <- ch.54 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Tb.Sp by genus/mass")+
  labs(x = "Tb.Sp", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.29_halfeye <- ch.29 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Tb.Sp by genus/mass/phylo/intrasp.")+
  labs(x = "Tb.Sp", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.52.pl.nokey <- ch.52.pl +
  theme(legend.position = "none")
ch.53_halfeye/ch.52.pl.nokey|ch.54_halfeye/ch.29_halfeye
```

From this set of plots, it looks like there is maybe something interesting going on with Rhynchomys. Even though there is almost full overlap between a lot of the probability distributions (even before considering phylogenetic covariance), the mean estimate for Rhynchomys departs from the global mean more than any of the other genera. This might also be driven by the large RANGE of values for Tb.Sp, which is especially common in Rhynchomys and Archboldomys. Sometimes they have areas where the "trabecular spacing" is equal to the entire internal diameter of the vertebral centrum because there are no trabeculae at all. I have a feeling that there will be a lot of outliers driving the patterns in this one. 

Comparison plots:
```{r}
diff.tbth.ch.54 <- ch.54 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  compare_levels(.value, by = .variable) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  mutate(.variable = str_replace_all(.variable, "omys", "")) %>% 
  ungroup() %>%
  mutate(loco = reorder(.variable, .value)) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle(label = "Tb.Sp difference, genus+mass")+
  labs(x = "difference", y = "comparison")+
  theme(plot.title = element_text(size = 9))+
  scale_x_continuous(limits = c(-3, 3))

diff.tbth.ch.29 <- ch.29 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  compare_levels(.value, by = .variable) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  mutate(.variable = str_replace_all(.variable, "omys", "")) %>% 
  ungroup() %>%
  mutate(loco = reorder(.variable, .value)) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle(label = "Tb.Sp difference, genus+mass+phylo")+
  labs(x = "difference", y = "comparison")+
  theme(plot.title = element_text(size = 9))+
  scale_x_continuous(limits = c(-3, 3))

diff.tbth.ch.54|diff.tbth.ch.29
```

Sure enough, the ones involving Rhynchomys have the greatest mean departure from 0, but none of the differences look particularly significant. Gonna try to figure out what's driving this.

```{r}
ch.52 <- add_criterion(ch.52, c("loo", "waic"))
ch.53 <- add_criterion(ch.53, c("loo", "waic"))
ch.54 <- add_criterion(ch.54, c("loo", "waic"))
ch.29 <- add_criterion(ch.29, c("loo", "waic"))

loo(ch.52) # mass only : 0 Pareto K > 0.5
loo(ch.53) # genus only : 2 Pareto K > 0.5
loo(ch.54) # mass and genus : 1 Pareto K > 0.5
loo(ch.29) # mass genus and phylo: 1 Pareto K > 0.5
```

Ok so three of them have outliers! Are they all three gonna be different? Is it gonna be that troublemaking Apomys again?

```{r}
tibble(k   = ch.53$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k))
# 193526 Archboldomys maximus, K = 0.64 AND
# 190968 Soricomys leonardocoi, K = 0.59

tibble(k   = ch.54$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 193526 Archboldomys maximus, K = 0.52

tibble(k   = ch.29$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 193526 Archboldomys maximus, K = 0.55
```

Some thoughts:

1) 193526 Archboldomys maximus is the most influential specimen in all three versions of the model (actually also in all four of the models - I checked ch.52 also), but it doesn't have an excessively high Pareto K, especially in the models that include mass as a predictor. In looking at the specimen's TBA results, it has the lowest BV.TV value of any specimen in the sample (12% bone). It is not the smallest specimen of its species. The only other specimens that comes close to having such a low BV.TV are 190968 (Soricomys leonardocoi) and 188313 (S. montanus), both of which have 14% bone. I think the Archboldomys is the most outstanding partially because it differs dramatically from ALL its conspecifics (Tb.Sp = 0.047 versus all others 0.028 or less), whereas among Soricomys, there is a greater spread of values within each species. 
2) Is this a result of some allometric effect where you lose trabeculae as you get really small? This is a suspicion I've had for a while. It's probably worth doing some allometric scaling investigations here - I haven't been paying attention to that particularly. According to Doube et al. 2011, larger animals have relatively larger trabeculae that are relatively further apart, and that there are fewer of them per unit volume. Their analysis includes shrews down to 3g in mass, so that definitely encompasses the size range of Soricomys and Archboldomys. I'll have to look more closely at their results though, because I don't know how densely they have sampled the 3g-30g range of body masses. Having looked at so many very small species with almost totally empty centra, I am eager to know if my suspicion is founded in biological fact. 

Model comparisons:
```{r}
tbsp_comp <- loo_compare(ch.52, ch.53, ch.54, ch.29, criterion = "waic") %>% 
  print(simplify = F)

# Convert to WAIC differences among models, including standard error term.
cbind(waic_diff = tbsp_comp[, 1] * -2,
      se        = tbsp_comp[, 2] * 2)

tbsp_comp[, 7:8] %>% 
  data.frame() %>% 
  rownames_to_column("model_name") %>% 
  mutate(model_name = fct_reorder(model_name, waic, .desc = T)) %>% 
  ggplot(aes(x = waic, y = model_name, 
             xmin = waic - se_waic, 
             xmax = waic + se_waic)) +
  geom_pointrange(shape = 21) +
  labs(title = "WAIC comparison across Tb.Sp estimation models",
       x = "WAIC", y = "model") +
  theme(axis.ticks.y = element_blank())

```

They are all the same, for all intents and purposes. Does this mean that the addition of predictors/information in the different models gives you NO additional capacity to predict Tb.Sp? Would this change if I removed the most extreme outliers? It seems like that's maybe worth a shot. This is the only metric so far where I feel like removing the outliers is maybe justified, but I also think that they may indicate something biologically relevant. 

Let's look at some more TBA metrics before I go excluding anything.  

######################################
#### Connectivity Density (Conn.D)####
######################################

A model without genus, mass as the only predictor:
```{r}
# Conn.D by mass
ch.55 <- 
  brm(data = d, 
      family = gaussian,
      cond_s ~ 1+mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.55")
print(ch.55)
```

Pretty large error, and a strong-ish inverse correlation with mass. Plot:
```{r}
ch.55.pl <- fitted(ch.55,
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  # plot
  ggplot(aes(x = mass_s)) +
  geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              stat = "identity",
              alpha = 1/5, size = 1, color = "black") +
  geom_point(data = d, aes(y = cond_s, color = genus), size = 4)+
  scale_color_manual(values = cols)+
  labs(x = "Mass in grams (standardized)",
       y = "connectivity density (standardized)")+
  geom_text_repel(data = d %>% filter(specno %in% c("193526", "193523", "190968", "216435")),  
                  aes(y = cond_s, label = specno), 
                  size = 3,)
ch.55.pl
```

In comparison to the Tb.Sp model, it has less error overall, but there is still a lot of spread around the mean estimate. One Archboldomys and one Apomys are very noticeably above the line this time. I went ahead and labeled them - note that the super high Archboldomys (193523) is not the same specimen that has the super high trabecular spacing (193526, also labeled). This makes sense - larger spaces means lower overall density. S. leonardocoi (190968) makes another appearance pretty far away from its congeners, but it does have one other specimen down there with it, so maybe it's not unreasonably low. 

That Apomys sierrae (216435) is absolutely abnormal, based on its detachment from all the rest of the Apomys. Also it's HUGE (105g, all others in the species are 81-95g). I had previously flagged it just from how weird it looked qualitatively, and LRH and I discussed taking a look at it to see if it's really old or something. We have yet to do that. I'm gonna run these analyses with it included and see how much trouble it causes.

Model permutations:
```{r}
# By genus only

ch.56 <- 
  brm(data = d, 
      family = gaussian,
      cond_s ~ 0 + genus,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.56")
print(ch.56)
```

What the hell is Soricomys doing??? (...again)

```{r}
# By genus and mass
ch.57 <- 
  brm(data = d, 
      family = gaussian,
      cond_s ~ 0 + genus + mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.57")
print(ch.57)
```

```{r}
# By genus, mass, and phylogeny/intraspecific variance
ch.58 <-
  brm(data = d, 
      family = gaussian,
      cond_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
      control = list(adapt_delta = 0.98), #inserted to decrease the number of divergent transitions here
      prior = c(
        prior(normal(0, 1), class = b, coef = genusApomys),
        prior(normal(0, 1), class = b, coef = genusArchboldomys),
        prior(normal(0, 1), class = b, coef = genusChrotomys),
        prior(normal(0, 1), class = b, coef = genusRhynchomys),
        prior(normal(0, 1), class = b, coef = genusSoricomys),
        prior(normal(0, 1), class = b, coef = mass_s),
        prior(normal(0, 1), class = sd),
        prior(exponential(1), class = sigma)
        ),
      data2 = list(ch = ch),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.58")

print(ch.58)
```

4-part plot:

```{r}
ch.56_halfeye <- ch.56 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))+
  labs(x = "Conn.D", y = "genus")+
  ggtitle(label = "Conn.D by genus only")

ch.57_halfeye <- ch.57 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Conn.D by genus/mass")+
  labs(x = "Conn.D", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.58_halfeye <- ch.58 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>% 
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Conn.D by genus/mass/phylo/intrasp.")+
  labs(x = "Conn.D", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.55.pl.nokey <- ch.55.pl +
  theme(legend.position = "none")
ch.56_halfeye/ch.55.pl.nokey|ch.57_halfeye/ch.58_halfeye
```

Apart from the weird Apomys, it seems like the variance in Conn.D decreases with increasing body size. Is that just because we're getting a larger sample of bone in the larger animals, so the differences come out in the wash? I seem to remember a paper where they did TBA analyses on pieces of trabecular bone that gradually decrease in size relative to the whole bone, and they found a point at which the estimate becomes unreliable because it reaches the scale where trabecular bone appears heterogeneous. This might be the one where they established the "cutoff" for connectivity, saying you can't reasonably analyze a piece of trabecular bone unless it has AT LEAST X many trabeculae in there. 

This again suggests to me that I need to come up with some way of quantifying the heterogeneity of trabecular structure in these animals. I found a paper that did it with pixel intensity on representative slices. Amson's global compactness measure gets at this to some degree because it measures slice by slice, but I'm not interested in taking an average there because it loses all of the spatial resolution, which is what I'm interested in. 

I'm  mostly certain that there will be no significant differences if I subtract the probability distributions. I think the next step here is to consider the effect of A. sierrae 216435.

Model comparison:
```{r}
ch.55 <- add_criterion(ch.55, c("loo", "waic"))
ch.56 <- add_criterion(ch.56, c("loo", "waic"))
ch.57 <- add_criterion(ch.57, c("loo", "waic"))
ch.58 <- add_criterion(ch.58, c("loo", "waic"))

loo(ch.55) # mass only : 0 Pareto K > 0.5
loo(ch.56) # genus only : 1 Pareto K > 0.5
loo(ch.57) # mass and genus : 1 Pareto K > 0.7
loo(ch.58) # mass genus and phylo: 1 Pareto K > 0.7

# All right who is it? I'm sure I already know. 
tibble(k   = ch.57$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 216435 Apomys sierrae, K = 0.92, to the surprise of nobody.
tibble(k   = ch.58$criteria$loo$diagnostics$pareto_k,
       row = 1:67,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 216435 Apomys sierrae, K = 0.77.

```

This suggests to me that removing that outlier is pretty much essential to understanding the pattern overall. 
##########################################
#### Conn.D without A. sierrae 216435 ####
##########################################
```{r}
dc <- d[d$specno!="216435",]

ch.59 <- 
  brm(data = dc, 
      family = gaussian,
      cond_s ~ 1+mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.59")
print(ch.59)

ch.59.pl <- fitted(ch.59,
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  # plot
  ggplot(aes(x = mass_s)) +
  geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              stat = "identity",
              alpha = 1/5, size = 1, color = "black") +
  geom_point(data = dc, aes(y = cond_s, color = genus), size = 4)+
  scale_color_manual(values = cols)+
  labs(x = "Mass in grams (standardized)",
       y = "connectivity density (standardized)")+
  geom_text_repel(data = d %>% filter(specno %in% c("193525", "193523", "190968", "190977")),  
                  aes(y = cond_s, label = specno), 
                  size = 3,)
ch.59.pl
```

How's it look?
```{r}
ch.59 <- add_criterion(ch.59, c("loo", "waic"))
loo(ch.59)

tibble(k   = ch.59$criteria$loo$diagnostics$pareto_k,
       row = 1:66,
       specimen = paste(d$specno[row], d$taxon[row])) %>% 
  arrange(desc(k)) # 216435 Apomys sierrae, K = 0.77.
```

Much better. However, this makes it so that if I want to compare directly across the models I have to re-run all of them without that specimen. Let's see how that affects the results. 
```{r}
# By genus only

ch.60 <- 
  brm(data = dc, 
      family = gaussian,
      cond_s ~ 0 + genus,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.60")
print(ch.60)

# By genus and mass
ch.61 <- 
  brm(data = dc, 
      family = gaussian,
      cond_s ~ 0 + genus + mass_s,
      prior = c(prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.57")
print(ch.61)

# By genus, mass, and phylogeny/intraspecific variance
ch.62 <-
  brm(data = dc, 
      family = gaussian,
      cond_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
      control = list(adapt_delta = 0.98), #inserted to decrease the number of divergent transitions here
      prior = c(
        prior(normal(0, 1), class = b, coef = genusApomys),
        prior(normal(0, 1), class = b, coef = genusArchboldomys),
        prior(normal(0, 1), class = b, coef = genusChrotomys),
        prior(normal(0, 1), class = b, coef = genusRhynchomys),
        prior(normal(0, 1), class = b, coef = genusSoricomys),
        prior(normal(0, 1), class = b, coef = mass_s),
        prior(normal(0, 1), class = sd),
        prior(exponential(1), class = sigma)
        ),
      data2 = list(ch = ch),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.62")

print(ch.62)
```

Four-part plot:
```{r}
ch.60_halfeye <- ch.60 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))+
  labs(x = "Conn.D", y = "genus")+
  ggtitle(label = "Conn.D by genus only")

ch.61_halfeye <- ch.61 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Conn.D by genus/mass")+
  labs(x = "Conn.D", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.62_halfeye <- ch.62 %>%
  gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
  mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeye(aes(fill = .variable), 
               point_fill = "#000000", 
               shape = 21, 
               point_size = 3, 
               point_color = "#FFFFFF",
               interval_size = 10,
               interval_color = "grey40",
               .width = .89) +
  scale_fill_manual(values = cols)+
  geom_vline(xintercept = 0, linetype = "dashed")+
  ggtitle(label = "Conn.D by genus/mass/phylo")+
  labs(x = "Conn.D", y = "genus")+
  theme(legend.position = "none", 
        plot.title = element_text(size = 9))

ch.59.pl.nokey <- ch.59.pl +
  theme(legend.position = "none")
ch.60_halfeye/ch.59.pl.nokey|ch.61_halfeye/ch.62_halfeye
```

Well. It looks like Rhynchomys is maybe the interesting guy here. A higher Conn.D than anyone else (but not significantly so). Could this have something to do with the variety of forces on the veretbral column of ricochetal animals? Look into this. Since it seems like degree of anisotropy is not very reliable when you get down to this small of a bone sample, connectivity density could help in understanding the overall structure.