-
Notifications
You must be signed in to change notification settings - Fork 0
/
Chrotomyini_compgenus_analyses_05022022.Rmd
1290 lines (1119 loc) · 61.6 KB
/
Chrotomyini_compgenus_analyses_05022022.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Chrotomys analysis compilation - genera"
author: "S.M. Smith"
date: "5/2/2022"
output:
html_document:
keep_md: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
pacman::p_load(install = F, "ape", "bayesplot","BiocManager", "brms", "broom", "dagitty", "devtools", "flextable", "ggdark", "ggmcmc", "ggrepel", "gtools", "lattice","loo", "patchwork", "rcartocolor", "Rcpp", "remotes", "rstan", "StanHeaders", "statebins", "tidybayes", "viridis", "viridisLite", "pacman")
```
Load up Chrotomyini trabecular bone architecture (TBA) data and standardize variables:
```{r}
d <- read.csv(file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\05062022 Philippine Murids segmentation parameters and morphological data - TBA data total BoneJ (full).csv", header = T)
d <- d[d$tribe=="chroto",c(1:2, 4:23)]
d <-
d %>%
mutate(bvtv = as.numeric(bvtv))
d <-
d %>%
mutate(mass_s = rethinking::standardize(log10(mass_g)),
elev_s = rethinking::standardize(elev),
bvtv_s = rethinking::standardize(bvtv),
tbth_s = rethinking::standardize(tbth),
tbsp_s = rethinking::standardize(tbsp),
conn_s = rethinking::standardize(conn),
cond_s = rethinking::standardize(m_connd),
cond_s2 = rethinking::standardize(connd),
da_s = rethinking::standardize(da))
# remove C. gonzalesi and R. isarogensis, singletons:
d <-
d %>%
filter(taxon!="Chrotomys_gonzalesi") %>%
filter(taxon!="Rhynchomys_isarogensis")
# Make categorical vars into factors
d <-
d %>%
mutate(loco = factor(loco),
hab_simp = factor(hab_simp),
genus = factor(genus))
# Specify colors for plots:
cols = c("#86acca","#ab685b", "#3370a3", "#1f314c","#5a9fa8")
```
Load in phylogeny:
REMEMBER: A <- ape::vcv.phylo(phylo), add corr = T if your tree is NOT SCALED TO 1.
```{r}
ch.tre <- read.nexus(file = "G:\\My Drive\\Philippine rodents\\Chrotomys\\analysis\\SMS_PRUNED_and_COLLAPSED_03292022_OTUsrenamed_Rowsey_PhBgMCC_LzChrotomyini.nex")
ch <- ape::vcv.phylo(ch.tre, corr = T)
d <-
d %>%
mutate(phylo = taxon)
```
#######################################################################
#### Comparisons of several models for each trabecular bone metric ####
#######################################################################
I'm going to put together and compare a few different models for each trabecular bone metric, with a few combinations of potential model components. This is for the purpose of not only comparing fit (i.e., how good each model is at predicting out-of-sample), but also to understand how the addition of predictors changes the models' estimates. I'll also include some assessments of the influence of individual specimens on the model fit.
######################################################################################
#### Bone Volume Fraction (BV.TV), and some general methodological considerations ####
######################################################################################
For comparison: a model that doesn't include genus as a predictor:
```{r}
# BV.TV by mass
ch.36 <-
brm(data = d,
family = gaussian,
bvtv_s ~ 1+mass_s,
prior = c(prior(normal(0, 1), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.36")
print(ch.36)
```
A scatterplot of the above model, with genera color-coded (the same color code to be used for the remainder of this document):
```{r}
range(d$mass_s)
nd <- tibble(mass_s = seq(from = -2, to = 1.75, length.out = 67))
fitted(ch.36,
newdata = nd) %>%
data.frame() %>%
bind_cols(nd) %>%
# plot
ggplot(aes(x = mass_s)) +
geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
stat = "identity",
alpha = 1/5, size = 1, color = "black") +
geom_point(data = d, aes(y = bvtv_s, color = genus), size = 4)+
scale_color_manual(values = cols)+
xlim(min(nd), max(nd))+
labs(x = "Mass in grams (standardized)",
y = "bone volume fraction (standardized)")
```
From this plot, you can really see that genus is closely linked to body mass. You can also see that there is definitely some correlation between mass and BV.TV but the relationship has a lot of error. You can also detect this from the model parameters: mass_s (the influence of mass, a.k.a slope, listed under "Population-level effects") = 0.56, with CIs 0.36-0.76, and sigma = 0.83.
Now a series of models that estimate a separate intercept (= estimate of mean BV.TV) for each genus, which here is a proxy for substrate use. I printed each model for quick inspection, and at the end of all the models, there is a plot including all four of them.
```{r}
# By genus only
ch.24 <-
brm(data = d,
family = gaussian,
bvtv_s ~ 0 + genus,
prior = c(prior(normal(0, 1), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.24")
print(ch.24)
```
```{r}
# By genus and mass
ch.25 <-
brm(data = d,
family = gaussian,
bvtv_s ~ 0 + genus + mass_s,
prior = c(prior(normal(0, 1), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.25")
print(ch.25)
```
```{r}
# By genus and phylogeny/intraspecific variance, no mass
ch.44 <-
brm(data = d,
family = gaussian,
bvtv_s ~ 0 + genus + (1|gr(phylo, cov = ch)) + (1|taxon),
control = list(adapt_delta = 0.89), #inserted to decrease the number of divergent transitions here
prior = c(
prior(normal(0, 1), class = b, coef = genusApomys),
prior(normal(0, 1), class = b, coef = genusArchboldomys),
prior(normal(0, 1), class = b, coef = genusChrotomys),
prior(normal(0, 1), class = b, coef = genusRhynchomys),
prior(normal(0, 1), class = b, coef = genusSoricomys),
prior(normal(0, 1), class = sd),
prior(exponential(1), class = sigma)
),
data2 = list(ch = ch),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.44")
print(ch.44)
```
```{r}
# By genus, mass, and phylogeny/intraspecific variance
ch.26 <-
brm(data = d,
family = gaussian,
bvtv_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
control = list(adapt_delta = 0.85), #inserted to decrease the number of divergent transitions here
prior = c(
prior(normal(0, 1), class = b, coef = genusApomys),
prior(normal(0, 1), class = b, coef = genusArchboldomys),
prior(normal(0, 1), class = b, coef = genusChrotomys),
prior(normal(0, 1), class = b, coef = genusRhynchomys),
prior(normal(0, 1), class = b, coef = genusSoricomys),
prior(normal(0, 1), class = b, coef = mass_s),
prior(normal(0, 1), class = sd),
prior(exponential(1), class = sigma)
),
data2 = list(ch = ch),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.26")
print(ch.26)
```
A four-part plot including all of the models:
```{r}
ch.24_halfeye <- ch.24 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye(aes(fill = .variable),
point_fill = "#000000",
shape = 21,
point_size = 3,
point_color = "#FFFFFF",
interval_size = 10,
interval_color = "grey40",
.width = .89) +
scale_fill_manual(values = cols)+
geom_vline(xintercept = 0, linetype = "dashed")+
theme(legend.position = "none",
plot.title = element_text(size = 9))+
labs(x = "BV.TV", y = "genus")+
ggtitle(label = "BV.TV by genus only")
ch.25_halfeye <- ch.25 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye(aes(fill = .variable),
point_fill = "#000000",
shape = 21,
point_size = 3,
point_color = "#FFFFFF",
interval_size = 10,
interval_color = "grey40",
.width = .89) +
scale_fill_manual(values = cols)+
geom_vline(xintercept = 0, linetype = "dashed")+
ggtitle(label = "BV.TV by genus/mass")+
labs(x = "BV.TV", y = "genus")+
theme(legend.position = "none",
plot.title = element_text(size = 9))
ch.44_halfeye <- ch.44 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye(aes(fill = .variable),
point_fill = "#000000",
shape = 21,
point_size = 3,
point_color = "#FFFFFF",
interval_size = 10,
interval_color = "grey40",
.width = .89) +
scale_fill_manual(values = cols)+
geom_vline(xintercept = 0, linetype = "dashed")+
ggtitle(label = "BVTV by genus/phylo/intrasp")+
labs(x = "BV.TV", y = "genus")+
theme(legend.position = "none",
plot.title = element_text(size = 9))
ch.26_halfeye <- ch.26 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye(aes(fill = .variable),
point_fill = "#000000",
shape = 21,
point_size = 3,
point_color = "#FFFFFF",
interval_size = 10,
interval_color = "grey40",
.width = .89) +
scale_fill_manual(values = cols)+
geom_vline(xintercept = 0, linetype = "dashed")+
ggtitle(label = "BV.TV by genus/mass/phylo/intrasp.")+
labs(x = "BV.TV", y = "genus")+
theme(legend.position = "none",
plot.title = element_text(size = 9))
ch.24_halfeye/ch.44_halfeye|ch.25_halfeye/ch.26_halfeye
```
Takeaways from this set of plots:
1) Accounting for mass in estimation of BV.TV makes the relative estimates of BV.TV across genera change pretty dramatically. For its body size, Soricomys has relatively high BV.TV, and Rhynchomys has low BV.TV for its body size. Chrotomys, Archboldomys, and Apomys have estimated BV.TV values that are pretty close to the mean (represented by the vertical dotted line). One way to tell the significance of the differences between genera is to subtract the posterior probability distributions from one another.The difference between distributions is "significant" if the distribution of differences between them does not overlap 0. I'm not really a fan of statistical significance but it's good to have some definition of that I guess.
Here are the difference distributions between pairs of genera for the model including only genus and mass as predictors (upper right in the preceding plot):
```{r}
ch.25 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
compare_levels(.value, by = .variable) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
ungroup() %>%
mutate(loco = reorder(.variable, .value)) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye() +
geom_vline(xintercept = 0, linetype = "dashed") +
ggtitle(label = "Between-genus BV.TV difference distributions")+
labs(x = "difference", y = "comparison")
```
From this comparison, it looks like Rhynchomys has significantly lower bone volume fraction than pretty much everything it is compared to, with the possible exception of Archboldomys (6th row down) - the distribution of the Rhynchomys-Archboldomys difference overlaps 0 slightly. It also appears that Soricomys has significantly higher BV.TV than Archboldomys.
This brings me to my second takeaway from the 4-part plot.
2) The genus/mass model doesn't include the effects of phylogenetic correlation or intraspecific variance. When we estimate BV.TV (with genus and mass as predictors) with a phylogenetic multilevel model, and consider intraspecific variance as well (lower right in the 4-part plot above), the difference distributions look like this:
```{r}
ch.26 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
compare_levels(.value, by = .variable) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
ungroup() %>%
mutate(loco = reorder(.variable, .value)) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye() +
geom_vline(xintercept = 0, linetype = "dashed") +
ggtitle(label = "Between-genus BV.TV difference distributions")+
labs(x = "difference", y = "comparison")
```
The relative location of each difference distribution hasn't changed dramatically (e.g., note location of Rhynchomys comparisons relative to the mean), but the probability density is much more spread out. This is telling us that, because our specimens share some features as a result of phylogenetic structure, the model is less sure that there are significant differences between genera that are related specifically to their generic properties and not to phylogenetic correlation.
So how strong is the influence of phylogeny on BV.TV? The model outputs a value that is called "sd_phylo__Intercept", which, as I understand it, is an estimate of the amount of variance in the data that is related to phylogenetic correlation structure. We can use that to calculate the more familiar lambda, a.k.a. phylogenetic signal.
(A reminder for me of what phylogenetic signal is: it is an intra-class correlation coefficient (ICC), which tells you how much things that are in groups together resemble each other, quantitatively speaking. 0 means that the individuals don't show any similarity that can be attributed to their relatedness (no phylogenetic structure). 1 Means that the trait you're looking at evolves exactly as expected under Brownian Motion evolution - that is, the phylogenetic structure underlying the data set explains ALL of the variation you see among the individual measurements. This is different from considering intraspecific variation as a potential source of covariance. Including the (1|taxon) term accounts for group-level similarity that is related to taxon (species) ONLY, irrespective of broader phylogenetic structure (it does not refer to the phylogenetic covariance matrix in its calculations). I feel like the more times I try and explain this to myself, the better I will understand it. I've always found PCM concepts somewhat opaque and I'm really trying to be clear about them so that I can mayyybe more precisely explain what my results mean. Thanks for sticking with me.)
Here's a calculation of lambda for ch.26, the model that includes genus, mass, phylogeny, and intraspecific variance:
```{r}
hyp <- "sd_phylo__Intercept^2 / (sd_phylo__Intercept^2 + sd_taxon__Intercept^2 + sigma^2) = 0"
h.bvtv <- hypothesis(ch.26, hyp, class = NULL)
h.bvtv
```
The estimate of lambda here is 0.29. I would say that's medium-low? But why should we trust a point estimate here (especially when the upper 95% CI is 0.74)? We can look at lambda as a probability distribution instead. This will show us where the majority of the probability density actually falls.
```{r}
ph.bvtv.pl <- ggplot() +
geom_density(aes(x = h.bvtv$samples$H1), fill = "red", alpha = 0.5) +
theme_bw() +
xlim(0,1) +
labs(y = "density", x = "lambda: bone volume fraction")
ph.bvtv.pl
```
I had never thought of lambda as a distribution before this and I think it makes these results both easier and more difficult to understand. From this density plot, we can tell that there's a density peak around, say, 0.03. I would call that a low value for lambda. HOWEVER - just because that is a local peak does not mean that it represents the majority of the probability density. There is a huge amount of probability density (area under the curve) between lambda = 0.125 and ~0.56. So it makes sense that the point estimate of lambda is in the middle-ish of the values between the peak (~0.03) and the point where the density drops off (~0.56).
What the hell does this mean about the actual influence of phylogeny on BV.TV? I'm still trying to figure that out. Phylogenetic structure in these data clearly explains some of the variation in BV.TV across genera - which, yeah, of course it does, you're using genus as a proxy for substrate-use groups (e.g., Chrotomys = varying degrees of digging, Rhynchomys = bipedal hopping, etc). But after accounting for mass and group-level effects (phylo/intraspecific var) in our model, there are still differences between genera. They aren't necessarily "significant", according to the difference distributions shown above, but there are definitely signals that could be related to organismal ecology.
Ok departing from the discussion of the plots above: I'm gonna do some model comparison to see which model is best at predicting out-of-sample (which is the "best fit" for the data). I'm going to use two methods: the Widely Applicable Information Criterion (WAIC) and Leave-One-Out cross-validation (LOO).
```{r}
ch.36 <- add_criterion(ch.36, c("waic", "loo")) # mass only
ch.24 <- add_criterion(ch.24, c("waic", "loo")) # genus only
ch.25 <- add_criterion(ch.25, c("waic", "loo")) # genus and mass
ch.44 <- add_criterion(ch.44, c("waic", "loo")) # genus and phylogeny
ch.26 <- add_criterion(ch.26, c("waic", "loo")) # genus mass phylogeny
loo(ch.36) # mass only : 0 Pareto K > 0.5
loo(ch.24) # genus only : 0 Pareto K > 0.5
loo(ch.25) # genus and mass : 0 Pareto K > 0.5
loo(ch.44) # genus and phylogeny : 1 Pareto K > 0.5
loo(ch.26) # genus mass phylogeny : 1 Pareto K > 0.5
```
This analysis (LOO) can tell you, among other things, if there are specimens that are having a strong effect on the model fit. It does this by leaving one specimen out at a time (hence the name) and seeing how well the model can predict the value for that left-out specimen when it is not a part of the model. Specimens that are highly influential have a high Pareto K value. In two of the models here (ch.44 and ch.26, the phylogenetic models), one specimen has a relatively high Pareto K. Let's see if it's the same specimen in both.
```{r}
tibble(k = ch.44$criteria$loo$diagnostics$pareto_k,
row = 1:67,
specimen = paste(d$specno[row], d$taxon[row])) %>%
arrange(desc(k)) # 190968 Soricomys leonardocoi, K = 0.59
tibble(k = ch.26$criteria$loo$diagnostics$pareto_k,
row = 1:67,
specimen = paste(d$specno[row], d$taxon[row])) %>%
arrange(desc(k)) # 216462 Apomys sierrae, K = 0.69
```
Why are these specimens potentially having such a strong effect?
1) FMNH 216462, Apomys sierrae: This animal has an anomalously low bone volume fraction compared to all other individuals in its species (17% bone versus all other specimens 29-37% bone). It has the shortest head and body length at 135 mm, compared to ~140-150 for all other specimens. It is also unusual in that it only has five lumbar vertebrae rather than 6. I still analyzed the third lumbar position for it, but it may represent a veeery slightly different functional position in the lumbar column. However - looking at the specimen just now in Dragonfly, all the vertebrae kind of look like that. So It might just be a weirdo.
2) FMNH 190968, Soricomys leonardocoi: This animal also has anomalously low bone volume fraction for its species (14% bone as opposed to all others 19-22% bone). However, this is in an analysis (ch.44) that does not account for body mass. 190968 weighs 35 grams, and the average mass of the species is 32 grams, so it's slightly heavier than average and has a lower BV.TV. In the phylogenetic model that has mass as a predictor (ch.26), its Pareto K is below the "good" threshold of 0.5 (K = 0.41) - it doesn't have a super strong effect on that model fit. So incorporating mass apparently allows the model to not be surprised about this specimen's BV.TV. I'm not sure I have an adequate explanation for how that is happening.
Should I remove these guys, at least the Apomys? I don't know if that's necessary. It seems like it represents true variation in the population, and I'm loath do get rid of it because of that. I could try using a Student distribution for the posterior probability distribution, which would result in a model that is less surprised by outliers, but I haven't decided if I think the problem is extreme enough for that to be necessary.
Now I'm going to compare across models to see which is the best at predicting BV.TV, and by how much. This won't give any information about the causality of the variables involved, but it can tell how big of a difference the inclusion or exclusion of a variable makes in the predictive accuracy of the model.
```{r}
bvtv_comp <- loo_compare(ch.36, ch.24, ch.25, ch.44, ch.26, criterion = "waic")
# Convert to WAIC differences among models, including standard error term.
cbind(waic_diff = bvtv_comp[, 1] * -2,
se = bvtv_comp[, 2] * 2)
bvtv_comp[, 7:8] %>%
data.frame() %>%
rownames_to_column("model_name") %>%
mutate(model_name = fct_reorder(model_name, waic, .desc = T)) %>%
ggplot(aes(x = waic, y = model_name,
xmin = waic - se_waic,
xmax = waic + se_waic)) +
geom_pointrange(shape = 21) +
labs(title = "WAIC comparison across BV.TV estimation models",
x = "WAIC", y = "model") +
theme(axis.ticks.y = element_blank())
```
So in this comparison, the WAIC (= Widely Applicable Information Criterion) indicates that ch.26, the phylogenetic model (+ intraspecific variance) that includes genus and mass, is the best predictive model, but not by much - its standard error (horizontal line on the plot) overlaps that of ch.44 (phylo only) and ch.25 (genus and mass).
Based on what I know about trabecular bone morphology, and what I've seen so far re: correlation of body mass with TBA metrics, it doesn't make sense to exclude mass from the model. Body mass has an effect on skeletal structure to varying degrees, depending on the metric of interest. so even though the phylogenetic model that does not include mass as a predictor (ch.44) doesn't perform that much worse than the model including mass (ch.26), I'm not going to extensively consider models that don't take mass into account. I will, however, compare across metrics to see the difference in the influence of mass on the TBA metrics that I'm examining. This is part of the overall goal of the study: to figure out a little more about what the various trabecular bone metrics are being influenced by/are associated with, in addition to the hypothesized biomechanical function of trabecular bone variation.
I think I'll keep doing comparisons with the no-phylogeny type models for now (ch.25 and ch.36, above) because I think that comparison is relevant to understanding HOW phylogenetic structure affects results. I know people always are saying, don't include both phylogenetic and non-phylogenetic analyses in your results, but I think it's useful. Thoughts on this point would be appreciated.
So, I'm going to try and explain some of my thoughts about what these results mean biomechanically and ecologically. But first I need to look at the other trabecular bone metrics here.
######################################
#### Trabecular Thickness (Tb.Th) ####
######################################
A model without genus, mass as the only predictor:
```{r}
# Tb.Th by mass
ch.40 <-
brm(data = d,
family = gaussian,
tbth_s ~ 1+mass_s,
prior = c(prior(normal(0, 1), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.40")
print(ch.40)
```
A scatterplot of the above model, with genera color-coded:
```{r}
ch.40.pl <- fitted(ch.40,
newdata = nd) %>%
data.frame() %>%
bind_cols(nd) %>%
# plot
ggplot(aes(x = mass_s)) +
geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
stat = "identity",
alpha = 1/5, size = 1, color = "black") +
geom_point(data = d, aes(y = tbth_s, color = genus), size = 4)+
scale_color_manual(values = cols)+
labs(x = "Mass in grams (standardized)",
y = "trabecular thickness (standardized)")
ch.40.pl
```
Daaang look at that correlation! It's not all that unexpected based on what I've seen in the literature - Tb.Th is VERY closely associated with mass, such that it is almost entirely explained by it. Note the mass_s value in the model: 0.93, with sigma = 0.37. mass_s in the BV.TV model was 0.56, with sigma = 0.83. The relationship with Tb.Th is much stronger. It's possible that there is essentially no information other than body size information in this metric! Let's look closer.
```{r}
# By genus only
ch.50 <-
brm(data = d,
family = gaussian,
tbth_s ~ 0 + genus,
prior = c(prior(normal(0, 1), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.50")
print(ch.50)
```
```{r}
# By genus and mass
ch.51 <-
brm(data = d,
family = gaussian,
tbth_s ~ 0 + genus + mass_s,
prior = c(prior(normal(0, 1), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.51")
print(ch.51)
```
```{r}
# By genus, mass, and phylogeny/intraspecific variance
ch.27 <-
brm(data = d,
family = gaussian,
tbth_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
control = list(adapt_delta = 0.98), #inserted to decrease the number of divergent transitions here
prior = c(
prior(normal(0, 1), class = b, coef = genusApomys),
prior(normal(0, 1), class = b, coef = genusArchboldomys),
prior(normal(0, 1), class = b, coef = genusChrotomys),
prior(normal(0, 1), class = b, coef = genusRhynchomys),
prior(normal(0, 1), class = b, coef = genusSoricomys),
prior(normal(0, 1), class = b, coef = mass_s),
prior(normal(0, 1), class = sd),
prior(exponential(1), class = sigma)
),
data2 = list(ch = ch),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.27")
print(ch.27)
```
A four part plot like the one I did for BV.TV, but instead of the phylogenetic model with no mass, I reproduced the model with mass only (no genus):
```{r}
ch.50_halfeye <- ch.50 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye(aes(fill = .variable),
point_fill = "#000000",
shape = 21,
point_size = 3,
point_color = "#FFFFFF",
interval_size = 10,
interval_color = "grey40",
.width = .89) +
scale_fill_manual(values = cols)+
geom_vline(xintercept = 0, linetype = "dashed")+
theme(legend.position = "none",
plot.title = element_text(size = 9))+
labs(x = "Tb.Th", y = "genus")+
ggtitle(label = "Tb.Th by genus only")
ch.51_halfeye <- ch.51 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye(aes(fill = .variable),
point_fill = "#000000",
shape = 21,
point_size = 3,
point_color = "#FFFFFF",
interval_size = 10,
interval_color = "grey40",
.width = .89) +
scale_fill_manual(values = cols)+
geom_vline(xintercept = 0, linetype = "dashed")+
ggtitle(label = "Tb.Th by genus/mass")+
labs(x = "Tb.Th", y = "genus")+
theme(legend.position = "none",
plot.title = element_text(size = 9))
ch.27_halfeye <- ch.27 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye(aes(fill = .variable),
point_fill = "#000000",
shape = 21,
point_size = 3,
point_color = "#FFFFFF",
interval_size = 10,
interval_color = "grey40",
.width = .89) +
scale_fill_manual(values = cols)+
geom_vline(xintercept = 0, linetype = "dashed")+
ggtitle(label = "Tb.Th by genus/mass/phylo/intrasp.")+
labs(x = "Tb.Th", y = "genus")+
theme(legend.position = "none",
plot.title = element_text(size = 9))
ch.40.pl.nokey <- ch.40.pl +
theme(legend.position = "none")
ch.50_halfeye/ch.40.pl.nokey|ch.51_halfeye/ch.27_halfeye
```
From this set of plots, I think we can say that Tb.Th variance is almost totally explained by body mass and phylogenetic covariance structure. The change in estimated difference among genera between the genus/mass model and the phylogenetic model is dramatic. Here's comparison plots. I truncated the genus names so that the plots are a little easier to fit next to each other (they just have the "omys" part removed):
```{r}
diff.tbth.ch.51 <- ch.51 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
compare_levels(.value, by = .variable) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
mutate(.variable = str_replace_all(.variable, "omys", "")) %>%
ungroup() %>%
mutate(loco = reorder(.variable, .value)) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye() +
geom_vline(xintercept = 0, linetype = "dashed") +
ggtitle(label = "Tb.Th difference, genus+mass")+
labs(x = "difference", y = "comparison")+
theme(plot.title = element_text(size = 9))+
scale_x_continuous(limits = c(-3, 3))
diff.tbth.ch.27 <- ch.27 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
compare_levels(.value, by = .variable) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
mutate(.variable = str_replace_all(.variable, "omys", "")) %>%
ungroup() %>%
mutate(loco = reorder(.variable, .value)) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye() +
geom_vline(xintercept = 0, linetype = "dashed") +
ggtitle(label = "Tb.Th difference, genus+mass+phylo")+
labs(x = "difference", y = "comparison")+
theme(plot.title = element_text(size = 9))+
scale_x_continuous(limits = c(-3, 3))
diff.tbth.ch.51|diff.tbth.ch.27
```
Also - note the width of the distributions in the plot on the right (I made the scales the same so you can see the difference).
In light of this dramatic difference, let's look at lambda for this one:
```{r}
hyp <- "sd_phylo__Intercept^2 / (sd_phylo__Intercept^2 + sd_taxon__Intercept^2 + sigma^2) = 0"
h.tbth <- hypothesis(ch.27, hyp, class = NULL)
h.tbth
```
Oooookay so the estimate of lambda is 0.43, with an upper CI of 0.87. The plot:
```{r}
ph.tbth.pl <- ggplot() +
geom_density(aes(x = h.tbth$samples$H1), fill = "orange", alpha = 0.5) +
theme_bw() +
xlim(0,1) +
labs(y = "density", x = "lambda: trabecular thickness")
ph.tbth.pl
```
I would say in this one that most of the probability density is towards the higher end of the range, suggesting that there's a pretty big influence of phylogeny on this particular metric. Compare with the one for BV.TV:
```{r}
ph.bvtv.pl/ph.tbth.pl
```
So on the whole, there's likely to be more of a phylogenetic signal in Tb.Th than BV.TV. There is that peak on the far left (at around 0.03), similar in location to the one in the BV.TV lambda, but the peak is lower and so represents a smaller proportion of the total probaility density.
Model comparison and check for outliers:
```{r}
ch.40 <- add_criterion(ch.40, c("loo", "waic"))
ch.50 <- add_criterion(ch.50, c("loo", "waic"))
ch.51 <- add_criterion(ch.51, c("loo", "waic"))
ch.27 <- add_criterion(ch.27, c("loo", "waic"))
loo(ch.40) # mass only : 0 Pareto K > 0.5
loo(ch.50) # genus only : 0 Pareto K > 0.5
loo(ch.51) # mass and genus : 1 Pareto K > 0.5
loo(ch.27) # mass genus and phylo: 1 Pareto K > 0.5
```
Once again the models with multiple predictors (ch.51 and ch.27) have 1 specimen each with excessive influence. Who are they?
```{r}
tibble(k = ch.51$criteria$loo$diagnostics$pareto_k,
row = 1:67,
specimen = paste(d$specno[row], d$taxon[row])) %>%
arrange(desc(k)) # 193526 Archboldomys maximus, K = 0.52
tibble(k = ch.27$criteria$loo$diagnostics$pareto_k,
row = 1:67,
specimen = paste(d$specno[row], d$taxon[row])) %>%
arrange(desc(k)) # 216435 Apomys sierrae, K = 0.52
```
Ok so. Since the cutoff for being considered "ok" in these is 0.50, and both of them have K = 0.52ish, I think they are probably not doing anything crazy to the model. Certainly nothing as dramatic as the Apomys sierrae specimen was doing to the BV.TV model (for which K = 0.69).
Compare models directly:
```{r}
tbth_comp <- loo_compare(ch.40, ch.50, ch.51, ch.27, criterion = "waic")
# Convert to WAIC differences among models, including standard error term.
cbind(waic_diff = tbth_comp[, 1] * -2,
se = tbth_comp[, 2] * 2)
tbth_comp[, 7:8] %>%
data.frame() %>%
rownames_to_column("model_name") %>%
mutate(model_name = fct_reorder(model_name, waic, .desc = T)) %>%
ggplot(aes(x = waic, y = model_name,
xmin = waic - se_waic,
xmax = waic + se_waic)) +
geom_pointrange(shape = 21) +
labs(title = "WAIC comparison across Tb.Th estimation models",
x = "WAIC", y = "model") +
theme(axis.ticks.y = element_blank())
```
So: genus only (ch.50) performed the worst; mass only (ch.40) and genus+mass (ch.51) performed ok but not as well as the phylogenetic model (there is plenty of overlap in the WAIC standard error among them though). The phylogenetic model (+ intraspecific variance) with genus and mass as predictors will be the best at predicting out-of-sample. This is the same ranking of models that we found in BV.TV.
What does that mean? Does it mean anything? Think about it.
Conclusion from this section: Tb.Th is maybe not giving any information about organismal ecology (substrate use) here. Instead, most of the variance is related to phylogenetic covariance structure, intraspecific variance, and body mass.
####################################
#### Trabecular Spacing (Tb.Sp) ####
####################################
A model without genus, mass as the only predictor:
```{r}
# Tb.Sp by mass
ch.52 <-
brm(data = d,
family = gaussian,
tbsp_s ~ 1+mass_s,
prior = c(prior(normal(0, 1), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.52")
print(ch.52)
```
Ha ha, sigma = 0.95. Guess what this plot is gonna look like... not a line, that's for sure.
Plot:
```{r}
ch.52.pl <- fitted(ch.52,
newdata = nd) %>%
data.frame() %>%
bind_cols(nd) %>%
# plot
ggplot(aes(x = mass_s)) +
geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
stat = "identity",
alpha = 1/5, size = 1, color = "black") +
geom_point(data = d, aes(y = tbsp_s, color = genus), size = 4)+
scale_color_manual(values = cols)+
labs(x = "Mass in grams (standardized)",
y = "trabecular spacing (standardized)")
ch.52.pl
```
Right off the bat we can tell that there is a lot more variance in Tb.Sp than in either of the other metrics we've looked at so far - but there still seems to be some correlation with size. I'm pretty sure that the Archboldomys and Soricomys way up at the top there have caused trouble in previous sets of analyses - they have portions of the vertebral centrum that are basically empty, which means that the "spacing" in that portion of the structure is the same size as the entire inner diameter of the centrum. The Apomys way down at the bottom is a known weirdo also (maybe a really old individual). These three will probably have lots of influence in these models. We'll get to that below.
```{r}
# By genus only
ch.53 <-
brm(data = d,
family = gaussian,
tbsp_s ~ 0 + genus,
prior = c(prior(normal(0, 1), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.53")
print(ch.53)
```
```{r}
# By genus and mass
ch.54 <-
brm(data = d,
family = gaussian,
tbsp_s ~ 0 + genus + mass_s,
prior = c(prior(normal(0, 1), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.54")
print(ch.54)
```
```{r}
# By genus, mass, and phylogeny/intraspecific variance
ch.29 <-
brm(data = d,
family = gaussian,
tbsp_s ~ 0 + genus + mass_s + (1|gr(phylo, cov = ch)) + (1|taxon),
control = list(adapt_delta = 0.98), #inserted to decrease the number of divergent transitions here
prior = c(
prior(normal(0, 1), class = b, coef = genusApomys),
prior(normal(0, 1), class = b, coef = genusArchboldomys),
prior(normal(0, 1), class = b, coef = genusChrotomys),
prior(normal(0, 1), class = b, coef = genusRhynchomys),
prior(normal(0, 1), class = b, coef = genusSoricomys),
prior(normal(0, 1), class = b, coef = mass_s),
prior(normal(0, 1), class = sd),
prior(exponential(1), class = sigma)
),
data2 = list(ch = ch),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.29")
print(ch.29)
```
Four part plot:
```{r}
ch.53_halfeye <- ch.53 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye(aes(fill = .variable),
point_fill = "#000000",
shape = 21,
point_size = 3,
point_color = "#FFFFFF",
interval_size = 10,
interval_color = "grey40",
.width = .89) +
scale_fill_manual(values = cols)+
geom_vline(xintercept = 0, linetype = "dashed")+
theme(legend.position = "none",
plot.title = element_text(size = 9))+
labs(x = "Tb.Sp", y = "genus")+
ggtitle(label = "Tb.Sp by genus only")
ch.54_halfeye <- ch.54 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye(aes(fill = .variable),
point_fill = "#000000",
shape = 21,
point_size = 3,
point_color = "#FFFFFF",
interval_size = 10,
interval_color = "grey40",
.width = .89) +
scale_fill_manual(values = cols)+
geom_vline(xintercept = 0, linetype = "dashed")+
ggtitle(label = "Tb.Sp by genus/mass")+
labs(x = "Tb.Sp", y = "genus")+
theme(legend.position = "none",
plot.title = element_text(size = 9))
ch.29_halfeye <- ch.29 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye(aes(fill = .variable),
point_fill = "#000000",
shape = 21,
point_size = 3,
point_color = "#FFFFFF",
interval_size = 10,
interval_color = "grey40",
.width = .89) +
scale_fill_manual(values = cols)+
geom_vline(xintercept = 0, linetype = "dashed")+
ggtitle(label = "Tb.Sp by genus/mass/phylo/intrasp.")+
labs(x = "Tb.Sp", y = "genus")+
theme(legend.position = "none",
plot.title = element_text(size = 9))
ch.52.pl.nokey <- ch.52.pl +
theme(legend.position = "none")
ch.53_halfeye/ch.52.pl.nokey|ch.54_halfeye/ch.29_halfeye
```
From this set of plots, it looks like there is maybe something interesting going on with Rhynchomys. Even though there is almost full overlap between a lot of the probability distributions (even before considering phylogenetic covariance), the mean estimate for Rhynchomys departs from the global mean more than any of the other genera. This might also be driven by the large RANGE of values for Tb.Sp, which is especially common in Rhynchomys and Archboldomys. Sometimes they have areas where the "trabecular spacing" is equal to the entire internal diameter of the vertebral centrum because there are no trabeculae at all. I have a feeling that there will be a lot of outliers driving the patterns in this one.
Comparison plots:
```{r}
diff.tbth.ch.54 <- ch.54 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
compare_levels(.value, by = .variable) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
mutate(.variable = str_replace_all(.variable, "omys", "")) %>%
ungroup() %>%
mutate(loco = reorder(.variable, .value)) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye() +
geom_vline(xintercept = 0, linetype = "dashed") +
ggtitle(label = "Tb.Sp difference, genus+mass")+
labs(x = "difference", y = "comparison")+
theme(plot.title = element_text(size = 9))+
scale_x_continuous(limits = c(-3, 3))
diff.tbth.ch.29 <- ch.29 %>%
gather_draws(b_genusApomys,b_genusArchboldomys, b_genusChrotomys, b_genusRhynchomys, b_genusSoricomys) %>%
compare_levels(.value, by = .variable) %>%
mutate(.variable = str_replace_all(.variable, "b_genus", "")) %>%
mutate(.variable = str_replace_all(.variable, "omys", "")) %>%
ungroup() %>%
mutate(loco = reorder(.variable, .value)) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye() +
geom_vline(xintercept = 0, linetype = "dashed") +
ggtitle(label = "Tb.Sp difference, genus+mass+phylo")+
labs(x = "difference", y = "comparison")+
theme(plot.title = element_text(size = 9))+
scale_x_continuous(limits = c(-3, 3))
diff.tbth.ch.54|diff.tbth.ch.29
```
Sure enough, the ones involving Rhynchomys have the greatest mean departure from 0, but none of the differences look particularly significant. Gonna try to figure out what's driving this.
```{r}
ch.52 <- add_criterion(ch.52, c("loo", "waic"))
ch.53 <- add_criterion(ch.53, c("loo", "waic"))
ch.54 <- add_criterion(ch.54, c("loo", "waic"))
ch.29 <- add_criterion(ch.29, c("loo", "waic"))
loo(ch.52) # mass only : 0 Pareto K > 0.5
loo(ch.53) # genus only : 2 Pareto K > 0.5
loo(ch.54) # mass and genus : 1 Pareto K > 0.5
loo(ch.29) # mass genus and phylo: 1 Pareto K > 0.5
```
Ok so three of them have outliers! Are they all three gonna be different? Is it gonna be that troublemaking Apomys again?
```{r}
tibble(k = ch.53$criteria$loo$diagnostics$pareto_k,
row = 1:67,
specimen = paste(d$specno[row], d$taxon[row])) %>%
arrange(desc(k))
# 193526 Archboldomys maximus, K = 0.64 AND
# 190968 Soricomys leonardocoi, K = 0.59
tibble(k = ch.54$criteria$loo$diagnostics$pareto_k,
row = 1:67,
specimen = paste(d$specno[row], d$taxon[row])) %>%
arrange(desc(k)) # 193526 Archboldomys maximus, K = 0.52
tibble(k = ch.29$criteria$loo$diagnostics$pareto_k,
row = 1:67,
specimen = paste(d$specno[row], d$taxon[row])) %>%
arrange(desc(k)) # 193526 Archboldomys maximus, K = 0.55
```
Some thoughts:
1) 193526 Archboldomys maximus is the most influential specimen in all three versions of the model (actually also in all four of the models - I checked ch.52 also), but it doesn't have an excessively high Pareto K, especially in the models that include mass as a predictor. In looking at the specimen's TBA results, it has the lowest BV.TV value of any specimen in the sample (12% bone). It is not the smallest specimen of its species. The only other specimens that comes close to having such a low BV.TV are 190968 (Soricomys leonardocoi) and 188313 (S. montanus), both of which have 14% bone. I think the Archboldomys is the most outstanding partially because it differs dramatically from ALL its conspecifics (Tb.Sp = 0.047 versus all others 0.028 or less), whereas among Soricomys, there is a greater spread of values within each species.
2) Is this a result of some allometric effect where you lose trabeculae as you get really small? This is a suspicion I've had for a while. It's probably worth doing some allometric scaling investigations here - I haven't been paying attention to that particularly. According to Doube et al. 2011, larger animals have relatively larger trabeculae that are relatively further apart, and that there are fewer of them per unit volume. Their analysis includes shrews down to 3g in mass, so that definitely encompasses the size range of Soricomys and Archboldomys. I'll have to look more closely at their results though, because I don't know how densely they have sampled the 3g-30g range of body masses. Having looked at so many very small species with almost totally empty centra, I am eager to know if my suspicion is founded in biological fact.
Model comparisons:
```{r}
tbsp_comp <- loo_compare(ch.52, ch.53, ch.54, ch.29, criterion = "waic") %>%
print(simplify = F)
# Convert to WAIC differences among models, including standard error term.
cbind(waic_diff = tbsp_comp[, 1] * -2,
se = tbsp_comp[, 2] * 2)
tbsp_comp[, 7:8] %>%
data.frame() %>%
rownames_to_column("model_name") %>%
mutate(model_name = fct_reorder(model_name, waic, .desc = T)) %>%
ggplot(aes(x = waic, y = model_name,
xmin = waic - se_waic,
xmax = waic + se_waic)) +
geom_pointrange(shape = 21) +
labs(title = "WAIC comparison across Tb.Sp estimation models",
x = "WAIC", y = "model") +
theme(axis.ticks.y = element_blank())
```
They are all the same, for all intents and purposes. Does this mean that the addition of predictors/information in the different models gives you NO additional capacity to predict Tb.Sp? Would this change if I removed the most extreme outliers? It seems like that's maybe worth a shot. This is the only metric so far where I feel like removing the outliers is maybe justified, but I also think that they may indicate something biologically relevant.
Let's look at some more TBA metrics before I go excluding anything.
######################################
#### Connectivity Density (Conn.D)####
######################################
A model without genus, mass as the only predictor:
```{r}
# Conn.D by mass
ch.55 <-
brm(data = d,
family = gaussian,
cond_s ~ 1+mass_s,
prior = c(prior(normal(0, 1), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.55")
print(ch.55)
```
Pretty large error, and a strong-ish inverse correlation with mass. Plot:
```{r}
ch.55.pl <- fitted(ch.55,
newdata = nd) %>%
data.frame() %>%
bind_cols(nd) %>%
# plot
ggplot(aes(x = mass_s)) +
geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
stat = "identity",
alpha = 1/5, size = 1, color = "black") +
geom_point(data = d, aes(y = cond_s, color = genus), size = 4)+
scale_color_manual(values = cols)+
labs(x = "Mass in grams (standardized)",
y = "connectivity density (standardized)")+
geom_text_repel(data = d %>% filter(specno %in% c("193526", "193523", "190968", "216435")),
aes(y = cond_s, label = specno),
size = 3,)
ch.55.pl
```
In comparison to the Tb.Sp model, it has less error overall, but there is still a lot of spread around the mean estimate. One Archboldomys and one Apomys are very noticeably above the line this time. I went ahead and labeled them - note that the super high Archboldomys (193523) is not the same specimen that has the super high trabecular spacing (193526, also labeled). This makes sense - larger spaces means lower overall density. S. leonardocoi (190968) makes another appearance pretty far away from its congeners, but it does have one other specimen down there with it, so maybe it's not unreasonably low.
That Apomys sierrae (216435) is absolutely abnormal, based on its detachment from all the rest of the Apomys. Also it's HUGE (105g, all others in the species are 81-95g). I had previously flagged it just from how weird it looked qualitatively, and LRH and I discussed taking a look at it to see if it's really old or something. We have yet to do that. I'm gonna run these analyses with it included and see how much trouble it causes.
Model permutations:
```{r}
# By genus only
ch.56 <-
brm(data = d,
family = gaussian,
cond_s ~ 0 + genus,
prior = c(prior(normal(0, 1), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
file = "G:\\My Drive\\Philippine rodents\\chrotomyini\\fits\\ch.56")
print(ch.56)
```
What the hell is Soricomys doing??? (...again)
```{r}
# By genus and mass
ch.57 <-