# Training Resources and Plans
***This section is under revision as we consider alternate resources.***
If you are planning on spending significant time improving your data science and modeling skills, you will want to create a _training plan_.
## Training plan components
- A description of your goals for the training plan and the EHA projects
and activities to which you will apply your skills
- A list of courses/tutorials you plan to complete
- The total time the courses will take to complete
- The time frame over which you expect to complete them
- The name of a _peer learner_. You should have a peer learner at EcoHealth who will be a partner over the course of your training. This may be someone working on a similar training plan or someone who already knows the material. They should play some or all of these roles:
    - Accountability: Your peer learner should know about your training plan and its time frame, and check in on how you are doing.
    - Co-learning: You and your peer learner may want to schedule times to watch course
    videos and complete exercises together.
    - Review: Especially for materials without automated feedback, your peer learner should be able to look at your work and provide feedback.
    - Motivation: Your peer learner should make your training fun and tell you that you rock.
When your supervisor signs off on your training plan, contact Megan Walsh, who will
provide any subscriptions for the period you need them.
## Training Plans
These are some suggestions for assembling resources and courses into training plans. They are, of course, a small fraction of the many learning and teaching resources available. Consult your peers, supervisor, and the **\#data-sci-discuss** Slack channel to find courses or resources on the topics you require. If you use a new resource or course, please add it to this page so others can learn from your impressions of it!
### Introductory Programming materials
- [Hands-On Programming with R](https://rstudio-education.github.io/hopr/starting.html): An introduction to R for non-programmers with a focus on project-based learning.
- [Introduction to R](https://cengel.github.io/R-intro/): This introduction is designed to get you familiar with R quickly. It covers the basics and explains how to work with common data types in R.
- [Eloquent JavaScript](https://eloquentjavascript.net/index.html): A project-based book that will take you from the basics to creating websites. For R users, the chapter on data structures is especially helpful for understanding JSON and `jsonlite`.
### Better Managing Data
- [Johns Hopkins Introduction to the Tidyverse](https://jhudatascience.org/tidyversecourse/intro.html): This chapter focuses on the foundational concept of tidy data, that is, rectangular data in which each row is one observation, each variable is stored in its own column, and every cell has a value, even if that value is `NA`.
- If you are mostly working in spreadsheets but collaborating with R or Python users, or just trying to organize a lot of spreadsheet data for your projects, work through the [Data Carpentry lesson on spreadsheet organization](http://www.datacarpentry.org/spreadsheet-ecology-lesson/) (~2 hours) and
read [Hadley Wickham's paper on tidy data](https://www.jstatsoft.org/article/view/v059i10) (~1 hour)
- [Importing Data](https://intro2r.com/importing-data.html) provides a good overview of basic data imports while [R4DS: Import](https://r4ds.hadley.nz/import.html) demonstrates how to properly import more complex data types.
- [AIRTABLE DATA](#airtable) - see chapter on Airtable
- [ODK DATA](#odk) - see chapter on ODK
- [Reading and writing spatial data with `terra`](https://rspatial.org/spatial/5-files.html) This chapter goes over ingesting spatial data with the `terra` package in R.
- [Geographic Data in R](https://r.geocompx.org/spatial-class.html) This chapter discusses geographic data classes in R and the `sf`, `sp`, and `terra` packages.
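
To make the tidy data idea from the first item in this list concrete, here is a minimal base R sketch. The `wide` data frame, its site names, and its counts are invented for illustration and do not come from any EHA project. It shows a spreadsheet-style table with one column per year being rearranged so that each row is one site-year observation and the missing count stays an explicit `NA`.

```r
# Hypothetical "wide" spreadsheet-style data: one column per sampling year.
wide <- data.frame(
  site       = c("A", "B"),
  count_2021 = c(12, 7),
  count_2022 = c(15, NA)
)

# Tidy form: one row per site-year observation, one column per variable.
# The missing 2022 count for site B is kept as an explicit NA.
tidy <- data.frame(
  site  = rep(wide$site, times = 2),
  year  = rep(c(2021L, 2022L), each = nrow(wide)),
  count = c(wide$count_2021, wide$count_2022)
)

print(tidy)
```

In practice, `tidyr::pivot_longer()` is the usual tidyverse tool for this kind of reshaping. The manual construction above is only meant to show what the result should look like.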
### Version Control and Git
- Read through and work through the examples in [Happy Git with R](http://happygitwithr.com/).
- Atlassian has a nice [collection of git tutorials](https://www.atlassian.com/git/tutorials) for beginners and advanced users.
### Reproducible reporting
- [Reports with R Markdown](https://epirhandbook.com/en/reports-with-r-markdown.html) covers making reproducible reports, from the basics to parameterization.
- This chapter provides an [overview of using `targets` in R Markdown](https://books.ropensci.org/targets/literate-programming.html#r-markdown-targets).
### Improving Your Statistical Fundamentals
<!-- - [DataCamp: Introduction to Data](https://www.datacamp.com/courses/introduction-to-data) (4 hours) is a good starter course you have not much or any training in statistics in school and want to get familiar with the language and concepts. -->
<!-- - [DataCamp: Foundations of Probability in R](https://www.datacamp.com/courses/foundations-of-probability-in-r) (4 hours) covers many of the important concepts that those doing Bayesian model work at EHA discuss. Take this if you are just getting started interacting with or reading this work and you don't have a probability background. -->
- [Probability, Statistics, and Data: A fresh approach using R](https://mathstat.slu.edu/~speegle/_book/)
A breadth-first approach to probability and statistics that assumes minimal familiarity with calculus.
The book includes data sets that require cleaning and wrangling before they can be
used for analysis and relies on simulations to demonstrate important concepts.
- [An Introduction to Probability and Simulation](https://bookdown.org/kevin_davisross/probsim-book/) This book covers concepts in probability and simulation. It encourages readers to use
code notebooks (Google Colab, Jupyter, etc.) to work through problems.
- [Foundations of Probability in R](https://rpubs.com/trombleydc/extra_credit_11_6_2017).
A short self-guided course on probability and distributions.
- [Statistical Rethinking: A Bayesian Course](https://xcelab.net/rm/statistical-rethinking/). This is an excellent book [and video lecture series](https://www.youtube.com/playlist?list=PLDcUM9US4XdM9_N6XUUFrhghGJ4K25bFc) that builds great foundations for doing many types of modeling. Prerequisites are intermediate R and some experience with linear regression and probability. This course has about 19 hours of video. We recommend 2-3 months for going through the book, video, and exercises.
### Improving your data visualization
<!-- - [Data Visualization with ggplot2 (Parts 1 + 2)](https://www.datacamp.com/courses/data-visualization-with-ggplot2-1) (5 + 5 hours). These will take you much further than the basic visualization you learn in Intro to the Tidyverse. Take them if you aren't able to make the plots you need based on the knowledge you learn in that. Part 3 covers -->
<!-- topics not generally needed. -->
- [Fundamentals of Data Visualization](http://serialmentor.com/dataviz/) by Claus Wilke is an excellent guide to making high-quality figures, focusing more on design than mechanics of programming. R code is available for all of its examples. If you feel you have a solid
grasp of **ggplot2** but want to improve the quality of your figures, we recommend reading this e-book, and using the accompanying code in [its GitHub repository](https://github.com/clauswilke/dataviz) to reproduce figures.
### R Programming
- [Advanced R: Functions](https://adv-r.hadley.nz/functions.html): "In this chapter, you’ll learn how to turn informal, working knowledge [of functions] into more rigorous, theoretical understanding."
<!-- - [DataCamp: Intermediate R](https://www.datacamp.com/courses/intermediate-r) (6 hours) or [DataCamp: Writing Functions in R](https://www.datacamp.com/courses/writing-functions-in-r)/ (4 hours) are both good courses to improve your skills if you find yourself managing a more complex workflow or working on an R package with someone at EHA. -->
### Map-making and geospatial analysis in R
- [Geocomputation with R](https://geocompr.robinlovelace.net/) is a comprehensive guide to understanding geographic data, mapping, and conducting spatial analysis in R. The most relevant chapters for your purposes are likely 1-8 and 10-11. A chapter might take you 1-3 hours to work through, depending on how in depth you want to get and the number of exercises that you complete.
<!-- - DataCamp also has courses on spatial analysis using R, notably [Spatial Analysis in R with sf and raster](https://www.datacamp.com/courses/spatial-analysis-in-r-with-sf-and-raster) (4 hours). It covers -->
<!-- some of the material from Geocomputation in R in less depth, and gives you fewer tools for hooking into the multiple geospatial data systems available in R. Take this if you are more time-limited and are going to do less in terms of spatial statistics. -->
- Data Carpentry has a [course on using R for spatial data](https://datacarpentry.org/geospatial-workshop/). Like other \*Carpentry lessons, it is designed as a workshop lesson plan but can be self-taught. It presumes very little R knowledge and includes basics such as setting up a project in RStudio. This is a good place to start for people or students with little R experience who want to make maps right away. If you just want to get a quick feel for R spatial data types, jump into Chapter 3.
- [Making Maps with R](http://eriqande.github.io/rep-res-web/lectures/making-maps-with-R.html) is a quick-start guide to mapping with **ggplot2**. It also introduces the **ggmap**, **maps**, and **mapdata** packages for providing basemaps on which to overlay your spatial data. It is good for getting a map together quickly, but if you are going to be making maps on a regular basis we suggest the resources above, which give you a better foundation in geographic data.
- [Leaflet for R](https://rstudio.github.io/leaflet/) is a manual on using the R **leaflet** package to harness [Leaflet](https://leafletjs.com/), an open-source JS library for creating _interactive_ maps. Leaflet maps are particularly useful for exploring and visualizing spatial data, and are easily embedded into R Markdown documents. You should take a course on R Markdown, or already know it, before working through this manual.
### Bioinformatics
- Conceptual and practical introduction to some of the main topics in Bioinformatics.
#### Metagenomics
- [Quality control](#quality-control)
- [Assembly](#assembly-of-high-throughput-data)
- Files and formats in high-throughput sequencing (In development)
- [Pairwise sequence alignment](#seqs_aln)
- Read alignment (In development)
- Multiple Alignment (In development)
- Variant Identification (In development)