-
Notifications
You must be signed in to change notification settings - Fork 11
/
google-auth.Rmd
243 lines (145 loc) · 14.7 KB
/
google-auth.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
# Google Authorization and R
**IMPORTANT** DO NOT ADD SENSITIVE FILES TO A GITHUB REPO UNTIL THEY ARE ENCRYPTED!
EcoHealth Alliance sometimes uses Google Drive, and Google Sheets in particular, to store and collaborate on data. Working with Google Drive-based files in R is relatively painless thanks to the [`googledrive`](https://googledrive.tidyverse.org/), [`googlesheets4`](https://googlesheets4.tidyverse.org/index.html), and [`gargle`](https://gargle.r-lib.org/index.html) packages.
What is less straightforward is working with drive based files without having to manually authenticate your identity. In this chapter, we will walk through the process of creating credentials with API access that can be used in your R project or package. Ultimately, this could allow you to fully automate your Google-centric data pipelines.
## The basic overview:
0) Create or store something (sheet, csv, doc, etc.) in Google drive
1) Create a *Google Cloud project* to manage Google services\
2) Enable the appropriate APIs for the project so it can access things like Drive and Sheets.
3) Create a *service account* so that you can access the APIs via credentials from R
4) Encrypt credentials then add them to your R project so that you can still use `git`-based workflows without leaking access to your service account
5) Share Google-based resources with the service account and check that credentials work as expected from R
6) Add an encryption key to Github as an environment variable so that R can access the resources in automated workflows
## Key Terms
- **Authentication** - Confirms the identity of an entity
- **Authorization** - Permits an entity to do something
- **Auth** - shorthand for authentication or authorization
- **Key** or **Token**- Computer-generated credentials that allow for authorization and authentication. In the R-Google universe *key* and *token* are synonyms, though not all services use this way, at at times okens and keys are communicated over the web using different method. You will see these terms used interchangeably in tutorials.
- **Symmetric encryption** - A type of encryption that uses a single key to encrypt and decrypt an object
- **Asymmetric encryption** - A type of encryption that uses public and private keys to encrypt and decrypt an object
- **Environment variable** - A value stored in a computer's *system environment*. In R, this generally means values stored in the `.Renviron` file, which can be brought into your project using `Sys.getenv("Variable_Name")` and are very useful for storing sensitive information like tokens and keys.
- **Service** - Functionality provided by another system i.e. serving data via an API.
- **GCP** - Google cloud platform. Web services from Google.
## Before we start:
*Note*: The preferred method for adding encryption to projects is via `git-crypt`. See chapter \@ref(encryption) for more on `git-crypt`.
*Credit*: This chapter largely follows the\
[non-interactive auth vignette from the Gargle R package](https://gargle.r-lib.org/articles/non-interactive-auth.html), but diverges for package and non-package focused projects.
*What about Billing?*: Good question. This is not an issue for Google Sheets or Drive APIs but you do need a linked billing account for BigQuery and Maps APIs. If you're new to GCP as of 29 Sept 2021 you get \$300 of credits in the Free Tier. If you use the \$300 in credits GCP will ask for consent before billing. Check with the Data Librarian about using and billing arrangements beyond this.
## Setup encryption tools on your machine
See chapter \@ref(encryption). This process will take ~30 mins and involves using the command line.
## Enable git-crypt on your repository
Enabling git-crypt requires your code to be stored in a git-backed repository. See chapter \@ref(versioning) for setting up git repositories.
Enabling `git-crypt` happens from the command line. You can access the [command line directly in Rstudio](https://support.rstudio.com/hc/en-us/articles/115010737148-Using-the-RStudio-Terminal-in-the-RStudio-IDE). If you use rstudio projects and the command line in Rstudio, then the terminal should open in the repo you want to encrypt.
In the command line run:
```
# check that you are in the directory with the repo
pwd
## /path/to/repo-i-want-to-encrypt
# if you're in the wrong directory, use cd to navigate to the correct repo
# cd /path/to/repo-i-want-to-encrypt
git-crypt init
```
Next you want to tell git-crypt which files should be encrypted. To do this, create a file in the top level directory of your repo called `.gitattributes`. Here you will list the files and folders you would like to encrypt. Each item should be placed a on separate line. To learn more about pattern matching in the `.gitattributes` file, see the [git-crypt read.me](https://github.com/AGWA/git-crypt#gitattributes-file) and [gitignore manual](https://git-scm.com/docs/gitignore#_pattern_format)
Your `.gitattributes` file might look something like this:
```
.env filter=git-crypt diff=git-crypt
auth/** filter=git-crypt diff=git-crypt
.gitattributes !filter !diff
```
The `.env` file will be used to store environment variables and the `auth/` folder will be used to store keys. Do NOT encrypt the `.gitattributes` file. It maybe a good idea to add your google auth key explicitly to the `.gitattributes` file using this format `**/mySecretKey.json` so that the key is encrypted independent of the directory it is in.
Next add yourself and other users who will require access to the encrypted files to the repo. The encryption chapter [encryption][Allow contributors access to encrypted files] in the handbook details how to add contributors to the repo.
Finally, you will have to set up a symmetric key for github actions to use. This key can always be regenerated so there is no reason to store it. Additional detail can
be found in the encryption chapter [here][Use a symmetric key for automated processes].
First, add the git-crypt key to your `.gitignore` so you don't accidentally commit it to the repo.
```
.Rproj.user
.Rhistory
.Rdata
.httr-oauth
.DS_Store
inst/.DS_Store
git_crypt_key.key
```
In the command line run:
```
# create the symmetic key
git-crypt export-key git_crypt_key.key
# convert it from binary to bas64 so github can use it
# the file's contents can now be pasted into a github secret
# environment variable
cat git_crypt_key.key | base64 | pbcopy
```
Paste the key into [Github's secret environment variable field](https://docs.github.com/en/actions/security-guides/encrypted-secrets) as `GIT_CRYPT_KEY64`.
Now delete the key file:
```
# doesn't have to be done in terminal but you've already got it open
rm -i git_crypt_key.key
```
You have now setup asymmetric encryption for human users and symmetric encryption for automations. The next steps involve getting the files you want to encrypt.
## Setting up non-interactive authentication for Google sheets
In this section, we will walk through setting up credentials that can be used in R to access Google sheets without manual authentication. To achieve non-interactive authorization, we want to either provide a *token* directly to a service or make a token discoverable for a service. A token is essentially a long password, designed to be exchanged by machines but too long and complexly formatted to be used by people, and often time-limited. Remember that tokens, secrets, and API keys should be stored in a secure fashion (NOT stored in the text of your code or in unencrypted files).
We are going to follow the recommended (as of 29 September 2021) strategy of providing a *service account key* directly to handle authorization. A newer approach called "workload identity federation" exists as of writing but is not fully implemented in the `gargle` package.
### Create a Google cloud platform project
"Google Cloud projects form the basis for creating, enabling, and using all Google Cloud services including managing APIs, enabling billing, adding and removing collaborators, and managing permissions for Google Cloud resources." - [*GCP Docs*](https://cloud.google.com/resource-manager/docs/creating-managing-projects)
We will use a GCP project to access the Google sheets API via a service account. You do not need a profound understanding of GCP projects to setup a service account.
- Setup/view your [Google cloud account](https://console.cloud.Google.com/)
- Create a project on Google cloud to hold your credentials ![gcp console](./assets/gcp_console.png) ![gcp project init](./assets/gcp_create_proj.png)
- The GCP console is your destination for monitoring and modifying your projects
### Enable APIs
GCP Projects are centered on the idea that a single project will contain a single application. In our case, the application we are creating relies on the Google Sheets API. You can enable API's for our application to access via the APIs & Services menu item.
- In the left side menu, navigate to APIs & Services > Library ![naviate to Library](./assets/gcp_nav_lib.png)
- Choose your api of interest. For this example it is Google sheets. ![find sheets](./assets/gcp_find_sheets.png)
- Enable the api of interest. *If you need to enable more API's later you can always come back*.![enable sheets](./assets/gcp_enable_sheets.png)
### Create a Service Account
Service accounts allow applications, like the GCP project we make, to access certain resources they need via authorized API calls. The service account's access can be limited such that it can only access specific resources in a certain way.
Importantly, service accounts are not part of the EHA workspace domain. You have to manually share resources like Google sheets with a service account even if you have provided domain-level access.
[GCP Service Account Docs](https://cloud.google.com/iam/docs/service-accounts)
- Navigate back to your project homepage
- In the left sidebar go to IAM & Admin > Service Accounts ![gcp_nav_service](assets/gcp_nav_service.png)
- Click create service account
- Give it a good name and description ![gcp_create_service_acnt](./assets/gcp_create_service_acnt.png)
- For Google sheets, we do not need to assign our account service a role
- Roles can be established to perform tests and otherwise manage the service but are not necessary
- Also not necessary to grant user access for this example
- You may have a need for this with more complicated services
- It may also be a good idea to get some redundancy in your workflow ![gcp_service_setup](./assets/gcp_service_setup.png)
### Create a Key for your service account
Keys for Google service accounts are stored in JSON files. Remember that this key will hold very sensitive information and we should treat it like a username and password combo.
- Click on the appropriate item in the service accounts table. *Notice that it says no key*.![gcp_nav_key](./assets/gcp_nav_key.png)
- Click on the keys tab, then click on **ADD KEY** ![gcp_add_key](./assets/gcp_add_key.png)
- Select create new key and download the JSON file. **WARNING:** Do not store this unencrypted key in a shared location (Dropbox, Google Drive, folder connected to a git repository). If you are using the `git-crypt` workflow, add the file to the "./auth" folder in your project's working directory. ![gcp_get_key](./assets/gcp_get_key.png)
- You should now see that there is an active key associated with your service account in the GCP project.
### Share the Google drive resources of interest with the service account
- **Make sure your sheet is shared with the service account**
- Remember that service accounts do not belong the EHA domain so there will be extra prompts when sharing resources.
## General approach to securely managing keys
This workflow uses the encryption practices laid out in the [encryption chapter][Encryption]. Make sure you have setup your computer with encryption tools
before proceeding. Reach out to the `r params$data_librarian` with questions.
1) Unencrypted files storing keys for the Google service account should NOT be stored in shared or public locations (Drive, Dropbox, Github Repo)
- If possible store in an encrypted volume. Keybase, Bitwarden, or other credential management storage systems generally allow you to store files in an encrypted manner.
2) Files storing keys for the Google service account only ever enter the project working directory after being encrypted or after git-crypt has been initiated and the file is included in the `.gitattributes` file or stored in an encrypted folder like `./auth`. See the [encryption chapter][Set up encryption for a repo that did not previously use git-crypt.] for more details
### Provide a service account key for projects
Now you have a secret key for the service account and need to securely access it in an R project.
If you setup `git-crypt`, make sure the file is saved in an appropriate location (e.g. `./auth`) and that all the users who need to use the encrypted file are [added to the repo][Allow contributors access to encrypted files].
#### Run your code using encrypted keys
For `git-crypt` users, the file will already be decrypted on your machine. If it isn't, run `git-crypt unlock` in the terminal. If that does not work double check that your public key has been added to the repo.
```{r eval=FALSE}
jsonChar <- readr::read_file("./auth/myKey.json")
googlesheets4::gs4_auth(path = jsonChar)
```
#### Read in your sheet
- *Make sure your sheet is shared with the service account if it hasn't been already*
```{r eval=FALSE}
googlesheets4::read_sheet("https://docs.Google.com/spreadsheets/that/i/definitely/shared/with/the/service-account")
```
#### Add Environment Variable to other services
Non-interactive authentication allows us to automate workflows that involve Google sheets on services like Github Actions. In either case, it is relatively straight forward to securely add our `SERVICE_ACCOUNT_PASSWORD` environment variables to those services.
See documentation here:
- [Add secrets to Github Actions](https://help.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets)
## Additional Resources
### Google Drive
The [`googledrive`](https://googledrive.tidyverse.org/) package allows you to interact with files stored in Google drive from R. You can download, share, delete, copy, publish and otherwise manipulate files on your drive using this package.
Package vignettes can be found [here](https://googledrive.tidyverse.org/articles/index.html)
### Google Sheets
The [`googlesheets4`](https://googlesheets4.tidyverse.org/index.html) package allows you to directly interact with Google sheets in R. You can read, write, reformat, create formulas, and otherwise manipulate the sheet of interest.
Package vignettes can be found [here](https://googlesheets4.tidyverse.org/articles/index.html)