---
title: "Working on a remote server"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
![CERN computing center](img/cern-server-small.jpg){width="70%"}
## Learning Objectives
In this lesson, you will:
- Learn how to connect to a remote server
- Get familiar with RStudio Server
- Get an introduction to the command line interface (CLI) & bash
## Why work on a remote machine?
Often the main motivation is to **scale your analysis beyond what a personal computer can handle**. R is fairly memory intensive, so moving to a server often gives you more RAM and lets you load larger datasets in R without slicing your data into chunks. There are other advantages too; here are the main ones for scientists:
- **Power**: More *CPUs/Cores* (24/32/48), More RAM (256/384GB)
- **Capacity**: More *disk space* and generally *faster storage* (in highly optimized RAID arrays)
- **Security**: Data are spread across *multiple drives* and have nightly *backups*
- **Collaboration**: *shared folders* for code, data, and other materials; same *software versions*
::: callout-warning
***=\> The operating system is most likely going to be Linux!!***
More on this in a few minutes
:::
## Introduction to UNIX and its siblings
UNIX
: Originally developed at AT&T / [Bell Labs](https://en.wikipedia.org/wiki/Bell_Labs) circa 1970. Has
experienced a [long, multi-branched
evolutionary path](https://upload.wikimedia.org/wikipedia/commons/7/77/Unix_history-simple.svg)
POSIX (Portable Operating System Interface)
: a set of specifications of what an OS needs to qualify as "a Unix", to enhance interoperability among all the "Unix" variants
### Various Unices
![The unix family tree](img/unix-history-simple.png)
* Linux (Linus Torvalds, 1991)
: is _not_ fully POSIX-compliant, but certainly can be
regarded as [functionally Unix](http://en.wikipedia.org/wiki/Unix-like) <br>
Some popular Linux distributions include Debian, Fedora Linux, Arch Linux, and Ubuntu. There are also commercial distributions such as Red Hat Enterprise Linux and SUSE Linux Enterprise. Android is actually Linux-based!
* macOS (formerly OS X)
: [is a Unix](http://unix.stackexchange.com/questions/1489/is-mac-os-x-unix)!
### Some Unix hallmarks
* Supports multiple users and multiple processes
* Highly modular: many small tools that do one thing well, and can be combined
* Culture of text files and streams
* Primary OS on HPC (**High Performance Computing** Systems)
* Main OS on which the Internet was built
---
## Connecting via IDE - Posit Workbench
From a user's perspective, _Posit Workbench is your familiar RStudio interface in your web browser_. The big difference, however, is that with RStudio Server the computation runs on the remote machine instead of your personal computer. This also means that **the files you see through the RStudio Server interface are located on the remote machine, and so are your R packages!!!** This remote file management is the main change you will have to adopt in your workflow.
To help with remote file management, the RStudio Server interface has a few additional features that we will discuss in the following sections.
```{r RStudio Server, echo=FALSE, out.width='90%', fig.align="center"}
knitr::include_graphics("img/rstudio-server-ide.png")
```
## Connecting to MEDS Analytical Server
::: {.callout-tip}
## Reference
[MEDS computational servers guide](https://bren.zendesk.com/hc/en-us/articles/8727865050900-MEDS-Computational-Server-Guide)
:::
1. Go to: <https://workbench-1.bren.ucsb.edu>
2. Enter your credentials
3. You are in!
```{r echo=FALSE}
knitr::include_graphics("img/rstudio-server_login.png")
```
4. Click on the `New Session` button. You can see that you are able to start both an RStudio session and a Jupyter notebook session. Let's take a few minutes to experiment with the different options.
For this session, we are going to select the `RStudio` option and hit `Start Session`.
```{r echo=FALSE}
knitr::include_graphics("img/rstudio-server_session.png")
```
**You should now see a very familiar interface :) Except it is running on the server with a lot of resources at your fingertips!!**
### File structure
Let's explore the file structure on the server a little. By default on a Linux server, you start in your `home` folder. This folder is only accessible to you, and it is where you can store your personal files on the server. You should see 2 folders: `R` and `H`
```{r echo=FALSE}
knitr::include_graphics("img/rstudio-server_file-structure.png")
```
The `R` folder is where your local R packages will be installed; you can ignore it. The `H` folder is your H drive, which the Bren School offers to all its students. If you click on it, you should see any files you have uploaded there.
Let us make a folder named `github` by clicking on the `New Folder` button at the top of the tab. We will use this folder (called a `directory` in Linux/Unix terms) to clone any GitHub repository.
### R packages
If we go to the `Packages` tab, we can see a long list of packages that have already been installed by our system administrator (Brad). Those packages have been installed server-wide, meaning that all users have access to them.
```{r echo=FALSE}
knitr::include_graphics("img/rstudio-server_packages.png")
```
A user can also install their own packages. Let's try to install the `remotes` package, which lets you install R packages directly from GitHub: `install.packages("remotes")`. Once done, note that a new section named `User Library` has appeared on the Packages tab. Each of us now has our own copy of the package installed (in the `R` folder we were talking about a few minutes ago).
```{r echo=FALSE}
knitr::include_graphics("img/rstudio-server_packages-user.png")
```
A few notes:
- In this example, it would have been a better choice to have the `remotes` package installed once at the system level
- Some R packages depend on external libraries that need to be installed on the server. Those libraries will have to be installed by the system administrator first before you can install the R package
- Installing an R package on a Linux machine generally requires compiling the code and will thus take more time than installing it from pre-compiled binaries
**Now look inside your `R` folder!!**
---
## The Command Line Interface (CLI)
The CLI provides a direct way to interact with the Operating System, by typing in commands.
### Why the CLI is worth learning
* Might be the only interface you have to a High Performance Computer (HPC)
* Command statements can be reused easily and saved as scripts
* Easier automation and text file manipulation
### A little bit of terminology
- `Command Line Interface (CLI)`: This is a text-based user interface that lets you interact with a computer by typing commands. It is a legacy from the early days of computers. Nowadays, computers also have graphical user interfaces (macOS, Windows, Linux, ...)
- `Terminal`: It is an application that lets you run a command line interface. It used to be a physical machine connected to a mainframe computer
- `Shell`: It is the program that runs the command line. There are many different shells, the most common being `bash` (Bourne Again SHell) which is installed by default in many Linux distributions
::: {.callout-tip}
### The pitch
**Not convinced? Check this out: [the CLI pitch](https://eds-214.github.io/eds214-handson-cli/cli-pitch.html)**
:::
### The Terminal from RStudio
You can access the command line directly from RStudio by using the `Terminal` tab next to your R console.
![](img/rstudio-terminal.png)
### Navigating and managing files/directories in *NIX
![](img/unix_filetree.gif)
* `pwd`: Know where you are
* `ls`: List the content of the directory
* `cd`: Go inside a directory
Some pseudo directory names. Wherever you are:
* `~` : Home directory
* `.` : Here (current directory)
* `..`: Up one level (upper directory)
Let's put this into action:
* go to my "Home" directory: `cd ~`
* go up one directory level: `cd ..`
* list the content: `ls`
* list the content showing hidden files: `ls -a`; note that `-a` is referred to as an option (it modifies the behavior of the command)
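
The steps above can be tried safely in a throwaway directory. This is a minimal sketch: the directory and file names are made up for illustration, and `mktemp -d` is used so nothing in your real folders is touched.

```bash
# Create a scratch directory to explore (all names here are hypothetical)
scratch=$(mktemp -d)
mkdir -p "$scratch/project/data"
touch "$scratch/project/.hidden_notes" "$scratch/project/readme.txt"

cd "$scratch/project/data"   # go inside a directory
pwd                          # print where you are
cd ..                        # up one level: now in project/
ls                           # shows data and readme.txt (hidden file not listed)
ls -a                        # also shows . and .. and .hidden_notes
cd ~                         # back to your home directory
pwd
```

Files whose names start with a dot are "hidden", which is why `.hidden_notes` only shows up with `ls -a`.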
More file/directory manipulations:
* `mkdir`: Create a directory
* `cp`: Copy a file
* `mv`: Move a file **it is also how you rename a file!**
* `rm` / `rmdir`: Remove a file / directory **use those carefully, there is no undo and no Trash!!**
**Note: typing is not your thing? the `<tab>` key is your friend!** Hit it once and it will auto-complete the file/directory/path name for you. If there are several possible matches, hit it twice to see them.
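
Here is a short sketch of these commands in action, again inside a temporary directory so that nothing important can be deleted. The file names are invented for the example.

```bash
# Work in a throwaway directory (hypothetical names throughout)
scratch=$(mktemp -d)
cd "$scratch"

mkdir results                          # create a directory
echo "site,count" > raw.csv            # create a small file to play with
cp raw.csv results/raw_backup.csv      # copy: the original stays in place
mv raw.csv observations.csv            # rename: same directory, new name
mv observations.csv results/           # move: new directory, same name
ls results                             # observations.csv and raw_backup.csv

rm results/raw_backup.csv              # remove a file (no Trash!)
rm results/observations.csv
rmdir results                          # rmdir only removes EMPTY directories
```

Note that `mv` does both jobs: whether it renames or moves depends only on the destination you give it.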
### Permissions
All files have permissions and ownership.
![File permissions](img/UnixFileLongFormat.png)
* List files showing ownership and permissions: `ls -l`
```bash
brun@workbench-1:/courses/EDS214$ ls -l
total 16
drwxrwxr-x+ 3 brun esmdomainusers 4096 Aug 20 04:49 data
drwxrwxr-x+ 2 katherine esmdomainusers 4096 Aug 18 18:32 example
```
You can change those permissions:
* Change permissions: `chmod`
* Change ownership: `chown`
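
As an illustration, here is a minimal sketch of `chmod` on a scratch file (the file name is hypothetical). Both the numeric and the symbolic notations are shown. `chown` is left out because changing a file's owner generally requires administrator rights.

```bash
scratch=$(mktemp -d)
cd "$scratch"
touch notes.txt

chmod 640 notes.txt     # owner: read+write, group: read, others: nothing
ls -l notes.txt         # permissions column starts with -rw-r-----

chmod u+x notes.txt     # symbolic form: add execute for the owner (user)
ls -l notes.txt         # now -rwxr-----

chmod go-rwx notes.txt  # remove all access for group and others
ls -l notes.txt         # now -rwx------
```

In the numeric form, each digit is the sum of read (4), write (2) and execute (1) for the owner, the group and everyone else, in that order.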
::: {.callout-tip}
Clear contents in terminal window: `clear`
:::
## General command syntax
* `command [options] [arguments]`
where `command` must be an _executable_ file on your `PATH`
* `echo $PATH`
and `options` can usually take two forms
* short form: `-a`
* long form: `--all`
You can combine the options:
```bash
ls -ltrh
```
What do these options do?
```bash
man ls
```
::: {.callout-tip}
hit `spacebar` to get to the next page of the manual
hit `q` to exit the help
:::
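
To make the syntax concrete, here is a small sketch using `ls` on a scratch directory (hypothetical file names). It assumes GNU `ls`, as found on most Linux servers; other systems such as macOS may not accept the long `--all` form.

```bash
scratch=$(mktemp -d)
touch "$scratch/.hidden" "$scratch/visible.txt"

# command  [options]  [arguments]
ls         -a         "$scratch"     # short form of the option
ls         --all      "$scratch"     # long form, same result (GNU ls)

# Short options can be grouped behind a single dash:
# -l long listing, -t sort by time, -r reverse order, -h human-readable sizes
ls -ltrh "$scratch"
ls -l -t -r -h "$scratch"            # equivalent, spelled out

# Where does the shell find the ls executable on your PATH?
command -v ls
```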
## Getting things done
### Some useful, special commands using the Control key
* Cancel (abort) a command: `Ctrl-c` Note: very different from Windows, where `Ctrl-c` copies!!
* Stop (suspend) a command: `Ctrl-z`
* `Ctrl-z` can be used to suspend, then background a process
### Process management
* Like Windows Task Manager, OSX Activity Monitor
* **`top`**, `ps`, `jobs` (hit `q` to get out of `top`!)
* `kill` to terminate an unwanted job or process
* Foreground and background: `&`
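
A minimal sketch of how these pieces fit together, using `sleep` as a stand-in for a long-running analysis (the 300 seconds is an arbitrary choice for the example):

```bash
sleep 300 &                      # trailing & starts the command in the background
pid=$!                           # $! holds the process ID of the last background job
jobs                             # list jobs started from this shell
ps -p "$pid"                     # show that specific process
kill "$pid"                      # ask the process to terminate (sends SIGTERM)
wait "$pid" 2>/dev/null || true  # wait until it is gone; its non-zero status is ignored
```

In an interactive terminal you can also bring a background job back to the foreground with `fg`, or resume a job suspended with `Ctrl-z` in the background with `bg`.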
### What about "space"
* How much storage is available on this system? `df -h`
* How much storage am "I" using overall? `du -hs <folder>`
* How much storage am "I" using, by sub directory? `du -h <folder>`
* How much RAM is used? `free -h`
### Existential questions
* What is your username? `whoami`
* What is today's date? `date`
* Where are the programs you are using? `whereis R` also try `which -a python`
* What is the kernel version of your OS: `uname -a`
* How long since last reboot: `uptime`
* Which shell are you using? `echo $0`
### History
* See your command history: `history`
* Re-run last command: `!!`
* Re-run 32nd command: `!32`
* Re-run 5th from last command: `!-5`
* Re-run last command that started with 'c': `!c`
## A sampling of simple commands for dealing with files
* `wc` count lines, words, and/or characters
* `diff` compare two files for differences
* `sort` sort lines in a file
* `uniq` report or filter out repeated lines in a file
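
Here is a small sketch of these four commands on a made-up list of sightings (the file names and contents are invented for the example):

```bash
scratch=$(mktemp -d)
cd "$scratch"

# A small made-up species list with duplicates
printf 'otter\nkelp\notter\nurchin\nkelp\notter\n' > sightings.txt

wc -l sightings.txt                 # 6 lines
sort sightings.txt > sorted.txt     # alphabetical order
uniq sorted.txt > species.txt       # uniq only collapses ADJACENT duplicates,
                                    # hence the sort first
uniq -c sorted.txt                  # count how many times each line occurs
diff sorted.txt species.txt || true # show the differences; diff exits non-zero
                                    # when the files differ, which is expected here
```

After running this, `species.txt` contains each species once: `kelp`, `otter`, `urchin`.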
## Get into the flow, with pipes
![`stdin`, `stdout`, `stderr`](img/pipe_split.png
"http://www.ucblueash.edu/thomas/Intro_Unix_Text/Images/pipe_split.png")
```bash
wc -l *.TXT
wc -l *.TXT | sort -n
```
::: {.callout-tip}
- note the use of `*` as a wildcard matching zero or more characters (same on Mac and Windows)
- `?` matches exactly one character
:::
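
A quick sketch of the difference between the two wildcards, using a few empty files with hypothetical names:

```bash
scratch=$(mktemp -d)
cd "$scratch"
touch site1.csv site2.csv site10.csv notes.txt

ls *.csv        # every name ending in .csv: all three site files
ls site?.csv    # exactly one character after "site": site1.csv and site2.csv
ls site*.csv    # zero or more characters after "site": all three site files
echo *.txt      # the shell expands the pattern before the command runs
```

Because the shell does the expansion, wildcards work the same way with any command, not only `ls`.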
Here the `wc -l` command, which counts the number of lines in files,
is given file names on the command line, so it counts the lines in
those files. It writes its output to stdout. `sort -n` sorts lines
in files. It was given no files to sort, so it sorts whatever lines
come in via stdin. By piping these together (i.e., by hooking `wc`'s
stdout to `sort`'s stdin using the pipe operator), the output from `wc
-l` is thereby sorted.
There are various operators for redirecting where stdin comes from and
where stdout and stderr go:
- `< file`: read stdin from file
- `> file`: write stdout to file
- `2> file`: write stderr to file
- `>& file`: write both stdout and stderr to file
- `>> file`: append stdout to file
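
The operators above can be sketched with a tiny helper that writes one line to each stream. The function `report` and its messages are hypothetical, made up only to have something predictable on stdout and stderr.

```bash
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical helper: one line on stdout, one line on stderr
report() {
  echo "3 files processed"               # goes to stdout
  echo "warning: 1 file skipped" >&2     # goes to stderr
}

report > out.log 2> err.log     # split the two streams into separate files
cat out.log                     # 3 files processed
cat err.log                     # warning: 1 file skipped

report > both.log 2>&1          # both streams in one file
                                # (same effect as `>& both.log` in Bash)
wc -l < both.log                # read stdin from a file: 2 lines
```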
Let's write the above result to a file:
```bash
wc -l *.TXT | sort -n > csvcount.log
```
Caution: except for `>>`, all forms of `>` are destructive: Bash
overwrites any existing file with an empty file before the program is
run.
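
A minimal sketch of the difference between overwriting and appending, with a made-up log file. The `noclobber` shell option shown at the end is an optional safety net, not something the rest of this lesson relies on.

```bash
scratch=$(mktemp -d)
cd "$scratch"

echo "first run"  > run.log    # creates run.log
echo "second run" > run.log    # overwrites: "first run" is gone
cat run.log                    # second run

echo "third run" >> run.log    # appends instead
cat run.log                    # second run, then third run

# Optional: make the shell refuse to overwrite existing files with >
set -o noclobber
echo "oops" > run.log || echo "run.log was protected"
set +o noclobber               # restore the default behaviour
```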
Want to get rid of output you don't want to see? Use the Unix black
hole: `>& /dev/null`. (This is a cultural meme, you'll see it on
T-shirts and license plates.)
The above are the main redirections, but there are others.
**To end a session, simply type `exit` or `logout` into the command line**
::: {.callout-note collapse=false}
### Setting up git on workbench-1 (if not yet done)
At the **Terminal**:
```{bash, eval=FALSE, echo=TRUE}
git config --global user.name "Jane Doe"
git config --global user.email "janedoe@example.com"
git config --global credential.helper 'cache --timeout=10000000'
git config --list
```
### Setting up a GitHub token on workbench-1
At the **R Console**:
```{r, eval=FALSE, echo=TRUE}
# On your laptop
usethis::create_github_token() # This should open a web browser on GitHub
# On workbench-1
gitcreds::gitcreds_set()
usethis::git_sitrep()
```
:::
## [**CLI Practice**](https://github.com/EDS-214/eds214-handson-cli){target="_blank"}
---
## Bonuses
### Hear more about the Unix revolution directly from its creators
<https://www.youtube.com/watch?v=tc4ROCJYbm0>
### Vim text editor
This section is optional, depending on our progress and interest. Vim is still the default text editor on many Linux distributions, and it is good to know its basics.
[Vim](vim.html)
## Advanced CLI
Want to know about more cool tools you can use? [**CLI Advanced**](day2-cli_advanced.html)
## Acknowledgements
This section reuses materials from [NCEAS Open Science for Synthesis (OSS)](https://learning.nceas.ucsb.edu/) intensive summer schools and other training. Contributions to this content have been made by Mark Schildhauer, Matt Jones, Jim Regetz and many others; and from EDS-213 [10 bash essentials](https://github.com/UCSB-Library-Research-Data-Services/bren-meds213-class-data/blob/main/week5/bash-essentials.md) developed by Greg Janée.