
Commit

Merge pull request #219 from UI-Research/iss214-ec2-code
Iss214 ec2 code
Deckart2 authored Sep 7, 2023
2 parents c2f1712 + a8d3ad6 commit 1410ed3
Showing 4 changed files with 109 additions and 0 deletions.
9 changes: 9 additions & 0 deletions R/config.R
@@ -10,3 +10,12 @@ install.packages("quarto")
install.packages("tidycensus")
install.packages("future")
install.packages("furrr")
install.packages("aws.s3")

if(!(dir.exists("factsheets/999_county-pages"))){
dir.create("factsheets/999_county-pages")
}

if(!(dir.exists("factsheets/998_place-pages"))){
dir.create("factsheets/998_place-pages")
}
1 change: 1 addition & 0 deletions create_standard_pages.R
@@ -7,6 +7,7 @@
library(tidyverse)
library(quarto)
library(tidycensus)
library(furrr)

source("R/create_standard_county_df.R")
source("R/create_standard_place_df.R")
42 changes: 42 additions & 0 deletions running-pages-documentation.md
@@ -0,0 +1,42 @@
## Steps to Run the Fact Sheet Code
### Gabe Morrison
### 2023-07-10

#1 Spin up a large EC2 instance using the elastic analytics Launch template:
- Gabe recommends c6a.32xlarge
- Note that you may not have credentials to do this and may need to reach out to the AWS Governance team for assistance (use the EC2 intake form: https://app.smartsheet.com/b/form/2c9200302b9941cebc0b61e945653f48)
- This step may not be necessary if an instance has already been spun up and is currently "off". If this is the case, instead of taking the actions above, you will need to (1) ssh into the instance and (2) rerun the rocker docker image, as sketched below. More guidance on how to do this can be found on the AWS Governance Confluence page.
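A minimal sketch of that reconnect path, assuming the standard `rocker/rstudio` image and port 8787 — the key file, address, image, and password below are placeholders, and the exact values for your instance are documented on the AWS Governance Confluence page:

```bash
# Connect to the existing instance (placeholder key file and address)
ssh -i my-key.pem ec2-user@ec2-001-002-003-004.compute-1.amazonaws.com

# Rerun the rocker image so RStudio Server is available on port 8787
# (placeholder password; the image/tag used on the instance may differ)
docker run -d -p 8787:8787 -e PASSWORD=changeme rocker/rstudio
```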

#2 Connect to the EC2 instance. You can only connect to an RStudio GUI running on the instance from a remote desktop, so you should:
- Go to Urban Users (or another remote desktop)
- Go to the link that is sent to you in the email by [email protected]. The link also follows the pattern ec2-[IPv4 address separated by hyphens, e.g. 001-002-003-004].compute-1.amazonaws.com:8787
- Log in using the credentials shared in the email you receive

#3 Clone the pages repo:
- Run `git clone https://github.com/UI-Research/gates-mobility-metrics-pages` in the terminal
- Depending on the status of the repo, you may need to check out the most up-to-date branch with:
- `cd gates-mobility-metrics-pages`
- `git checkout -b [branch name]`
- `git config --global user.email [your email]`
- `git config --global user.name [Your name]`
- `git pull origin [branch name]`
- You may have to resolve merge conflicts:
- `git add -u`
- `git commit -m "updating local branch with remote of issXXX"`

#4 Run the `update-quarto.sh` script:
- Gabe finds that this works most successfully if you run the script line by line
- Note that you will need to enter `Y` and `y` to the prompts in the terminal
- You should restart R after doing this: go to Session > Quit Session, then click Open New Session

#5 Open the `/gates-mobility-metrics-pages/` R project
- Gabe uses the Files pane in the bottom right of RStudio
- Run the `R/startup.R` script. Again, for whatever reason, Gabe finds this works only if you run the code line by line.

#6 Run the commented-out test code in `create_standard_pages.R`. This should create 4 pages for each of the cities and counties.
- A common error in this workflow is:
`Quitting from lines 172-1 (index-county.qmd) Error: ! The inline value cannot be coerced to character: title`
- This comes from Quarto and the packages not being up to date. Ensure you restart R and have all packages installed. With this in place, Gabe finds the code runs successfully.
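If the error persists, one optional diagnostic (not part of the original workflow) is to confirm which Quarto the instance is using and that its installation is healthy; both commands are part of the Quarto CLI:

```bash
quarto --version  # confirm the Quarto version on the instance
quarto check      # verify the Quarto installation and its dependencies
```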

#7 Once the small tests work successfully, you can run the code under "#For actual run". This process takes about 1 - 1.5 hours.
- If you use an instance other than the c6a.32xlarge, you will need to change the NCORES value based on that instance (see the check below). Note that RStudio cannot handle more than 125 simultaneous connections, so it is likely not beneficial to scale up beyond that instance.
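A quick way to see what resources are actually available before adjusting NCORES — a hedged check only; NCORES itself is set in the R code for the actual run:

```bash
nproc    # number of vCPUs available (upper bound for NCORES; RStudio caps out around 125 connections)
free -h  # available memory, as a sanity check before a large parallel run
```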
57 changes: 57 additions & 0 deletions test_copy.qmd
@@ -0,0 +1,57 @@
---
title: "test_copy"
format: html
editor: visual
---

This Quarto document is intended to test whether all of the files created on an EC2 instance were successfully copied from the volume on that instance to S3. The script that performs the copy operation is `aws_cp_command.sh`.

The first two tests are intended to be run *after* the files are copied and can be run locally (as long as the user running the commands has CLI access to AWS). The last test is intended to be run on the EC2 instance to ensure that the correct number of files was generated.
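For reference, the copy performed by `aws_cp_command.sh` is roughly of this form — a sketch only, with the local paths assumed from `R/config.R`; the actual script in the repository is authoritative:

```bash
# Copy rendered county and place pages from the instance volume to the dev S3 bucket
aws s3 cp factsheets/999_county-pages/ s3://mobility-metrics-data-pages-dev/999_county-pages/ --recursive
aws s3 cp factsheets/998_place-pages/ s3://mobility-metrics-data-pages-dev/998_place-pages/ --recursive
```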

## Test 1: Count the number of files in each S3 sub-directory

```{r}
library(aws.s3)
library(tidyverse)
```

This script creates a `count.csv` file with a column "count", where the first value is the number of counties in the S3 bucket and the second is the number of places.

```{bash}
echo "count" > count.csv
aws s3 ls s3://mobility-metrics-data-pages-dev/999_county-pages/ | wc -l >> count.csv
aws s3 ls s3://mobility-metrics-data-pages-dev/998_place-pages/ | wc -l >> count.csv
```

```{r}
library(tidyverse)

count <- read_csv("count.csv")

# Number of county pages and place pages found in the S3 bucket
counties_count <- count[1, 1] %>%
  pull(count)
places_count <- count[2, 1] %>%
  pull(count)

# Expected counts for a full run
stopifnot(counties_count == 3143)
stopifnot(places_count == 486)
```

## Test 2: Check the most recently copied files
This test looks at the last file that was copied in each of the county and place sub-directories. This is more of a "sniff test": the dates and times should make sense based on the current run, and the last files should be in the highest-numbered folders (e.g. state FIPS codes should be 56).


```{bash}
aws s3 ls s3://mobility-metrics-data-pages-dev/999_county-pages/ --recursive | sort | tail -n 1
```

```{bash}
aws s3 ls s3://mobility-metrics-data-pages-dev/998_place-pages/ --recursive | sort | tail -n 1
```

## Test 3: Count the files generated on the EC2 instance

To ensure that all of the files are created, we can count the number of files called `index.html` on the EC2 instance in each of the sub-directories. Specifically, you can run:

`find factsheets/999_county-pages/ -type f -name 'index.html' | wc -l`

`find factsheets/998_place-pages/ -type f -name 'index.html' | wc -l`
