Merge pull request #219 from UI-Research/iss214-ec2-code
Iss214 ec2 code
Showing 4 changed files with 109 additions and 0 deletions.
@@ -0,0 +1,42 @@
## Steps to Run the Fact Sheet Code
### Gabe Morrison
### 2023-07-10

#1 Spin up a large EC2 instance using the elastic analytics Launch template:
- Gabe recommends c6a.32xlarge
- Note that you may not have credentials to do this and may need to reach out to the AWS Governance team for assistance (use the EC2 intake form: https://app.smartsheet.com/b/form/2c9200302b9941cebc0b61e945653f48)
- This step may not be necessary if an instance has already been spun up and is currently "off". If this is the case, instead of taking the actions above, you will need to (1) ssh into the instance and (2) rerun the rocker docker image (a sketch of this is shown below). More guidance on how to do this can be found on the AWS Governance Confluence page.
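A minimal sketch of that ssh-and-restart path, assuming key-based ssh access and the `rocker/rstudio` image; the key path, username, hostname, and password below are placeholders, not values from this repo:

```bash
# Placeholders: substitute your key file and the instance's public DNS name.
ssh -i ~/.ssh/my-ec2-key.pem ubuntu@ec2-001-002-003-004.compute-1.amazonaws.com

# On the instance: relaunch RStudio Server from the rocker image on port 8787.
# The password is an example; use the credentials shared with your team.
docker run -d -p 8787:8787 -e PASSWORD=changeme rocker/rstudio
```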
#2 Connect to the EC2 instance. You can only connect to an RStudio GUI running on the instance from a remote desktop, so you should:
- Go to Urban Users (or another remote desktop)
- Go to the link that is sent to you in the email by [email protected]. The link is also ec2-[IPv4 address separated by hyphens. Ex: 001-002-003-004].compute-1.amazonaws.com:8787 (a sketch of building this address is shown below)
- Log in using the credentials shared in the email you receive
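A minimal sketch of building that address from the instance's public IPv4, assuming the us-east-1 (`compute-1`) domain shown above; the IP is only an example:

```bash
# Example IP; substitute your instance's public IPv4 address.
IP=001.002.003.004
echo "ec2-${IP//./-}.compute-1.amazonaws.com:8787"
```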
#3 Clone the pages repo:
- Run `git clone https://github.com/UI-Research/gates-mobility-metrics-pages` in the terminal
- Depending on the status of the repo, you may need to check out the most up-to-date branch (the full sequence is collected into one block below):
  - `cd gates-mobility-metrics-pages`
  - `git checkout -b [branch name]`
  - `git config --global user.email [your email]`
  - `git config --global user.name [Your name]`
  - `git pull origin [branch name]`
- You may have to resolve merge conflicts:
  - `git add -u`
  - `git commit -m "updating local branch with remote of issXXX"`
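For convenience, the same sequence as a single copy-paste block; the branch name, email, and name in brackets are placeholders you must fill in:

```bash
git clone https://github.com/UI-Research/gates-mobility-metrics-pages
cd gates-mobility-metrics-pages

# Check out and update the branch you need (placeholders in brackets).
git checkout -b [branch name]
git config --global user.email [your email]
git config --global user.name "[Your name]"
git pull origin [branch name]

# If the pull produces merge conflicts, resolve them, then:
git add -u
git commit -m "updating local branch with remote of issXXX"
```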
#4 Run the `update-quarto.sh` script:
- Gabe finds that this works most reliably if you run it line by line (a hedged sketch of a manual fallback is shown below)
- Note that you will need to enter `Y` and `y` at the prompts in the terminal
- You should restart R after doing this: go to Session > Quit Session, then click Open New Session
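If the script is unavailable or fails partway, the following is a minimal sketch of updating Quarto by hand on an Ubuntu-style instance, assuming Quarto is installed from the official .deb releases; it is not necessarily what `update-quarto.sh` does, and the version number is only an example:

```bash
# Example version only; pick the release you actually need.
QUARTO_VERSION=1.3.450
wget "https://github.com/quarto-dev/quarto-cli/releases/download/v${QUARTO_VERSION}/quarto-${QUARTO_VERSION}-linux-amd64.deb"
sudo dpkg -i "quarto-${QUARTO_VERSION}-linux-amd64.deb"
quarto --version   # confirm the new version is on the PATH
```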
#5 Open the `/gates-mobility-metrics-pages/` R project
- Gabe uses the Files pane in the bottom right of RStudio
- Run the `R/startup.R` script. Again, Gabe finds this works reliably only if you run the code line by line.
#6 Run the commented-out test code in `create_standard_pages.R`. This should create 4 pages for each of the cities and counties.
- A common error in this workflow is:
  `Quitting from lines 172-1 (index-county.qmd) Error: ! The inline value cannot be coerced to character: title`
- This error comes from Quarto and the R packages not being up to date. Ensure you restart R and have all packages installed (a quick check is sketched below). With this in place, Gabe finds the code runs successfully.
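One quick way to confirm the toolchain before rerunning; these are standard Quarto CLI commands, and the exact output will vary by version:

```bash
# Verify the Quarto installation and its R/knitr integration.
quarto --version
quarto check
```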
#7 Once the small tests work successfully, you can run the code under "#For actual run". This process takes about 1-1.5 hours.
- If you use an instance other than the c6a.32xlarge, you will need to change the NCORES value based on that instance (see the check below). Note that RStudio cannot handle more than 125 simultaneous connections, so it is likely not beneficial to scale up larger than that instance.
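A quick, hedged way to check how many vCPUs the instance exposes before choosing NCORES; this is a shell check only, not the project's actual configuration code:

```bash
# Report available vCPUs and a suggested NCORES value capped at 125.
CORES=$(nproc)
echo "vCPUs: ${CORES}; suggested NCORES: $(( CORES < 125 ? CORES : 125 ))"
```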
@@ -0,0 +1,57 @@
---
title: "test_copy"
format: html
editor: visual
---
This Quarto document is intended to test whether all of the files created on an EC2 instance were successfully copied from the volume on that instance to S3. The script that performs the copy operation is `aws_cp_command.sh`.
The first two tests are intended to be run *after* the files are copied and can be run locally (as long as the user running the commands has CLI access to AWS). The last test is intended to be run on the EC2 instance to ensure that the correct number of files were generated.
## Test 1: Count the number of files in the S3 sub-directories
```{r}
# Packages used by the tests in this document.
library(aws.s3)
library(tidyverse)
```
The bash chunk below creates a `count.csv` file with a single "count" column: the first value is the number of counties in the S3 bucket and the second is the number of places.
```{bash}
# Write the header, then append the number of entries listed under each S3 prefix.
echo "count" > count.csv
aws s3 ls s3://mobility-metrics-data-pages-dev/999_county-pages/ | wc -l >> count.csv
aws s3 ls s3://mobility-metrics-data-pages-dev/998_place-pages/ | wc -l >> count.csv
```
```{r}
library(tidyverse)

# Read the counts written above and check them against the expected totals:
# 3143 counties and 486 places.
count <- read_csv("count.csv")
counties_count <- count[1, 1] %>%
  pull(count)
places_count <- count[2, 1] %>%
  pull(count)
stopifnot(counties_count == 3143)
stopifnot(places_count == 486)
```
## Test 2: Check the last file copied in each sub-directory
This test looks at the last file that was copied in the county and place sub-directories. This is more of a "sniff test", but the dates and times should make sense based on the current run, and the last files should be in late folders (e.g. state FIPS codes should be 56).
```{bash}
# Newest object under the county-pages prefix (listing lines sort by timestamp).
aws s3 ls s3://mobility-metrics-data-pages-dev/999_county-pages/ --recursive | sort | tail -n 1
```
```{bash}
# Newest object under the place-pages prefix.
aws s3 ls s3://mobility-metrics-data-pages-dev/998_place-pages/ --recursive | sort | tail -n 1
```
## Test 3: Count the `index.html` files on the EC2 instance

To ensure that all of the files were created, we can count the number of files called `index.html` on the EC2 instance in each of the sub-directories. Specifically, you can run:

`find factsheets/999_county-pages/ -type f -name 'index.html' | wc -l`

`find factsheets/998_place-pages/ -type f -name 'index.html' | wc -l`
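A small optional sketch that wraps the two commands above into a single on-instance check, assuming the same `factsheets/` layout and the expected totals from Test 1 (3143 counties, 486 places):

```{bash}
# Compare on-instance counts of index.html files against the expected totals.
county_n=$(find factsheets/999_county-pages/ -type f -name 'index.html' | wc -l)
place_n=$(find factsheets/998_place-pages/ -type f -name 'index.html' | wc -l)
echo "county pages: ${county_n} (expect 3143)"
echo "place pages: ${place_n} (expect 486)"
```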