Merge pull request #18 from UI-Research/iss17
Update with UI-Research vs. UrbanInstitute content
erika-tyagi authored Jul 31, 2024
2 parents 10ed462 + bbbc72b commit 6fa9530
Showing 4 changed files with 26 additions and 13 deletions.
8 changes: 5 additions & 3 deletions _freeze/git-y-drive/execute-results/html.json
@@ -1,8 +1,10 @@
{
"hash": "3ef63c3b28d5219cb6db3002ff7de3f1",
"hash": "dfd245fb1ee80a41ba0cd89b7b21aa53",
"result": {
"markdown": "---\ntitle: Git and GitHub with Confidential Data \nsubtitle: How to Use Git and GitHub with Confidential Data on the Y Drive (Ares Drive) \n---\n\n\nSo you've learned about the power of Git and GitHub and now want to use it in your projects. That's great news! But if your project data is stored on the Y Drive (aka `ares`) for confidentiality reasons, you'll have to jump through a few extra hoops. Below are instructions on one way to set up a GitHub repo to play nicely with the Y Drive.\n\n::: {.callout-important}\nIn some cases, using Git and GitHub with confidential data is **altogether not allowed** due to strict data use agreements or other contractual requirements. If you aren't sure if you can use Git and GitHub with your data, please reach out to Urban's security team and/or ask in the #github Slack channel. \n:::\n\n## The big picture\n- Store all data directly on the Y Drive\n- Store all scripts in a Git repo\n- Create and clone the repo on the virtual machine (e.g. Urban Users PGP or Win 10 Secure) local drive \n- Keep that repo with the scripts completely separate from the Y Drive folder with the data\n- Within your scripts, always read in data using the direct Y Drive filepaths and write out intermediate/final data files using direct Y Drive filepaths \n\nThis means your scripts will live in a separate place (usually the Documents folder of the virtual desktop) from your data (somewhere in the Y Drive) and only your scripts will be on Git/GitHub. So while changes to your scripts will be tracked with all the power of GitHub, your data will not. \n\nIf your project only has non-sensitive data files, it is best practice to put the data and scripts directly in the same Git repo as that makes it easier to work with collaborators and quickly understand project structures. 
But if you work with confidential data, you can't put your data in Git/GitHub for security reasons (and therefore won't get the power of Git and file tracking for the data).\n\nNow that's out of the way, how do you setup Git/GitHub with your confidential data?\n\n## Set up the Y Drive folder (for data)\n\n1. Use the [intake form](https://explorer.urban.org/page/2959) to request a new folder for your project and give the appropriate staff access.\n2. Securely move the project data files into the assigned Y Drive folder. Inside the folder, we recommend the following folder structure:\n\n ```\n - projectfolder/\n - data/\n - raw-data/\n - intermediate-data/\n - final-data/\n ```\n\nThe data you move to the Y Drive should go into the `data/raw-data` folder and manipulated data can go into the other folders.\n\nNote that because the data files aren't in the Git repo, it will be possible for members of your team to accidentally overwrite the intermediate/final data files in the Y Drive if multiple people run the same script. For this reason, we suggest a) keeping the data files inside `raw-data` untouched and never programmatically writing to that folder and b) keeping a clear script copy of how you transform the files inside `raw-data` so that you can easily regenerate intermediate and final data files.\n\n## Set up the Git repo (for scripts)\n\n1. Use a virtual desktop to open Win10 Secure or PGP. Note that Win10 Secure has SAS installed while PGP does not. PGP might be more powerful/have more memory if you have computationally expensive code. For additional information, see [Urban's Overview of Computing Resources](https://explorer.urban.org/page/2729). \n2. If necessary, install GitHub Desktop from this link: <a href=\"https://desktop.github.com/\" target=\"_blank\">https://desktop.github.com/</a>. \n3. Open up GitHub Desktop and login to your GitHub account. \n4. Click **Add** \\> **Create new repository**. \n5. 
We recommend initializing the repo with a `README` (with background information/instructions for running your code) and a `.gitignore` file to tell Git not to track certain files. Here is an example of a strict `.gitignore` file that ignores all files (e.g. data files) except for R, SAS, and Stata scripts. \n\n ```\n # Ignore everything\n *.*\n\n # Include .gitignore and README.md\n !.gitignore\n !README.md\n\n # Include R scripts \n !*.R\n !*.Rproj\n\n # Include SAS scripts \n !*.sas\n\n # Include Stata scripts \n !*.do\n\n ```\n\n6. Very importantly, when selecting a local path, **make sure that the repo is created within the local drive of the virtual machine you are on, and NOT the mapped `Y:` drive**. By default, GitHub Desktop should try to create the repo in `\\\\Ares\\CTX_RedirectedFolders$\\username\\My Documents\\GitHub`. While that may look weird, that is an acceptable filepath and will put the repo in the `My Documents\\GitHub` folder of the virtual desktop. Once you're ready, select **Create Repository**.\n5. This should create the repository in your selected location on the Windows Explorer. If you need help finding the exact location on your computer, go to GitHub Desktop, select **Repository** in the top left and then **Show in File Explorer**.\n6. Click **Publish Repo** in the top right of GitHub Desktop to send the repo to GitHub.com \n7. Inside the Git repo (i.e., folder) we suggest the following folder structure:\n\n ```\n README.md\n scripts/\n somescript.R\n someotherscript.R\n ```\n\n Where the `scripts` folder contains all code for the project. Feel free to create subfolders as you see fit. \n\n8. Make changes to your project, like updating the README, or adding scripts. Then `commit` the changes on GitHub Desktop and `push` the changes. \n\n9. Each member of your project team will need to `clone` the repo into on their respective virtual machines (Urban Users PGP or Win 10 Secure). 
Again, do NOT clone the repo into the mapped Y Drive.\n\n## Write scripts and commit to Git/GitHub \n\nNow you can write scripts as you normally would. The only caveat is that all data you read in/write out will have to be to the mapped Y Drive path. Below is an example R script that shows how to do this. Note that these filepaths will only work when the scripts are run from a virtual machine that has Urban network access to the Y Drive (i.e., the two virtual machines specified above).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n\n# --- Read in Data ---\n\n## We recommend using the full filepath which starts with\n## `//ares/UI_Projects2/CENTER` as this will never change. Note you will need to\n## replace CENTER and projectfolder with your respective values.\nraw_data <- read_csv(\"//ares/UI_Projects2/CENTER/projectfolder/data/raw-data/ex.csv\")\n\n## Another acceptable filepath uses the \"Y:\" drive mapping built into \n## virtual desktops. This could change in the future.\nraw_data <- read_csv(\"Y:/CENTER/projectfolder/data/raw-data/ex.csv\")\n\n\n# --- Write out Data ---\nraw_data %>%\n write_csv(\"//ares/UI_Projects2/CENTER/projectfolder/data/intermediate-data/ex-cleaned.csv\")\n```\n:::\n",
"supporting": [],
"markdown": "---\ntitle: Git and GitHub with Confidential Data \nsubtitle: How to Use Git and GitHub with Confidential Data on the Y Drive (Ares Drive) \n---\n\n\nSo you've learned about the power of Git and GitHub and now want to use it in your projects. That's great news! But if your project data is stored on the Y Drive for confidentiality reasons, you'll have to jump through a few extra hoops. Below are instructions on one way to set up a GitHub repo to play nicely with the Y Drive.\n\n::: {.callout-important}\nIn some cases, using Git and GitHub with confidential data is **altogether not allowed** due to strict data use agreements or other contractual requirements. If you aren't sure if you can use Git and GitHub with your data, please reach out to Urban's security team and/or ask in the #github Slack channel. \n:::\n\n## The big picture\n- Store all data directly on the Y Drive\n- Store all scripts in a Git repo\n- Create and clone the repo on the virtual machine (e.g. Urban Users PGP or Win 10 Secure) local drive \n- Keep that repo with the scripts completely separate from the Y Drive folder with the data\n- Within your scripts, always read in data using the direct Y Drive filepaths and write out intermediate/final data files using direct Y Drive filepaths \n\nThis means your scripts will live in a separate place (usually the Documents folder of the virtual desktop) from your data (somewhere in the Y Drive) and only your scripts will be on Git/GitHub. So while changes to your scripts will be tracked with all the power of GitHub, your data will not. \n\nIf your project only has non-sensitive data files, it is best practice to put the data and scripts directly in the same Git repo as that makes it easier to work with collaborators and quickly understand project structures. 
But if you work with confidential data, you can't put your data in Git/GitHub for security reasons (and therefore won't get the power of Git and file tracking for the data).\n\nNow that's out of the way, how do you setup Git/GitHub with your confidential data?\n\n## Set up the Y Drive folder (for data)\n\n1. Use the [intake form](https://explorer.urban.org/page/2959) to request a new folder for your project and give the appropriate staff access.\n2. Securely move the project data files into the assigned Y Drive folder. Inside the folder, we recommend the following folder structure:\n\n ```\n - projectfolder/\n - data/\n - raw-data/\n - intermediate-data/\n - final-data/\n ```\n\nThe data you move to the Y Drive should go into the `data/raw-data` folder and manipulated data can go into the other folders.\n\nNote that because the data files aren't in the Git repo, it will be possible for members of your team to accidentally overwrite the intermediate/final data files in the Y Drive if multiple people run the same script. For this reason, we suggest a) keeping the data files inside `raw-data` untouched and never programmatically writing to that folder and b) keeping a clear script copy of how you transform the files inside `raw-data` so that you can easily regenerate intermediate and final data files.\n\n## Set up the Git repo (for scripts)\n\n1. Use a virtual desktop to open Win10 Secure or PGP. Note that Win10 Secure has SAS installed while PGP does not. PGP might be more powerful/have more memory if you have computationally expensive code. For additional information, see [Urban's Overview of Computing Resources](https://explorer.urban.org/page/2729). \n2. If necessary, install GitHub Desktop from this link: <a href=\"https://desktop.github.com/\" target=\"_blank\">https://desktop.github.com/</a>. \n3. Open up GitHub Desktop and login to your GitHub account. \n4. Click **Add** \\> **Create new repository**. \n5. 
We recommend initializing the repo with a `README` (with background information/instructions for running your code) and a `.gitignore` file to tell Git not to track certain files. Here is an example of a strict `.gitignore` file that ignores all files (e.g. data files) except for R, SAS, and Stata scripts. \n\n ```\n # Ignore everything\n *.*\n\n # Include .gitignore and README.md\n !.gitignore\n !README.md\n\n # Include R scripts \n !*.R\n !*.Rproj\n\n # Include SAS scripts \n !*.sas\n\n # Include Stata scripts \n !*.do\n\n ```\n\n6. Very importantly, when selecting a local path, **make sure that the repo is created within the local drive of the virtual machine you are on, and NOT the mapped `Y:` drive**. By default, GitHub Desktop should try to create the repo in `\\\\Ares\\CTX_RedirectedFolders$\\username\\My Documents\\GitHub`. While that may look weird, that is an acceptable filepath and will put the repo in the `My Documents\\GitHub` folder of the virtual desktop. Once you're ready, select **Create Repository**.\n5. This should create the repository in your selected location on the Windows Explorer. If you need help finding the exact location on your computer, go to GitHub Desktop, select **Repository** in the top left and then **Show in File Explorer**.\n6. Click **Publish Repo** in the top right of GitHub Desktop to send the repo to GitHub.com \n7. Inside the Git repo (i.e., folder) we suggest the following folder structure:\n\n ```\n README.md\n scripts/\n somescript.R\n someotherscript.R\n ```\n\n Where the `scripts` folder contains all code for the project. Feel free to create subfolders as you see fit. \n\n8. Make changes to your project, like updating the README, or adding scripts. Then `commit` the changes on GitHub Desktop and `push` the changes. \n\n9. Each member of your project team will need to `clone` the repo into on their respective virtual machines (Urban Users PGP or Win 10 Secure). 
Again, do NOT clone the repo into the mapped Y Drive.\n\n## Write scripts and commit to Git/GitHub \n\nNow you can write scripts as you normally would. The only caveat is that all data you read in/write out will have to be to the mapped Y Drive path. Below is an example R script that shows how to do this. Note that these filepaths will only work when the scripts are run from a virtual machine that has Urban network access to the Y Drive (i.e., the two virtual machines specified above).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n\n# --- Read in Data ---\n\n## We recommend using the full filepath which starts with\n## `//ares/UI_Projects2/CENTER` as this will never change. Note you will need to\n## replace CENTER and projectfolder with your respective values.\nraw_data <- read_csv(\"//ares/UI_Projects2/CENTER/projectfolder/data/raw-data/ex.csv\")\n\n## Another acceptable filepath uses the \"Y:\" drive mapping built into \n## virtual desktops. This could change in the future.\nraw_data <- read_csv(\"Y:/CENTER/projectfolder/data/raw-data/ex.csv\")\n\n\n# --- Write out Data ---\nraw_data %>%\n write_csv(\"//ares/UI_Projects2/CENTER/projectfolder/data/intermediate-data/ex-cleaned.csv\")\n```\n:::\n",
"supporting": [
"git-y-drive_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
24 changes: 16 additions & 8 deletions git-faqs.qmd
@@ -4,28 +4,36 @@ subtitle: Questions about Git and GitHub at the Urban Institute and Additional R
---

## FAQs
* **Git and GitHub seem a little scary. How do I get started?**

#### Git and GitHub seem a little scary. How do I get started? {.unnumbered}
Start by using these tools for solo work. It is the lowest stakes way to develop skills that will be valuable for collaboration.

* **Where can I go for help if I get stuck?**
#### Where can I go for help if I get stuck? {.unnumbered}
Running into errors is an inevitable part of working with Git and GitHub, but we're here to help! Drop a message in the <a href="https://theurbaninstitute.slack.com/archives/C6J9AALDR" target="_blank">#github Slack channel</a> if you run into issues.

* **Can I use Git and GitHub with projects that have confidential data stored on the Y Drive?**
#### When should I use UrbanInstitute vs. UI-Research? {.unnumbered}
UI-Research is the default organization for all Urban GitHub repositories and should be treated as such. UI-Research uses the “Team” plan, while UrbanInstitute is on the Free tier. See the GitHub [documentation](https://github.com/pricing#compare-features) for more details on features.

The most common use case for UrbanInstitute is sharing the source code of data tools on Urban’s website. For example, in this [tool](https://apps.urban.org/features/medical-debt-over-time/), you can find a link to the GitHub repository for the project at the bottom of the page.

If you are sharing code publicly and have questions about review prior to release, use the [Code Review form](https://tech-tools.urban.org/code-review-form/).

#### Can I use Git and GitHub with projects that have confidential data stored on the Y Drive? {.unnumbered}
Yes! Read <a href="https://ui-research.github.io/reproducibility-at-urban/git-y-drive.html" target="_blank">this guide</a> for tips on how to do so.

* **Should I use Git inside of Box? (i.e. turn a Box folder into a Git repository)**
#### Should I use Git inside of Box? (i.e. turn a Box folder into a Git repository) {.unnumbered}
No – we strongly recommend keeping Box folders and GitHub repositories separate.

* **What kinds of files should I track with Git?**
#### What kinds of files should I track with Git? {.unnumbered}
Generally, you should only track code (i.e., scripts written in R, Stata, SAS, Python, etc.) with Git. You should not track large data files or binary files (e.g., Word or Excel files). There are occasions when tracking small data files with Git might make sense, but you should **never** store confidential data on GitHub (even in a private repository). We recommend getting familiar with `.gitignore` <a href="https://www.freecodecamp.org/news/gitignore-what-is-it-and-how-to-add-to-repo/" target="_blank">files</a>, which can prevent you and your collaborators from accidentally pushing files to GitHub. GitHub provides hundreds of template `.gitignore` files for specific programming languages (e.g. <a href="https://github.com/github/gitignore/blob/main/R.gitignore" target="_blank">R</a>, <a href="https://github.com/github/gitignore/blob/main/Global/Stata.gitignore" target="_blank">Stata</a>, or <a href="https://github.com/github/gitignore/blob/main/Python.gitignore" target="_blank">Python</a>) or operating systems (e.g. <a href="https://github.com/github/gitignore/blob/main/Global/Windows.gitignore" target="_blank">Windows</a> or <a href="https://github.com/github/gitignore/blob/main/Global/macOS.gitignore" target="_blank">macOS</a>) that can be useful as a starting point.
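
As a rough illustration of how a `.gitignore` file keeps data out of a repo, the sketch below builds a throwaway repository and asks Git which files it would ignore. The file names (`analysis.R`, `survey-data.csv`) and the ignore patterns are hypothetical placeholders, not a recommended template; adapt them to the file types in your own project.

```bash
#!/usr/bin/env bash
set -eu

# Work in a temporary directory so nothing real is touched
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Hypothetical patterns: ignore common data and binary formats
cat > .gitignore <<'EOF'
*.csv
*.xlsx
*.docx
EOF

# One script (should be tracked) and one data file (should be ignored)
touch analysis.R survey-data.csv

# `git check-ignore --quiet` exits 0 if the path is ignored, 1 if it is not
if git check-ignore --quiet survey-data.csv; then
  echo "survey-data.csv is ignored"
fi
if ! git check-ignore --quiet analysis.R; then
  echo "analysis.R will be tracked"
fi
```

Running `git status` in the same directory should then list `analysis.R` and `.gitignore` as untracked, with no mention of the CSV file.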

* **How do I add documentation to my repositories?**
#### How do I add documentation to my repositories? {.unnumbered}
Use `README` <a href="https://www.freecodecamp.org/news/how-to-write-a-good-readme-file/" target="_blank">files</a> to add context and documentation to your repos. These are <a href="https://www.markdownguide.org/getting-started/" target="_blank">Markdown</a> files in the root of your directory named `README.md` that can help folks understand the structure and contents of your repo. <a href="https://markdownlivepreview.com/" target="_blank">Markdown Live Preview</a> can be a useful tool to help with formatting.

* **What permissions should I give my collaborators?**
#### What permissions should I give my collaborators? {.unnumbered}
In general, you should follow the <a href="https://en.wikipedia.org/wiki/Principle_of_least_privilege" target="_blank">principle of least privilege</a>, which means that you should give folks the minimum level of access they need, but no more than that. In most cases, this will be the **Write** role, but you should refer to GitHub's <a href="https://docs.github.com/en/organizations/managing-access-to-your-organizations-repositories/repository-roles-for-an-organization" target="_blank">guide</a> describing the different roles and what they allow. If you need elevated permissions for a repository, post a message in the <a href="https://theurbaninstitute.slack.com/archives/C6J9AALDR" target="_blank">#github Slack channel</a>.

* **Are there other commands that can be helpful when using Git from the command line?**
#### Are there other commands that can be helpful when using Git from the command line? {.unnumbered}
Definitely! Learning a few simple Bash commands can be helpful for navigating the command line. A few common commands include `pwd` (to print your current directory), `cd` (to change directories), and `ls` (to list the files in a directory). For an introduction to these commands and others, we recommend <a href="https://friendly-101.readthedocs.io/en/latest/commandline.html" target="_blank">this guide</a> from Friendly Django or <a href="https://happygitwithr.com/shell.html" target="_blank">this guide</a> from Happy Git with R.
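
For a quick feel of how these commands fit together, here is a minimal sketch that creates a small practice folder and moves around in it. The folder and file names (`my-project`, `scripts`, `clean-data.R`) are made up for the example.

```bash
#!/usr/bin/env bash
set -eu

# Build a small practice folder in a temporary location
work_dir="$(mktemp -d)"
mkdir -p "$work_dir/my-project/scripts"
touch "$work_dir/my-project/README.md" "$work_dir/my-project/scripts/clean-data.R"

cd "$work_dir/my-project"   # change into the project folder
pwd                         # print the full path of the current directory
ls                          # list its contents: README.md and scripts
cd scripts                  # move down into the scripts subfolder
ls                          # list its contents: clean-data.R
cd ..                       # move back up one level to my-project
```

The same three commands work unchanged in Git Bash on Windows and in the macOS or Linux terminal.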

## Additional Resources