update repo documentation
ernestguevarra committed Jun 14, 2024
1 parent befdf16 commit 41450e2
Showing 2 changed files with 26 additions and 26 deletions.
24 changes: 12 additions & 12 deletions README.Rmd
@@ -24,31 +24,31 @@ For this demonstration, the following scenario describes the use case of the exa

A small project team of 4 people are collaborating on a research project on cause of death (CoD) data. Given the nature of the data, the team's ethical responsibilities and commitments as per their respective institution's regulatory boards include ensuring that the raw CoD data and all of its data derivatives are kept restricted only to authorised research project team members. In addition, raw CoD data and all of its data derivatives are kept encrypted when stored in each of the authorised research project team members' computers. When sharing the data between each other, the research project team members need to ensure that raw CoD data and all of its data derivatives are encrypted on transit and can only be decrypted by authorised research project team members.

-Given this, the research project team need to devise a project workflow that will satistfy the encryption requirements while at the same time allowing access of the data to all authorised research project team members. The research project team is using R or their data management, analysis, and reporting and uses GitHub for versioning.
+Given this, the research project team needs to devise a project workflow that will satisfy the encryption requirements while at the same time allowing access of the data to all authorised research project team members. The research project team is using R for their data management, analysis, and reporting and uses GitHub for versioning.

## Recommended/suggested workflow in R

-The most appropriate tool in the R ecosystem that can support the research project team in fulfilling the requirements for data protection is the [`{cyphr}` package](https://docs.ropensci.org/cyphr/).
+The most appropriate tool in the R ecosystem that can support the research project team in fulfilling the requirements for data protection is the [`{cyphr}`](https://docs.ropensci.org/cyphr/) package.

Following is the recommended/suggested R workflow that will meet the requirements for data protection as per the respective institution's regulations.

-### Creation of personal SSH keys
+### 1. Creation of personal SSH keys

The backbone of this recommended/suggested encryption workflow is the use of personal Secure Shell (SSH) protocol keys. Each authorised research project team member should create their personal SSH keys.

There are plenty of guidance available on the internet on how to do this. This [guide](https://docs.digitalocean.com/products/droplets/how-to/add-ssh-keys/create-with-openssh/) is one of the most straightforward explanations on how to create your personal SSH keys.

-Best practice when generating your personal SSH keys is to always generate a **passphrase** to encrypt your private key once it is generated and stored in your computer. Without a passphrase, anyone that can gain access to your computer will also be able use your personal SSH keys.
+Best practice when generating your personal SSH keys is to always provide a **passphrase** to encrypt your private key once it is generated and stored in your computer. Without a passphrase, anyone that can gain access to your computer will also be able use your personal SSH keys.

Note that this step should be done on the *command line* or *terminal* and not on R console.
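
As a rough illustration of this step, the key pair could be generated with OpenSSH's `ssh-keygen` along the following lines. This is a sketch under assumptions rather than the procedure from the linked guide: it assumes `ssh-keygen` is installed, it uses a hypothetical `KEY_DIR` variable so that it writes to a scratch directory instead of `~/.ssh`, and it passes a placeholder passphrase inline only so that it can run unattended.

```shell
# Sketch only: generate an RSA key pair with OpenSSH's ssh-keygen.
# Assumption: ssh-keygen is on the PATH. KEY_DIR is a hypothetical
# variable used here so the example writes to a scratch directory.
KEY_DIR="${KEY_DIR:-$(mktemp -d)}"

# -q        quiet output
# -t rsa    key type (the examples in this README point at id_rsa)
# -b 4096   key length in bits
# -f        file for the private key; the public key gets a .pub suffix
# -N        passphrase, given inline only so this sketch runs unattended
ssh-keygen -q -t rsa -b 4096 -f "$KEY_DIR/id_rsa" -N "example-passphrase"

# Restrict the private key so that only its owner can read it.
chmod 600 "$KEY_DIR/id_rsa"

# List the generated files (private key and .pub public key).
ls "$KEY_DIR"
```

In everyday use one would more likely leave out `-N` and `-f`, so that `ssh-keygen` prompts for the passphrase and offers its default location; a passphrase typed on the command line can end up in the shell history.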

-### Create a key for the data and encrypt that key with your personal key
+### 2. Create a key for the data and encrypt that key with your personal key

This step is a setup step that should be done by the administrator or the research project team lead or any other research project team member whose role it is to determine who has permissions to access the data.

Other members of the research project team will not need to perform this step.

-This step is done through the R console (directory or via an IDE i.e., RStudio) and is facilitated using the `{cyphr}` package (hence, the `{cyphr}` package should be installed prior to doing these steps.
+This step is done through the R console (directory or via an IDE i.e., RStudio) and is facilitated using the `{cyphr}` package (hence, the `{cyphr}` package should be installed prior to doing these steps).

For this demonstration, we use a project repository structure where the raw data will be placed within the `data-raw` directory and the processed raw data will be stored within the `data` directory. So, we will setup the key within the root directory of the project repository for clarity and convenience when encrypting and decrypting files within sub-directories.

@@ -58,7 +58,7 @@ To create a key, the following command should be issued in R:
cyphr::data_admin_init(".", path_user = path_key_admin)
```

-where `path_key_admin` is the path to the personal SSH keys generated by the admin or research project team lead. For purposes of this demonstration, let us say that the admin or research project team lead created their SSH key in the default `~.ssh/` directory. So, the command above can be issued as follows instead:
+where `path_key_admin` is the path to the personal SSH keys generated by the admin or research project team lead. For purposes of this demonstration, let us say that the admin or research project team lead created their SSH key in the default `~/.ssh` directory. So, the command above can be issued as follows instead:

```R
cyphr::data_admin_init(".", path_user = "~/.ssh/id_rsa")
@@ -74,11 +74,11 @@ given that the `cyphr::data_admin_init()` function will use the default SSH key

When running this command, the admin or the research project team lead will be asked for the passphrase they created for their personal SSH key (if they generated a passphrase). If the passphrase matches, then R will generate a data key for the project repository and appropriately setup the project for encryption. A directory named `.cyphr` will be created in project root directory (since this is a hidden directory, select *Show hidden files* in your file manager settings to see the directory). This directory should be kept within the project repository and should be committed to GitHub for versioning.

-### Add encrypted data to the project repository
+### 3. Add encrypted data to the project repository

Now that the admin or the research project team lead has setup the project repository for encryption, they can now add encrypted data to the project.

-For this demonstration, we will use the `iris` dataset as our example raw data. We will store an ecrypted CSV copy of this dataset in the `data-raw` directory using the following commands:
+For this demonstration, we will use the `iris` dataset as our example raw data. We will store an encrypted CSV copy of this dataset in the `data-raw` directory using the following commands:

```R
## Get the admin key ----
@@ -132,9 +132,9 @@ we are able to retrieve the data into R.
6 5.4 3.9 1.7 0.4 setosa
```

-### Adding collaborators to access the data
+### 4. Adding collaborators to access the data

-Given that the team uses GitHub for versioning, the next step for the admin or research project team lead is to distribute the project repository as currently structured to their research project team members and authorised collaborators. This will be done by adding them to the project repository as members/collaborators.
+Given that the team uses GitHub for versioning, the next step for the admin or research project team lead is to distribute the project repository as currently structured to their research project team members and authorised collaborators. This will be done by adding them to the GitHub project repository as members/collaborators.

Once added, these collaborators can now clone the repository and get their own copies of the workflow on their own machines. When they clone their own copies, this includes the encryption setup made by the admin or research project team lead.

@@ -148,7 +148,7 @@ collaborator1_key <- cyphr::data_key(".", path_user = path_key_collaborator1)
cyphr::data_request_access(".", path_user = path_key_collaborator1)
```

-### Approval of collaborator request
+### 5. Approval of collaborator request

Once the research project team member or collaborator has made a request, the admin or research project team lead can approve this request as follows:

28 changes: 14 additions & 14 deletions README.md
@@ -30,23 +30,23 @@ ensure that raw CoD data and all of its data derivatives are encrypted
on transit and can only be decrypted by authorised research project team
members.

-Given this, the research project team need to devise a project workflow
-that will satistfy the encryption requirements while at the same time
+Given this, the research project team needs to devise a project workflow
+that will satisfy the encryption requirements while at the same time
allowing access of the data to all authorised research project team
-members. The research project team is using R or their data management,
+members. The research project team is using R for their data management,
analysis, and reporting and uses GitHub for versioning.

## Recommended/suggested workflow in R

The most appropriate tool in the R ecosystem that can support the
research project team in fulfilling the requirements for data protection
-is the [`{cyphr}` package](https://docs.ropensci.org/cyphr/).
+is the [`{cyphr}`](https://docs.ropensci.org/cyphr/) package.

Following is the recommended/suggested R workflow that will meet the
requirements for data protection as per the respective institution’s
regulations.

-### Creation of personal SSH keys
+### 1\. Creation of personal SSH keys

The backbone of this recommended/suggested encryption workflow is the
use of personal Secure Shell (SSH) protocol keys. Each authorised
@@ -59,15 +59,15 @@ is one of the most straightforward explanations on how to create your
personal SSH keys.

Best practice when generating your personal SSH keys is to always
-generate a **passphrase** to encrypt your private key once it is
+provide a **passphrase** to encrypt your private key once it is
generated and stored in your computer. Without a passphrase, anyone that
can gain access to your computer will also be able use your personal SSH
keys.

Note that this step should be done on the *command line* or *terminal*
and not on R console.

-### Create a key for the data and encrypt that key with your personal key
+### 2\. Create a key for the data and encrypt that key with your personal key

This step is a setup step that should be done by the administrator or
the research project team lead or any other research project team member
@@ -78,7 +78,7 @@ step.

This step is done through the R console (directory or via an IDE i.e.,
RStudio) and is facilitated using the `{cyphr}` package (hence, the
-`{cyphr}` package should be installed prior to doing these steps.
+`{cyphr}` package should be installed prior to doing these steps).

For this demonstration, we use a project repository structure where the
raw data will be placed within the `data-raw` directory and the
@@ -96,7 +96,7 @@ cyphr::data_admin_init(".", path_user = path_key_admin)
where `path_key_admin` is the path to the personal SSH keys generated by
the admin or research project team lead. For purposes of this
demonstration, let us say that the admin or research project team lead
-created their SSH key in the default `~.ssh/` directory. So, the command
+created their SSH key in the default `~/.ssh` directory. So, the command
above can be issued as follows instead:

``` r
@@ -122,14 +122,14 @@ in project root directory (since this is a hidden directory, select
This directory should be kept within the project repository and should
be committed to GitHub for versioning.

-### Add encrypted data to the project repository
+### 3\. Add encrypted data to the project repository

Now that the admin or the research project team lead has setup the
project repository for encryption, they can now add encrypted data to
the project.

For this demonstration, we will use the `iris` dataset as our example
-raw data. We will store an ecrypted CSV copy of this dataset in the
+raw data. We will store an encrypted CSV copy of this dataset in the
`data-raw` directory using the following commands:

``` r
@@ -184,13 +184,13 @@ we are able to retrieve the data into R.
6 5.4 3.9 1.7 0.4 setosa
```

-### Adding collaborators to access the data
+### 4\. Adding collaborators to access the data

Given that the team uses GitHub for versioning, the next step for the
admin or research project team lead is to distribute the project
repository as currently structured to their research project team
members and authorised collaborators. This will be done by adding them
-to the project repository as members/collaborators.
+to the GitHub project repository as members/collaborators.

Once added, these collaborators can now clone the repository and get
their own copies of the workflow on their own machines. When they clone
@@ -210,7 +210,7 @@ collaborator1_key <- cyphr::data_key(".", path_user = path_key_collaborator1)
cyphr::data_request_access(".", path_user = path_key_collaborator1)
```

-### Approval of collaborator request
+### 5\. Approval of collaborator request

Once the research project team member or collaborator has made a
request, the admin or research project team lead can approve this