
Commit

minor edits
alexwalkerepi committed Nov 1, 2023
1 parent 52dd6e8 commit 522570f
Showing 2 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/actions-pipelines.md
@@ -69,8 +69,8 @@ In general, actions are composed as follows:
* The `python`, `r`, and `stata-mp` commands provide a locked-down execution environment that can take one or more `inputs` which are passed to the code.
* Each action must include an `outputs` key with at least one output, classified as either `highly_sensitive` or `moderately_sensitive`
* `highly_sensitive` outputs are considered potentially highly-disclosive, and are never intended for publishing outside the secure environment. This includes all data at the pseudonymised patient-level. Outputs labelled highly_sensitive will not be visible to researchers.
-* Outputs should be separated onto different lines, each with a unique 'key', but related outputs can be combined using a wildcard (`*`). Note, when using a wildcare, it is extremely important to ensure that no `highly_sensitive` outputs are included. E.g.:
* `moderately_sensitive` outputs **should never include patient-level data**, only data that is considered non-disclosive. This includes aggregated patient-data outputs such as summary tables, summary statistics and the outputs from statistical models. For a full list check the [allowed file types subsection](releasing-files.md). The appropriate [statistical disclosure controls](releasing-files.md) should have been applied to these files. They are copied to the secure review area (otherwise known as [Level 4](security-levels.md)).
+* Outputs should be separated onto different lines, each with a unique 'key', but related outputs can be combined using a wildcard (`*`). Note, when using a wildcard, it is extremely important to ensure that no `highly_sensitive` outputs are included. E.g.:
```yaml
outputs:
  moderately_sensitive:
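The wildcard warning in the `actions-pipelines.md` change is easy to get wrong in practice. Below is a minimal sketch of one way to check it, assuming you already know the list of files an action writes. The `OUTPUTS` dict, the file paths, and the `matches_any`/`find_leaks` helpers are hypothetical illustrations, not part of OpenSAFELY's tooling, and Python's `fnmatch` may not match patterns exactly the way the OpenSAFELY job runner does.

```python
from fnmatch import fnmatchcase

# Hypothetical outputs block, mirroring the shape of an action's `outputs` key
# in project.yaml. Key names and paths are illustrative only.
OUTPUTS = {
    "highly_sensitive": {
        "cohort": "output/input.csv",
    },
    "moderately_sensitive": {
        "tables": "output/*.csv",  # too broad: also matches output/input.csv
        "log": "logs/model.log",
    },
}


def matches_any(path: str, patterns: list[str]) -> bool:
    """Return True if `path` matches at least one glob pattern.

    fnmatchcase's `*` also matches `/`, so this errs towards flagging more
    files than a directory-aware glob would.
    """
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def find_leaks(outputs: dict, produced_files: list[str]) -> list[str]:
    """List produced files matched by both a highly_sensitive and a
    moderately_sensitive pattern, i.e. files a wildcard would expose."""
    highly = list(outputs.get("highly_sensitive", {}).values())
    moderately = list(outputs.get("moderately_sensitive", {}).values())
    return sorted(
        path
        for path in produced_files
        if matches_any(path, highly) and matches_any(path, moderately)
    )


if __name__ == "__main__":
    files = ["output/input.csv", "output/table1.csv", "logs/model.log"]
    print(find_leaks(OUTPUTS, files))
```

Running it prints `['output/input.csv']`: the broad `output/*.csv` pattern also captures the highly sensitive extract, so in this hypothetical project the moderately sensitive pattern would need narrowing, for example by writing tables to their own directory and matching only that.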
4 changes: 2 additions & 2 deletions docs/security-levels.md
@@ -50,7 +50,7 @@ Data is held within the EHR vendor's secure environment on the OpenSAFELY server
Data processor staff working at the EHR vendor and a small and restricted number of OpenSAFELY platform developers. Similar to level 1 above, researchers can query this pseudonymised external data, but only indirectly: they write their study analysis code away from the source data, in GitHub, and the OpenSAFELY service automates the execution of the study code against the external data. Only the aggregated results of their study are made available back to the researchers in Level 4 (see below).

## Level 3 [NHS England are data controllers of the data]
-At this level data is typically stored as a pseudonymised patient-level (rather than event level) extract. It includes all pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [study definition](study-def.md) or [dataset_definition](https://docs.opensafely.org/ehrql/).. It also includes all of the pseudonymised patient-level intermediate outputs for a study derived from queries run against Level 3 data which output pseudonymised patient-level data i.e., a processed study dataset where certain filters/formatting have been applied.
+At this level data is typically stored as a pseudonymised patient-level (rather than event level) extract. It includes all pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [study definition](study-def.md) or [dataset_definition](https://docs.opensafely.org/ehrql/). It also includes all of the pseudonymised patient-level intermediate outputs for a study derived from queries run against Level 3 data which output pseudonymised patient-level data i.e., a processed study dataset where certain filters/formatting have been applied.

As the data stored at this level is still patient-level, access to this level is restricted to a small number of OpenSAFELY staff to allow data quality assessment and debugging problems.

@@ -61,7 +61,7 @@ Data is held within the EHR vendor's secure environment on the OpenSAFELY server
This is the same as Level 2.

## Level 4 [NHS England are data controllers of the data]
-This level includes aggregated patient-data (non patient-level data) derived from queries run against Level 3 data, such as summary tables, summary statistics and the outputs from statistical models. It also includes the automatically created log files of each action/script corresponding to each file, for debugging purposes.
+This level includes aggregated patient-data (non patient-level data) derived from analyses run against Level 3 data, such as summary tables, summary statistics and the outputs from statistical models. It also includes the automatically created log files of each action/script corresponding to each file, for debugging purposes.

This is the only level that OpenSAFELY users have access to in order to view their aggregated data/results/log files; users do not have unfettered access to any patient-level data and only see aggregated outputs derived from their analysis code, which satisfies the GDPR principle of confidentiality. Researchers are able to use this level to check that the appropriate statistical disclosure controls have been applied to any files intended for release out of the server.

