adding patient-level emphasis
alexwalkerepi committed Nov 1, 2023
1 parent 543a385 commit 52dd6e8
Showing 2 changed files with 4 additions and 4 deletions.
2 changes: 1 addition & 1 deletion docs/actions-pipelines.md
@@ -69,8 +69,8 @@ In general, actions are composed as follows:
* The `python`, `r`, and `stata-mp` commands provide a locked-down execution environment that can take one or more `inputs` which are passed to the code.
* Each action must include an `outputs` key with at least one output, classified as either `highly_sensitive` or `moderately_sensitive`
* `highly_sensitive` outputs are considered potentially highly-disclosive, and are never intended for publishing outside the secure environment. This includes all data at the pseudonymised patient-level. Outputs labelled highly_sensitive will not be visible to researchers.
* `moderately_sensitive` outputs are considered non-disclosive (providing the appropriate [statistical disclosure controls](releasing-files.md) have been applied) and are automatically copied to the secure review area (otherwise known as [Level 4](security-levels.md)). This includes aggregated patient-data outputs such as summary tables, summary statistics and the outputs from statistical models. For a full list check the [allowed file types subsection](releasing-files.md).
* Outputs should be separated onto different lines, each with a unique 'key', but related outputs can be combined using a wildcard (`*`). Note, when using a wildcard, it is extremely important to ensure that no `highly_sensitive` outputs are included. E.g.:
* `moderately_sensitive` outputs **should never include patient-level data**, only data that is considered non-disclosive. This includes aggregated patient-data outputs such as summary tables, summary statistics and the outputs from statistical models. For a full list check the [allowed file types subsection](releasing-files.md). The appropriate [statistical disclosure controls](releasing-files.md) should have been applied to these files. They are copied to the secure review area (otherwise known as [Level 4](security-levels.md)).
```yaml
outputs:
moderately_sensitive:
6 changes: 3 additions & 3 deletions docs/actions-scripts.md
@@ -27,17 +27,17 @@ This helps with:

Scripted actions can read and write output files that are saved in the workspace. These generally fall into two categories:
* large pseudonymised patient-level files of `highly_sensitive` data for use by other actions
* smaller `moderately_sensitive` aggregated patient-data (non patient-level data) files for review and release
* smaller `moderately_sensitive` aggregated patient-data (this should **never** be patient-level data) files for review and release


### Large `highly_sensitive` output files

Outputs should be classed as `highly_sensitive` if they are:
Outputs labelled `highly_sensitive` will not be visible to researchers. This is a [deliberate design feature of OpenSAFELY](https://www.opensafely.org/about/), intended to reduce the risk of disclosure of sensitive information. Outputs should **always** be classed as `highly_sensitive` if they are:

- Pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [study definition](study-def.md) or [dataset_definition](https://docs.opensafely.org/ehrql/).
- Pseudonymised patient-level intermediate outputs for a study derived from queries run against Level 3 data which output pseudonymised patient-level data i.e., a processed study dataset with certain filters/formatting applied.

These types of outputs are considered potentially highly-disclosive, and are never intended for publishing outside the secure environment. Outputs labelled highly_sensitive will not be visible to researchers.
These types of outputs are considered potentially highly-disclosive, should not be pushed to Level 4, and are never intended for publishing outside the secure environment.

Pseudonymised patient-level outputs tend to be large in size and therefore it is important that the right file formats are used for these large data files. The wrong formats can waste disk space, execution time, and server memory. The specific formats used vary with language ecosystem, but they should always be compressed.
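
As a minimal sketch of how the two output categories might appear together in a `project.yaml`, consider the fragment below. The action names (`generate_dataset`, `summarise`), the script and file paths, and the image versions are hypothetical and chosen only for illustration; they do not come from this commit.

```yaml
version: "3.0"

expectations:
  population_size: 1000

actions:
  # Hypothetical action producing a pseudonymised patient-level dataset.
  # It is labelled highly_sensitive and written in a compressed format.
  generate_dataset:
    run: ehrql:v1 generate-dataset analysis/dataset_definition.py --output output/dataset.csv.gz
    outputs:
      highly_sensitive:
        dataset: output/dataset.csv.gz

  # Hypothetical action that reads the dataset and writes only aggregated results.
  # Nothing patient-level should appear under moderately_sensitive.
  summarise:
    run: python:latest python analysis/summarise.py
    needs: [generate_dataset]
    outputs:
      moderately_sensitive:
        table: output/summary_table.csv
```

In this sketch the patient-level file stays under `highly_sensitive`, so only the aggregated table would be copied to Level 4 for review.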
