From 522570ff79e96d15e6dfa48b75d8cc6e35c64808 Mon Sep 17 00:00:00 2001
From: Alex Walker
Date: Wed, 1 Nov 2023 14:20:59 +0000
Subject: [PATCH] minor edits

---
 docs/actions-pipelines.md | 2 +-
 docs/security-levels.md   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/actions-pipelines.md b/docs/actions-pipelines.md
index 15a13abdf..ad02ae49a 100644
--- a/docs/actions-pipelines.md
+++ b/docs/actions-pipelines.md
@@ -69,8 +69,8 @@ In general, actions are composed as follows:
 * The `python`, `r`, and `stata-mp` commands provide a locked-down execution environment that can take one or more `inputs` which are passed to the code.
 * Each action must include an `outputs` key with at least one output, classified as either `highly_sensitive` or `moderately_sensitive`
 * `highly_sensitive` outputs are considered potentially highly-disclosive, and are never intended for publishing outside the secure environment. This includes all data at the pseudonymised patient-level. Outputs labelled highly_sensitive will not be visible to researchers.
- * Outputs should be separated onto different lines, each with a unique 'key', but related outputs can be combined using a wildcard (`*`). Note, when using a wildcare, it is extremely important to ensure that no `highly_sensitive` outputs are included. E.g.:
 * `moderately_sensitive` outputs **should never include patient-level data**, only data that is considered non-disclosive. This includes aggregated patient-data outputs such as summary tables, summary statistics and the outputs from statistical models. For a full list check the [allowed file types subsection](releasing-files.md). The appropriate [statistical disclosure controls](releasing-files.md) should have been applied to these files. They are copied to the secure review area (otherwise known as [Level 4](security-levels.md)).
+ * Outputs should be separated onto different lines, each with a unique 'key', but related outputs can be combined using a wildcard (`*`). Note, when using a wildcard, it is extremely important to ensure that no `highly_sensitive` outputs are included. E.g.:
 ```yaml
 outputs:
   moderately_sensitive:
diff --git a/docs/security-levels.md b/docs/security-levels.md
index 62aa0a810..fcfa95be4 100644
--- a/docs/security-levels.md
+++ b/docs/security-levels.md
@@ -50,7 +50,7 @@ Data is held within the EHR vendor's secure environment on the OpenSAFELY server
 Data processor staff working at the EHR vendor and a small and restricted number of OpenSAFELY platform developers. Similar to level 1 above, researchers can query this pseudonymised external data, but only indirectly: they write their study analysis code away from the source data, in GitHub, and the OpenSAFELY service automates the execution of the study code against the external data. Only the aggregated results of their study are made available back to the researchers in Level 4 (see below).
 ## Level 3 [NHS England are data controllers of the data]
-At this level data is typically stored as a pseudonymised patient-level (rather than event level) extract. It includes all pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [study definition](study-def.md) or [dataset_definition](https://docs.opensafely.org/ehrql/).. It also includes all of the pseudonymised patient-level intermediate outputs for a study derived from queries run against Level 3 data which output pseudonymised patient-level data i.e., a processed study dataset where certain filters/formatting have been applied.
+At this level data is typically stored as a pseudonymised patient-level (rather than event level) extract. It includes all pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [study definition](study-def.md) or [dataset_definition](https://docs.opensafely.org/ehrql/). It also includes all of the pseudonymised patient-level intermediate outputs for a study derived from queries run against Level 3 data which output pseudonymised patient-level data i.e., a processed study dataset where certain filters/formatting have been applied.
 As the data stored at this level is still patient-level, access to this level is restricted to a small number of OpenSAFELY staff to allow data quality assessment and debugging problems.
@@ -61,7 +61,7 @@ Data is held within the EHR vendor's secure environment on the OpenSAFELY server
 This is the same as Level 2.
 ## Level 4 [NHS England are data controllers of the data]
-This level includes aggregated patient-data (non patient-level data) derived from queries run against Level 3 data, such as summary tables, summary statistics and the outputs from statistical models. It also includes the automatically created log files of each action/script corresponding to each file, for debugging purposes.
+This level includes aggregated patient-data (non patient-level data) derived from analyses run against Level 3 data, such as summary tables, summary statistics and the outputs from statistical models. It also includes the automatically created log files of each action/script corresponding to each file, for debugging purposes.
 This is the only level that OpenSAFELY users have access to in order to view their aggregated data/results/log files; users do not have unfettered access to any patient-level data and only see aggregated outputs derived from their analysis code, which satisfies the GDPR principle of confidentiality.
Researchers are able to use this level to check that the appropriate statistical disclosure controls have been applied to any files intended for release out of the server.