From f35e279bc0bb27dfc486104d3430fea39c3984a7 Mon Sep 17 00:00:00 2001 From: Becky Smith Date: Tue, 13 Aug 2024 10:09:12 +0100 Subject: [PATCH 01/13] Move releasing docs to a subdirectory This means we can link to the overview page at the section root instead of having to click on the root to open the dropdown, and then on the overview link. --- docs/actions-pipelines.md | 4 ++-- .../index.md | 2 +- docs/jobs-site.md | 6 +++--- .../{releasing-files-intro.md => releasing/index.md} | 4 ++-- docs/{ => releasing}/output-checking.md | 0 docs/{ => releasing}/releasing-files.md | 2 +- docs/{ => releasing}/requesting-file-release.md | 4 ++-- docs/{ => releasing}/sdc.md | 2 +- docs/security-levels.md | 2 +- docs/workflow.md | 4 ++-- mkdocs.yml | 12 ++++++------ 11 files changed, 21 insertions(+), 21 deletions(-) rename docs/{releasing-files-intro.md => releasing/index.md} (81%) rename docs/{ => releasing}/output-checking.md (100%) rename docs/{ => releasing}/releasing-files.md (97%) rename docs/{ => releasing}/requesting-file-release.md (99%) rename docs/{ => releasing}/sdc.md (99%) diff --git a/docs/actions-pipelines.md b/docs/actions-pipelines.md index 7e4869354..e7e6bb1ee 100644 --- a/docs/actions-pipelines.md +++ b/docs/actions-pipelines.md @@ -69,7 +69,7 @@ In general, actions are composed as follows: * The `python`, `r`, and `stata-mp` commands provide a locked-down execution environment that can take one or more `inputs` which are passed to the code. * Each action must include an `outputs` key with at least one output, classified as either `highly_sensitive` or `moderately_sensitive` * `highly_sensitive` outputs are considered potentially highly-disclosive, and are never intended for publishing outside the secure environment. This includes all data at the pseudonymised patient-level. Outputs labelled highly_sensitive will not be visible to researchers. 
- * `moderately_sensitive` outputs **should never include patient-level data**, only data that is considered non-disclosive. This includes aggregated patient-data outputs such as summary tables, summary statistics and the outputs from statistical models. For a full list, check the [allowed file types subsection](requesting-file-release.md#allowed-file-types). The appropriate [statistical disclosure controls](sdc.md) should have been applied to these files. They are copied to the secure review area (otherwise known as [Level 4](security-levels.md)). + * `moderately_sensitive` outputs **should never include patient-level data**, only data that is considered non-disclosive. This includes aggregated patient-data outputs such as summary tables, summary statistics and the outputs from statistical models. For a full list, check the [allowed file types subsection](releasing/requesting-file-release.md#allowed-file-types). The appropriate [statistical disclosure controls](releasing/sdc.md) should have been applied to these files. They are copied to the secure review area (otherwise known as [Level 4](security-levels.md)). * Outputs should be separated onto different lines, each with a unique 'key', but related outputs can be combined using a wildcard (`*`). Note, when using a wildcard, it is extremely important to ensure that no `highly_sensitive` outputs are included. E.g.: ```yaml outputs: @@ -174,7 +174,7 @@ After your project has been executed via the [jobs site](jobs-site.md), its outp Users with permission to access Level 4 can view output files that are labelled as _moderately sensitive_; they can also view automatically created log files of the run for debugging purposes. -For security reasons, they will be in a different directory than if you had run locally. For the TPP backend, outputs labelled `moderately_sensitive` in the `project.yaml` will be saved in `D:/Level4Files/workspaces/`. 
These outputs can be [reviewed on the server](jobs-site.md#viewing-analysis-outputs-on-the-server) and [released if they are deemed non-disclosive](output-checking.md). +For security reasons, they will be in a different directory than if you had run locally. For the TPP backend, outputs labelled `moderately_sensitive` in the `project.yaml` will be saved in `D:/Level4Files/workspaces/`. These outputs can be [reviewed on the server](jobs-site.md#viewing-analysis-outputs-on-the-server) and [released if they are deemed non-disclosive](releasing/output-checking.md). Outputs labelled `highly_sensitive` are not visible. diff --git a/docs/getting-started/how-to/use-released-outputs-in-github-codespaces/index.md b/docs/getting-started/how-to/use-released-outputs-in-github-codespaces/index.md index 78e0db88e..75ef8b0d2 100644 --- a/docs/getting-started/how-to/use-released-outputs-in-github-codespaces/index.md +++ b/docs/getting-started/how-to/use-released-outputs-in-github-codespaces/index.md @@ -4,7 +4,7 @@ This is because the final stage frequently involves carefully crafting figures a making many small adjustments that would otherwise entail multiple round-trips to the OpenSAFELY Jobs site. Executing the final stage of a project pipeline outside a secure environment is only possible when the outputs from the previous stage have been released to the OpenSAFELY Jobs site. -[Released outputs](../../../releasing-files-intro.md) have been subject to statistical disclosure control and have been reviewed by two trained OpenSAFELY output checkers. +[Released outputs](../../../releasing/index.md) have been subject to statistical disclosure control and have been reviewed by two trained OpenSAFELY output checkers. 
To upload released outputs to a Codespace, using VS Code: diff --git a/docs/jobs-site.md b/docs/jobs-site.md index b04a3876b..f4dff7e30 100644 --- a/docs/jobs-site.md +++ b/docs/jobs-site.md @@ -36,8 +36,8 @@ graph TD Once outputs have been produced by running _jobs_ from within a _Workspace_, there are several stages they must go through before being made publicly available: -1. **Outputs on the [Level 4 server](level-4-server.md)**. These are aggregated patient-data (non patient-level data) outputs marked as `moderately_sensitive` in the `project.yaml` file and are only viewable when logged into the Level 4 server. These outputs have to be [reviewed by our output checking team](output-checking.md) before they can leave the server. -2. **Released outputs**. These are analysis outputs that have been reviewed for any [disclosivity issues](sdc.md#primary-vs-secondary-disclosure) and released from the Level 4 server by the output checking team to the relevant _Workspace_ on the Jobs site. These are only viewable if you have the correct permissions for the _Project_ the _Workspace_ belongs to. +1. **Outputs on the [Level 4 server](level-4-server.md)**. These are aggregated patient-data (non patient-level data) outputs marked as `moderately_sensitive` in the `project.yaml` file and are only viewable when logged into the Level 4 server. These outputs have to be [reviewed by our output checking team](releasing/output-checking.md) before they can leave the server. +2. **Released outputs**. These are analysis outputs that have been reviewed for any [disclosivity issues](releasing/sdc.md#primary-vs-secondary-disclosure) and released from the Level 4 server by the output checking team to the relevant _Workspace_ on the Jobs site. These are only viewable if you have the correct permissions for the _Project_ the _Workspace_ belongs to. 3. **Draft public outputs**. 
Released outputs can only be shared with close collaborators of your projects ([refer to the examples of who this could include](https://www.opensafely.org/policies-for-researchers/#all-datasets-sharing)). To be shared more widely, they have to first be approved by NHS England. Once approved, and if you have the correct jobs site permissions, you can create draft public outputs for approval. 4. **Published outputs**. Once approved, draft public outputs are made publicly available to view by anyone through the _Workspace_ they belong to. @@ -158,7 +158,7 @@ You can view the various [output types](#output-types) from the `Releases` secti ![Workspace Releases](./images/releases.png) -Any files that you would like to be released from the server, have to first be checked by our team of output checkers. Refer to the [instructions for requesting a release](requesting-file-release.md). +Any files that you would like to be released from the server, have to first be checked by our team of output checkers. Refer to the [instructions for requesting a release](releasing/requesting-file-release.md). Once reviewed, approved and released, your requested files will be available to view from your _Workspace_ in the _Released Outputs_ section of Releases. To view released outputs, you need to have the **ProjectDeveloper** or **ProjectCollaborator** role. If you would like to add a project collaborator to your _Workspace_, please read [this section](https://www.opensafely.org/policies-for-researchers/#all-datasets-sharing) of the researcher policy and/or contact your co-pilot (if you have one). 
diff --git a/docs/releasing-files-intro.md b/docs/releasing/index.md similarity index 81% rename from docs/releasing-files-intro.md rename to docs/releasing/index.md index d8e1f30f9..597703ed1 100644 --- a/docs/releasing-files-intro.md +++ b/docs/releasing/index.md @@ -1,4 +1,4 @@ -OpenSAFELY follows the [Five Safes](five-safes.md) framework for data access to allow safe and efficient use of data. +OpenSAFELY follows the [Five Safes](../five-safes.md) framework for data access to allow safe and efficient use of data. **Safe Outputs** is one of the dimensions of the Five Safes framework, which assesses any residual risk of disclosure of patient information in outputs wishing to be released from the secure environment. This risk is minimised by researchers applying **statistical disclosure controls** to their research outputs, followed by **output checking** of these outputs by our team of trained output checkers. @@ -7,4 +7,4 @@ In OpenSAFELY, there are 4 key “Safe Outputs” activities: 1. Researchers must [apply statistical disclosure controls to their research outputs](sdc.md). 2. [Requesting release of outputs from the Level 4 server](requesting-file-release.md) that are necessary to fulfil the purpose of a project. 3. [Review of the requested outputs](output-checking.md) by two trained OpenSAFELY output checkers. -4. [Release of outputs that meet our disclosure rules](releasing-files.md) to the relevant workspace on the [Jobs site](jobs-site.md). +4. [Release of outputs that meet our disclosure rules](releasing-files.md) to the relevant workspace on the [Jobs site](../jobs-site.md). 
diff --git a/docs/output-checking.md b/docs/releasing/output-checking.md similarity index 100% rename from docs/output-checking.md rename to docs/releasing/output-checking.md diff --git a/docs/releasing-files.md b/docs/releasing/releasing-files.md similarity index 97% rename from docs/releasing-files.md rename to docs/releasing/releasing-files.md index 4d49fbdce..158ffe972 100644 --- a/docs/releasing-files.md +++ b/docs/releasing/releasing-files.md @@ -1,4 +1,4 @@ -All approved OpenSAFELY outputs are released to the workspace they belong to on the [Jobs site](jobs-site.md). +All approved OpenSAFELY outputs are released to the workspace they belong to on the [Jobs site](../jobs-site.md). ### Viewing released outputs diff --git a/docs/requesting-file-release.md b/docs/releasing/requesting-file-release.md similarity index 99% rename from docs/requesting-file-release.md rename to docs/releasing/requesting-file-release.md index 00ced35ca..18db57053 100644 --- a/docs/requesting-file-release.md +++ b/docs/releasing/requesting-file-release.md @@ -69,9 +69,9 @@ Only certain file types will be reviewed and released from the secure server. Se * `json` files can be released, but as with tables, make sure that the attributes are easily understandable for reviewers. If the output can be represented as a table, you should consider converting it. * `html` files can be released if you are producing a report that is intended to be hosted on [reports.opensafely.org](https://reports.opensafely.org/) but please note the points below: * `html` files are harder to review than other output types, so should be reserved for reports which require both contextual text and embedded outputs. If you can produce your report locally, using individually released files, you should. - * Make sure that any code blocks are not rendered in the rendered report if they are not needed. 
You can find [examples showing how to do this for Jupyter notebooks and R markdown files](reports/intro.md#producing-reports). + * Make sure that any code blocks are not rendered in the rendered report if they are not needed. You can find [examples showing how to do this for Jupyter notebooks and R markdown files](../reports/intro.md#producing-reports). * Each individual output within the report should be requested for release separately, with the contextual information outlined above. - * `html` files should be stripped of any embedded javascript and styling. This is obfuscated when viewing a report via a web browser, but makes review of the raw file very difficult. Refer to our instructions [explaining how to strip the `html` files](reports/intro.md#producing-reports). + * `html` files should be stripped of any embedded javascript and styling. This is obfuscated when viewing a report via a web browser, but makes review of the raw file very difficult. Refer to our instructions [explaining how to strip the `html` files](../reports/intro.md#producing-reports). * When making a review request that includes `html` files, please include a link to the code you have used to produce the reports. If you would like to release other file types, please email , stating why it is important that the file is released in a different format. diff --git a/docs/sdc.md b/docs/releasing/sdc.md similarity index 99% rename from docs/sdc.md rename to docs/releasing/sdc.md index 748e9a9c2..d7f7aa243 100644 --- a/docs/sdc.md +++ b/docs/releasing/sdc.md @@ -59,7 +59,7 @@ When applying disclosure controls to your outputs, you should consider the poten ### Redacting counts less than or equal to 7 -Before requesting files to be released, work through the [moderately sensitive](actions-pipelines.md#accessing-outputs) files in the workspace folder systematically to identify any tables, figures, and other released text and objects that may be a disclosure risk. 
+Before requesting files to be released, work through the [moderately sensitive](../actions-pipelines.md#accessing-outputs) files in the workspace folder systematically to identify any tables, figures, and other released text and objects that may be a disclosure risk. The general principle is that **any statistic describing 7 or fewer patients, either directly or indirectly, should be redacted or combined into other statistics**. This includes: diff --git a/docs/security-levels.md b/docs/security-levels.md index f601db342..31c93d141 100644 --- a/docs/security-levels.md +++ b/docs/security-levels.md @@ -65,7 +65,7 @@ This level includes aggregated patient-data (non patient-level data) derived fro This is the only level that OpenSAFELY users have access to in order to view their aggregated data/results/log files; users do not have unfettered access to any patient-level data and only see aggregated outputs derived from their analysis code, which satisfies the GDPR principle of confidentiality. Researchers are able to use this level to check that the appropriate statistical disclosure controls have been applied to any files intended for release out of the server. -Access to this level is secured via VPN access to a remote desktop. No files are released from the secure environment without undergoing dual independent checking by trained output-checkers for disclosure issues (see the [statistical disclosure control section](sdc.md)) +Access to this level is secured via VPN access to a remote desktop. No files are released from the secure environment without undergoing dual independent checking by trained output-checkers for disclosure issues (see the [statistical disclosure control section](releasing/sdc.md)) ### Where is this data held? Data is held within the EHR vendor's secure environment on a specific server, separate from the Level 2 and 3 server. 
diff --git a/docs/workflow.md b/docs/workflow.md index d309b77d9..92615666c 100644 --- a/docs/workflow.md +++ b/docs/workflow.md @@ -19,8 +19,8 @@ This repo will contain all the code relating to your project, and a history of i - generating log files to debug the scripts when they run on the real data. 5. **Test the code** by running the analysis steps specified in the [_project pipeline_](actions-pipelines.md), which specifies the execution order for data extracts and analyses and the outputs to be released. 6. **Execute the analysis on the real data** via OpenSAFELY's [jobs site](jobs-site.md). This will generate outputs on the secure server. -7. **Check the output for [disclosivity](output-checking.md)** within the server, and redact if necessary. -8. **[Release](releasing-files.md) the outputs** via GitHub. +7. **Check the output for [disclosivity](releasing/output-checking.md)** within the server, and redact if necessary. +8. **[Release](releasing/releasing-files.md) the outputs** via GitHub. 9. **Repeat and iterate steps 2 to 8 as necessary**. These steps should always proceed with frequent git commits and code reviews where appropriate. Steps 2-5 can all be progressed on your local machine without accessing the real data. 
diff --git a/mkdocs.yml b/mkdocs.yml index a062018c7..2a29dcc3c 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -62,12 +62,12 @@ nav: - Reusable actions: actions-reusable.md - Jobs site: jobs-site.md - Level 4 server: level-4-server.md - - Releasing research outputs from the Level 4 server: - - Overview: releasing-files-intro.md - - Applying statistical disclosure control: sdc.md - - Requesting release of research outputs: requesting-file-release.md - - Review process for release requests: output-checking.md - - Release of approved outputs: releasing-files.md + - Releasing research outputs: + - 'Overview: Releasing research outputs from the Level 4 server': releasing/index.md + - Applying statistical disclosure control: releasing/sdc.md + - Requesting release of research outputs: releasing/requesting-file-release.md + - Review process for release requests: releasing/output-checking.md + - Release of approved outputs: releasing/releasing-files.md - Releasing with Airlock: '!import https://github.com/opensafely-core/airlock?branch=main' - Reports: - Overview: reports/intro.md From 2754a551095867db8cad869474082d3360eaf4ef Mon Sep 17 00:00:00 2001 From: Becky Smith Date: Tue, 13 Aug 2024 10:38:24 +0100 Subject: [PATCH 02/13] Update the releasing overview page This page wasn't very easy to read.This makes it reiterate the 5 safes instead of just linking back, and splits out the 4 activities into more clearly separated sections. --- docs/releasing/index.md | 36 +++++++++++++++++++++++++++++++----- 1 file changed, 31 insertions(+), 5 deletions(-) diff --git a/docs/releasing/index.md b/docs/releasing/index.md index 597703ed1..63444fc25 100644 --- a/docs/releasing/index.md +++ b/docs/releasing/index.md @@ -1,10 +1,36 @@ OpenSAFELY follows the [Five Safes](../five-safes.md) framework for data access to allow safe and efficient use of data. 
-**Safe Outputs** is one of the dimensions of the Five Safes framework, which assesses any residual risk of disclosure of patient information in outputs wishing to be released from the secure environment. This risk is minimised by researchers applying **statistical disclosure controls** to their research outputs, followed by **output checking** of these outputs by our team of trained output checkers. +The Five Safes are: + +- Safe projects +- Safe people +- Safe data +- Safe settings +- **Safe outputs** + +When we release files from the Level 4 server, we need to take particular care of the +**Safe Outputs** dimension of the Five Safes framework, which assesses any residual risk of disclosure of patient information in outputs wishing to be released from the secure environment. This risk is minimised by researchers applying **statistical disclosure controls** to their research outputs, followed by **output checking** of these outputs by our team of trained output checkers. In OpenSAFELY, there are 4 key “Safe Outputs” activities: -1. Researchers must [apply statistical disclosure controls to their research outputs](sdc.md). -2. [Requesting release of outputs from the Level 4 server](requesting-file-release.md) that are necessary to fulfil the purpose of a project. -3. [Review of the requested outputs](output-checking.md) by two trained OpenSAFELY output checkers. -4. [Release of outputs that meet our disclosure rules](releasing-files.md) to the relevant workspace on the [Jobs site](../jobs-site.md). +**1. [Apply statistical disclosure controls](sdc.md)** + +Researchers must apply statistical disclosure controls to their research outputs. + +**2. [Requesting release of outputs](requesting-file-release.md)** + +Researchers must follow a defined procedure for requesting release of outputs from the Level 4 server. 
This includes: + +- only requesting release of files that are necessary to fulfil the purpose of a project +- describing the context (why the files are requested for release) and statistical disclosure controls applied +- restricting files to specific allowed types +- limits on file size and number of rows in tables +- Airlock, a dedicated OpenSAFELY tool for managing the release request and review process + +**3. [Output checking](output-checking.md)** + +Review of the requested outputs by two trained OpenSAFELY output checkers. + +**4. [Release of files](releasing-files.md)** + +Release of outputs that meet our disclosure rules and have undergone thorough output checking to the relevant workspace on the [Jobs site](../jobs-site.md). From ea27833abe9cb4372b4aaa25b598a6d8da6e9251 Mon Sep 17 00:00:00 2001 From: Becky Smith Date: Tue, 13 Aug 2024 13:07:46 +0100 Subject: [PATCH 03/13] Split up requesting files info and process Split the information (about allowed file types etc) about releasing files into a separate page to the specific process instructions. This means we can link to the Airlock docs for the process, and also include a link to the old process in case it's needed. --- docs/releasing/requesting-file-release.md | 86 +++++++++---------- .../requesting-release-offline-process.md | 40 +++++++++ 2 files changed, 83 insertions(+), 43 deletions(-) create mode 100644 docs/releasing/requesting-release-offline-process.md diff --git a/docs/releasing/requesting-file-release.md b/docs/releasing/requesting-file-release.md index 18db57053..5cf6dc6ba 100644 --- a/docs/releasing/requesting-file-release.md +++ b/docs/releasing/requesting-file-release.md @@ -1,4 +1,12 @@ -**Only specific members of the OpenSAFELY team trained in output checking have permissions to release the data**. Having applied disclosure controls to your aggregated study data you are ready to request their release. Please read the instructions and [checklist](#checklist) below. +!!! 
note + Only specific members of the OpenSAFELY team trained in output checking have permissions to release the data. + +Having applied disclosure controls to your aggregated study data you are ready to request their release. This page describes the restrictions on files that can be released and the +information you will need to provide in order to request release. For instructions on how +to create and submit your release request, please refer to the documentation on [releasing files with +Airlock](/using-opensafely/releasing-research-outputs/releasing-with-airlock). + +Note: the [previous manual process for requesting release of files](requesting-release-offline-process.md) is now deprecated. All release requests should be submitted via Airlock wherever possible. !!! warning You **MUST NOT** share any results that have not been released through the official output checking process. This includes: @@ -8,30 +16,19 @@ - transcribing (e.g., to paper or email) - using screen sharing software or any recording device/software -### Create a folder for outputs -First, create one folder in your workspace called `release` (if you have previously made a release, we suggest appending the date to the new folder name to distinguish it) and copy from your `output` folder to this `release` folder the data files that require review. The number of study outputs requested for review must be kept to a minimum and include only the results you absolutely need to export from the secure server. -### Complete a output review request form +### When should I create a release request? -When you are ready to request a release of your aggregated results please [complete this form](/documents/OpenSAFELY_Output_Review_Form_ADD_WORKSPACE_NAME_ADD_DATE.docx), renaming the form to replace the placeholders with your workspace name and the date. - -!!! note - Each data release entails substantial review work. 
To retain rapid turnaround times, external data releases should typically only be of results for final submission to a journal or public notebook; or a small number of necessary releases for discussion with external collaborators. +Each data release entails substantial review work. To retain rapid turnaround times, external data releases should typically only be of results for final submission to a journal or public notebook; or a small number of necessary releases for discussion with external collaborators. -#### Context requirements +!!! note "Tips for getting a quicker review" + Our resources for checking outputs are not unlimited, therefore it is advised to ensure you have all of your outputs ready at the same time for your project (or its current phase) so they can be reviewed together. Please make your outputs as understandable as possible for output checkers who will not be familiar with your project by, for example, using descriptive variable names and providing full descriptions of your outputs and + [contextual information](#context-and-controls). -For each output wishing to be released you will need to provide a clear contextual description including: +Another reason to ensure your analyses are complete is that re-running your study definition a short time later (e.g. to create an additional variable) may produce small differences in the previous results, e.g. due to movement of patients or codes added retrospectively to patient records. If you have already released similar results, any small changes in new outputs may be subject to small number suppression which may prevent the new outputs being released at all. (One solution to minimise this issue is to round all of your results, e.g. to the nearest 5). -1. The file path for each output -2. Variable descriptions -3. A description and count of the underlying sample of the population for each output. -4. Population size and degrees of freedom for all regression outputs. -5. 
Relationship to other data/tables which through combination may introduce secondary disclosive risks. - -Each section in the review request form should normally describe a single file, but where necessary for similar files, these can be grouped together and wildcards can be used for the file path (e.g. `release/hospitalisation_rate_by_*.csv`). **If you use a wildcard, please indicate how many files this captures**. - -### Release of intermediate data +#### Release of intermediate data In general, releases should be for final results from your project (see the note above). However, on some occassions it is appropriate to release intermediate data. Below are some suggestions for when this is appropriate: @@ -44,6 +41,21 @@ If requesting release of intermediate data there are a few considerations: * We recommend that you continue to develop downstream analysis actions within the OpenSAFELY pipeline, even if they are not intended to be run on the server. This helps maintain reproducibility. * Intermediate results can contain much more data than outputs produced at the end of the analysis pipeline. The data contained within these outputs should be the minimum amount required to produce the downstream outputs or receive feedback from project collaborators. +### Context and Controls + +For each group of outputs you wish to release you will need to provide a clear contextual description including: + +1. Variable descriptions. +1. A description and count of the underlying sample of the population for each output. +1. Population size and degrees of freedom for all regression outputs. +1. Relationship to other data/tables which through combination may introduce secondary disclosive risks. + +You will also need to describe **controls** (i.e. [statistical disclosure controls](sdc.md)) +that have been applied to the outputs. + +A group of files can include one output file, or multiple files that share the same +context and controls. 
+ ### Error log files For error logs, they should only be requested for output in exceptional circumstances (for example, if you need to discuss the error and any related data within the log file with a researcher who writes code but does not have Level 4 results server access, otherwise we would expect both researchers to review the log via their VPN access). When an error log is requested, you must minimise any data required: make a copy of the log file and delete all data items that are not necessary. The less data that is present, the faster the review process. @@ -76,37 +88,25 @@ Only certain file types will be reviewed and released from the secure server. Se If you would like to release other file types, please email , stating why it is important that the file is released in a different format. -!!! note - The maximum file size that can be released is 16MB. Please check your outputs before requesting them for release. It is unlikely any outputs that exceed this in size are appropriate for release, but if you think they are, please let us know when making a release request. +### Maximum file size +The maximum file size that can be released is 16MB. Please check your outputs before requesting them for release. It is unlikely any outputs that exceed this in size are appropriate for release, but if you think they are, please let us know when making a release request. ### Checklist -Please run through this checklist before making a review request. +Please run through this checklist before submitting a review request. 1. Do your results adhere to the [OpenSAFELY permitted study results policy](https://www.opensafely.org/policies-for-researchers/#permitted-study-results-policy) -2. Are all of the outputs of the [allowed file types](#allowed-file-types)? -3. Are all of the outputs in a [separate release folder](#create-a-folder-for-outputs)? -4. Have you [redacted any low counts](sdc.md#redacting-counts-less-than-or-equal-to-7)? -5. 
Have you [rounded any counts](sdc.md#rounding-counts) (including [counts underlying rates](sdc.md#rounding-rates))? -6. Have you supplied underlying counts for all of your results? -7. Are all of the outputs clearly described? - * Is the filename sensible and is the filepath provided in the request form correct? - * Have you provided all of the context needed to review each output in isolation in the request form? - * Have you described the disclosure controls you have applied to each output? -8. If you are requesting the release of log files, are you sure they [need to be released](#error-log-files)? -9. Are all of the requested files below the [maximum file size](#allowed-file-types)? +1. Are all of the outputs of the [allowed file types](#allowed-file-types)? +1. Have you [redacted any low counts](sdc.md#redacting-counts-less-than-or-equal-to-7)? +1. Have you [rounded any counts](sdc.md#rounding-counts) (including [counts underlying rates](sdc.md#rounding-rates))? +1. Have you supplied underlying counts for all of your results? +1. Are all of the outputs clearly described? + * Have you provided all of the [context](#context-and-controls) needed to review each output in isolation? + * Have you described the [disclosure controls](#context-and-controls) you have applied to each output? +1. If you are requesting the release of log files, are you sure they [need to be released](#error-log-files)? +1. Are all of the requested files below the [maximum file size](#maximum-file-size)? Following this checklist will make your outputs easier to check, speed up review time and avoid the outputs having to be rechecked. -### Submitting the form - -Once you have completed this form, please send it to ****. The requested outputs will undergo independent review by two OpenSAFELY output checkers who will check that the outputs are within the scope of your original project proposal and that they do not present any disclosure risks. 
**Please allow up to 5 working days for feedback on your request**. - !!! warning The [Permitted Study Results Policy](https://www.opensafely.org/policies-for-researchers/#permitted-study-results-policy) may be updated: **always check the policy before every new release request.** - -!!! note - **Tips for getting a quicker review** - Our resources for checking outputs are not unlimited, therefore it is advised to ensure you have all of your outputs ready at the same time for your project (or its current phase) so they can be reviewed together. Please make your outputs as understandable as possible for output checkers who will not be familiar with your project by, for example, using descriptive variable names and providing full descriptions of each output in the form provided. - - Another reason to ensure your analyses are complete is that re-running your study definition a short time later (e.g. to create an additional variable) may produce small differences in the previous results, e.g. due to movement of patients or codes added retrospectively to patient records. If you have already released similar results, any small changes in new outputs may be subject to small number suppression which may prevent the new outputs being released at all. (One solution to minimise this issue is to round all of your results, e.g. to the nearest 5). diff --git a/docs/releasing/requesting-release-offline-process.md b/docs/releasing/requesting-release-offline-process.md new file mode 100644 index 000000000..b814840a7 --- /dev/null +++ b/docs/releasing/requesting-release-offline-process.md @@ -0,0 +1,40 @@ +!!! note + This page describes the process used for requesting release of files prior to + Airlock. New release requests should be made using [Airlock](/using-opensafely/releasing-research-outputs/releasing-with-airlock) wherever possible.
+ + +### Create a folder for outputs + +First, create one folder in your workspace called `release` (if you have previously made a release, we suggest appending the date to the new folder name to distinguish it) and copy from your `output` folder to this `release` folder the data files that require review. The number of study outputs requested for review must be kept to a minimum and include only the results you absolutely need to export from the secure server. + +### Complete an output review request form + +When you are ready to request a release of your aggregated results please [complete this form](/documents/OpenSAFELY_Output_Review_Form_ADD_WORKSPACE_NAME_ADD_DATE.docx), renaming the form to replace the placeholders with your workspace name and the date. + +#### Context requirements + +For each output you wish to release, you will need to provide a clear contextual description including: + +1. The file path for each output +2. Variable descriptions +3. A description and count of the underlying sample of the population for each output. +4. Population size and degrees of freedom for all regression outputs. +5. Relationship to other data/tables which through combination may introduce secondary disclosive risks. + +Each section in the review request form should normally describe a single file, but where necessary for similar files, these can be grouped together and wildcards can be used for the file path (e.g. `release/hospitalisation_rate_by_*.csv`). **If you use a wildcard, please indicate how many files this captures**. + +### Checklist + +Please run through [the checklist](requesting-file-release.md#checklist) before making a review request. In addition, check: + +1. Are all of the outputs in a [separate release folder](#create-a-folder-for-outputs)? +1. Are all of the outputs clearly described? + * Is the filename sensible and is the filepath provided in the request form correct?
+ * Have you provided all of the context needed to review each output in isolation in the request form? + * Have you described the disclosure controls you have applied to each output? + +### Submitting the form + +Once you have completed this form, please send it to ****. The requested outputs will undergo independent review by two OpenSAFELY output checkers who will check that the outputs are within the scope of your original project proposal and that they do not present any disclosure risks. **Please allow up to 5 working days for feedback on your request**. + +---8<-- 'includes/glossary.md' From 88e0b377af2302dbf475da55934ccbd8601478af Mon Sep 17 00:00:00 2001 From: Becky Smith Date: Tue, 13 Aug 2024 13:26:06 +0100 Subject: [PATCH 04/13] Update output-checking page for Airlock Removes some docs that were specific to the old process, but otherwise mostly unchanged. --- docs/releasing/output-checking.md | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/docs/releasing/output-checking.md b/docs/releasing/output-checking.md index d4754b2f1..171c107e0 100644 --- a/docs/releasing/output-checking.md +++ b/docs/releasing/output-checking.md @@ -1,10 +1,23 @@ Before any files are released from the secure server, they are checked independently by two trained OpenSAFELY output checkers. Each checked output is marked as one of the following categories: * **Approve** — output meets disclosure requirements and is safe to be released -* **Approve subject to change** — output is an acceptable type for release, but has outstanding disclosure issues that must be addressed before release +* **Request changes** — output is an acceptable type for release, but has outstanding disclosure issues that must be addressed before release * **Reject** — output is not an acceptable type for release. 
An example is the release of practice level data which does not meet the [permitted study results policy](https://www.opensafely.org/policies-for-researchers/#permitted-study-results-policy) -Once reviewed, the completed review request will be emailed back to you. We aim to provide a response to review requests within **5 working days**. If all outputs are approved, they will then be released. If one or more outputs are approved subject to change, you will need to address the disclosure issues and submit a new review form detailing the changes you have made. + +=== "Responding to requests - Airlock" + + Requests submitted via Airlock will also be reviewed by output checkers on Airlock. If + the output checkers require changes or have questions about the requested files, they + will return the release request to you. You will receive an email notification when this happens. + + For further information on how to submit and respond to returned requests, please see the + documentation on [releasing with Airlock](/using-opensafely/releasing-research-outputs/ + releasing-with-airlock). + +=== "Responding to requests - manual process (deprecated)" + + Once reviewed, the completed review request will be emailed back to you. We aim to provide a response to review requests within **5 working days**. If all outputs are approved, they will then be released. If one or more outputs are approved subject to change, you will need to address the disclosure issues and submit a new review form detailing the changes you have made. ### Most common problems with output review requests @@ -12,13 +25,11 @@ Below are the most common problems encountered by output checkers when reviewing 1. **There are unrounded counts in the outputs**. All counts should be [rounded](sdc.md#rounding-counts). This includes rounding counts prior to them being used to calculate further statistics, such as percentages or odds ratios. 
Commonly raw counts are rounded, but downstream statistics are calculated using the raw counts rather than the rounded counts. Unrounded counts account for **~30%** of rejections. 2. **Insufficicent context is provided for the outputs**. **~25%** of rejected outputs are due to insufficient context. Make sure you have provided all of the context needed to review each output in isolation in the request form. Common errors include: - * Stating the incorrect file path. You should check all file paths point to the relevant files within your `release` folder before making a request. - * Files included in the review form being missing from the `review` folder. - * Using unclear column/variable names or poorly describing the presented data. Refer to the [context requirements](requesting-file-release.md#context-requirements). + * Using unclear column/variable names or poorly describing the presented data. Refer to the [context requirements](requesting-file-release.md#context-and-controls). * Not clearly indicating the relationship between different outputs. - * Where an output has previously been requests, not indicating how the output differs to previously reviewed version. + * Where an output has previously been requested, not indicating how the output differs to previously reviewed version. 3. **There are unredacted counts in the outputs**. Prior to rounding counts, [any counts <=7 should be redacted](sdc.md#redacting-counts-less-than-or-equal-to-7). The redaction approach should be clearly described when making a review request. It is not uncommon for the stated redaction approach to be improperly implemented in the outputs. Inappropriate redaction of low counts accounts for **~20%** of rejected outputs. 4. **Underlying data is not provided**. To ensure the low number threshold is met, reviewers require to see the underlying data for each output. This includes the data used to generate figures and to calculate summary statistics such as mean or median. 
**~10%** of rejected outputs are due to underlying data not being provided. -5. **Unsupported file types being requested**. Files requested for release should be one of the [allowed file types](requesting-file-release.md#allowed-file-types). If you are requesting the release of HTML files, please make sure you have followed the [guidance for HTML files](requesting-file-release.md#allowed-file-types). **~10%** of rejected outputs are due to unsupported file types being requested. +5. **Unsupported file types being requested**. Files requested for release should be one of the [allowed file types](requesting-file-release.md#allowed-file-types). If you are requesting the release of HTML files, please make sure you have followed the [guidance for HTML files](requesting-file-release.md#allowed-file-types). **~10%** of rejected outputs are due to unsupported file types being requested. (Note: Airlock will automatically restrict output files in a request to only allowed file types.) To help avoid these issues, please make sure you have read the [checklist](requesting-file-release.md#checklist) before submitting your review request. From 9a23588b360fd73146d468ef5d62f4f80f79104a Mon Sep 17 00:00:00 2001 From: Becky Smith Date: Tue, 13 Aug 2024 13:30:43 +0100 Subject: [PATCH 05/13] Rename releasing-files to viewing-released-files This page was not about actually releasing the files, but about viewing those released files on job-server. 
--- docs/releasing/index.md | 7 +++++-- docs/releasing/requesting-file-release.md | 2 +- .../{releasing-files.md => viewing-released-files.md} | 0 mkdocs.yml | 2 +- 4 files changed, 7 insertions(+), 4 deletions(-) rename docs/releasing/{releasing-files.md => viewing-released-files.md} (100%) diff --git a/docs/releasing/index.md b/docs/releasing/index.md index 63444fc25..f55349292 100644 --- a/docs/releasing/index.md +++ b/docs/releasing/index.md @@ -31,6 +31,9 @@ Researchers must follow a defined procedure for requesting release of outputs fr Review of the requested outputs by two trained OpenSAFELY output checkers. -**4. [Release of files](releasing-files.md)** +**4. [Restricted viewing of released files](viewing-released-files.md)** -Release of outputs that meet our disclosure rules and have undergone thorough output checking to the relevant workspace on the [Jobs site](../jobs-site.md). +Outputs that meet our disclosure rules and have undergone thorough output checking are +released to the relevant workspace on the [Jobs site](../jobs-site.md). Viewing of +released outputs is restricted to individuals with the relevant roles on the jobs +site, and is not publicly accessible until outputs are published. diff --git a/docs/releasing/requesting-file-release.md b/docs/releasing/requesting-file-release.md index 5cf6dc6ba..1e633134b 100644 --- a/docs/releasing/requesting-file-release.md +++ b/docs/releasing/requesting-file-release.md @@ -32,7 +32,7 @@ Another reason to ensure your analyses are complete is that re-running your stud In general, releases should be for final results from your project (see the note above). However, on some occassions it is appropriate to release intermediate data. Below are some suggestions for when this is appropriate: -* You think you may need to make minor edits to final outputs such as changing figure labels.
Release of the intermediate data allows you to make these [changes locally](releasing-files.md#running-further-analyses-on-released-outputs). +* You think you may need to make minor edits to final outputs such as changing figure labels. Release of the intermediate data allows you to make these [changes locally](viewing-released-files.md#running-further-analyses-on-released-outputs). * A large number of outputs are produced from a single intermediate output. Release of the intermediate data underlying the figures (which needs to be checked whether it is released or not) avoids the need to check the downstream outputs. * The intermediate data doesn't contain person-level data, but is used for running a model that would produce multiple outputs. diff --git a/docs/releasing/releasing-files.md b/docs/releasing/viewing-released-files.md similarity index 100% rename from docs/releasing/releasing-files.md rename to docs/releasing/viewing-released-files.md diff --git a/mkdocs.yml b/mkdocs.yml index 2a29dcc3c..cf0de3aeb 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -67,7 +67,7 @@ nav: - Applying statistical disclosure control: releasing/sdc.md - Requesting release of research outputs: releasing/requesting-file-release.md - Review process for release requests: releasing/output-checking.md - - Release of approved outputs: releasing/releasing-files.md + - Viewing released outputs: releasing/viewing-released-files.md - Releasing with Airlock: '!import https://github.com/opensafely-core/airlock?branch=main' - Reports: - Overview: reports/intro.md From 61f82ca17237fc7c58d81b42ddf05a7967eea2c9 Mon Sep 17 00:00:00 2001 From: Becky Smith Date: Tue, 13 Aug 2024 13:48:34 +0100 Subject: [PATCH 06/13] Update workflow page links --- docs/workflow.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/workflow.md b/docs/workflow.md index 92615666c..01b9ce147 100644 --- a/docs/workflow.md +++ b/docs/workflow.md @@ -19,8 +19,8 @@ This repo will contain all the code 
relating to your project, and a history of i - generating log files to debug the scripts when they run on the real data. 5. **Test the code** by running the analysis steps specified in the [_project pipeline_](actions-pipelines.md), which specifies the execution order for data extracts and analyses and the outputs to be released. 6. **Execute the analysis on the real data** via OpenSAFELY's [jobs site](jobs-site.md). This will generate outputs on the secure server. -7. **Check the output for [disclosivity](releasing/output-checking.md)** within the server, and redact if necessary. -8. **[Release](releasing/releasing-files.md) the outputs** via GitHub. +7. **Check the output for [disclosivity](releasing/sdc.md)** within the server, and redact if necessary. +8. **[Request release](releasing/requesting-file-release.md) of the outputs** 9. **Repeat and iterate steps 2 to 8 as necessary**. These steps should always proceed with frequent git commits and code reviews where appropriate. Steps 2-5 can all be progressed on your local machine without accessing the real data. From b3be6e095cb5fc8c06dc66e3f0193aa32a9def17 Mon Sep 17 00:00:00 2001 From: Becky Smith Date: Tue, 13 Aug 2024 14:22:05 +0100 Subject: [PATCH 07/13] Replace docs on viewing analysis outputs on job-server with Airlock --- docs/jobs-site.md | 29 +++-------------------------- 1 file changed, 3 insertions(+), 26 deletions(-) diff --git a/docs/jobs-site.md b/docs/jobs-site.md index f4dff7e30..5a0d808b6 100644 --- a/docs/jobs-site.md +++ b/docs/jobs-site.md @@ -123,34 +123,11 @@ Each job will either succeed or fail. In either case, the output and log files a ## Viewing analysis outputs on the server You can view `moderately_sensitive` outputs from any of your submitted _jobs_ -via the Jobs website **if you have access to and are logged into the backend +via Airlock **if you have access to and are logged into the backend the job was run on**. 
-However, whilst normally you log into from your machines -browser using GitHub, the secure server does not have access to -GitHub. So you need to use an alternate method to login, by generating a Single -Use Token, and then using it on the secure server to log in. - -To generate a Single Use Token, before logging into the secure server, visit -[https://jobs.opensafely.org/settings/](https://jobs.opensafely.org/settings/), -and click on "Generate a Single Use Token". This will be 3 english words, which -you can memorize or write down. This token can be used to log in as you, but is -only valid for a short time, and only works once. - -![Generate Single Use Token](./images/token.png) - -Once you are logged into the server via the VPN: - -* Navigate to [https://jobs.opensafely.org/](https://jobs.opensafely.org/) using Google Chrome (make sure to use `https://`) -* Log in using your email or GitHub username, and the Single Use Token from the above step. -* You should be now logged in. This login will expire after two weeks of not being used. - -Once logged in, to view your `moderately_sensitive` outputs: - -* Navigate to your _Workspace_ -* Under _Releases_, navigate to Level 4 Outputs -* Choose the correct backend -* Pick the file you would like to view from the list of files (you can search) +Refer to the [Airlock documentation](/using-opensafely/releasing-research-outputs/releasing-with-airlock) for information on how to access and view +outputs via Airlock. ## Viewing released outputs From 529ab0790504e325e0920460c247b038a77bd00d Mon Sep 17 00:00:00 2001 From: Becky Smith Date: Tue, 13 Aug 2024 14:38:29 +0100 Subject: [PATCH 08/13] Update links in data-access-policy.md Update links to releasing section, plus driveby fix to change absolute links to other docs pages to relative links. 
--- docs/data-access-policy.md | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/docs/data-access-policy.md b/docs/data-access-policy.md index 654c325ec..0deb53a9e 100644 --- a/docs/data-access-policy.md +++ b/docs/data-access-policy.md @@ -42,12 +42,12 @@ Full details can be found in our documentation, but security controls include: * Logging of all operations, in public wherever possible * Manual checking of outputs for disclosivity before they are released from the secure environment -Approved researchers (with appropriate permissions) can access the secure environment to view the aggregated results of their study ([level 4 data](https://docs.opensafely.org/security-levels/)); researchers ensure that appropriate statistical disclosure controls have been applied before requesting results for release. +Approved researchers (with appropriate permissions) can access the secure environment to view the aggregated results of their study ([level 4 data](security-levels.md)); researchers ensure that appropriate statistical disclosure controls have been applied before requesting results for release. Researchers can also view the “log files” (also within the secure environment) produced when their study code runs, to help debug any issues. -Researchers conducting projects do not have access to directly view any other pseudonymised patient data (i.e. [levels 1, 2 or 3](https://docs.opensafely.org/security-levels/)). +Researchers conducting projects do not have access to directly view any other pseudonymised patient data (i.e. [levels 1, 2 or 3](security-levels.md)). Research Projects in OpenSAFELY cannot access or process data from patients who have registered a type 1 opt-out[^1]. -This is ensured by the software (see [technical explanation](https://docs.opensafely.org/type-one-opt-outs/)). +This is ensured by the software (see [technical explanation](type-one-opt-outs.md)). 
### Who @@ -75,7 +75,7 @@ These cover four key areas: These projects are subject to the same approvals process as Research Projects, and are run via the standard OpenSAFELY Reproducible Analytical Pipeline. Short Data Report Projects in OpenSAFELY cannot access or process data from patients who have registered a type 1 opt-out[^1]. -This is ensured by the software (see [technical explanation](https://docs.opensafely.org/type-one-opt-outs/)). +This is ensured by the software (see [technical explanation](type-one-opt-outs.md)). ### Who @@ -96,9 +96,9 @@ They also review that projects are in-line with the stated purpose, and remind p ### What As part of the services offered by the OpenSAFELY Co-Pilot Service, co-pilots may help pilots by contributing code to their study on GitHub. -It is also possible that a co-pilot might need to view the output files of a pilot's project on the secure environment ([level 4 data](https://docs.opensafely.org/security-levels/)) for purposes such as troubleshooting issues with code and helping with the process of requesting release of results files, such as applying the appropriate statistical disclosure controls. +It is also possible that a co-pilot might need to view the output files of a pilot's project on the secure environment ([level 4 data](security-levels.md)) for purposes such as troubleshooting issues with code and helping with the process of requesting release of results files, such as applying the appropriate statistical disclosure controls. A co-pilot may also need to review the results files (both on the secure environment and those released to Job Server) when reviewing a pilot's paper/report for publication, to check that the correct processes have been followed and that all associated analyses and results are in line with the IG agreements. -There is no need for co-pilots to have access to any other patient data (i.e. [levels 1, 2 or 3](https://docs.opensafely.org/security-levels/)). 
+There is no need for co-pilots to have access to any other patient data (i.e. [levels 1, 2 or 3](security-levels.md)). ### Who @@ -113,11 +113,11 @@ While co-piloting activities are largely focused on the first four weeks of trai ### Why A key element of the OpenSAFELY service, and one of the [Five Safes](https://www.bennett.ox.ac.uk/blog/2023/03/the-five-safes-framework-and-applying-it-to-opensafely/) is “safe outputs”. -Statistical disclosure control is applied as necessary to research outputs, which then go through an [output checking process](https://docs.opensafely.org/releasing-files/) to minimise the risk of disclosure of identifiable information before outputs are released from the secure environment. +Statistical disclosure control is applied as necessary to research outputs, which then go through an [output checking process](releasing/output-checking.md) to minimise the risk of disclosure of identifiable information before outputs are released from the secure environment. ### What -Checking research outputs requires access to aggregated study files within the secure environment ([level 4 data](https://docs.opensafely.org/security-levels/)). +Checking research outputs requires access to aggregated study files within the secure environment ([level 4 data](security-levels.md)). Releasing approved files requires access to trigger the file release mechanism. ### Who @@ -170,7 +170,7 @@ This work may be done by individuals in different roles: ### Why -For software development and maintenance purposes, it is sometimes necessary to execute OpenSAFELY [project pipelines](https://docs.opensafely.org/actions-pipelines/#project-pipelines). +For software development and maintenance purposes, it is sometimes necessary to execute OpenSAFELY [project pipelines](actions-pipelines.md#project-pipelines).
The test project pipelines are necessary to: * ensure that the entire OpenSAFELY platform is operating correctly @@ -188,7 +188,7 @@ Because they are standard OpenSAFELY pipelines, they benefit from all the same s * logging of all operations * manual checking of outputs for disclosivity before they are released from the backend -Access to the OpenSAFELY platform to view [level 4 data](https://docs.opensafely.org/security-levels/) will also be required to verify the pipeline outputs are as expected. +Access to the OpenSAFELY platform to view [level 4 data](security-levels.md) will also be required to verify the pipeline outputs are as expected. Because the OpenSAFELY platform makes it possible to query data from patients who have a type 1 opt-out under approved conditions, it may be necessary for developers to run end-to-end tests that include these patients’ data. This is to ensure that all studies run correctly and that the mechanisms for excluding access to these patients’ data are operating correctly. @@ -207,13 +207,13 @@ During development of new features, test projects may be run via the pipeline to ### Why -Researchers conducting Research Projects or Short Data Report Projects may need technical assistance if there are errors or problems when attempting to run their [project pipelines](https://docs.opensafely.org/actions-pipelines/#project-pipelines). +Researchers conducting Research Projects or Short Data Report Projects may need technical assistance if there are errors or problems when attempting to run their [project pipelines](actions-pipelines.md#project-pipelines). The issues may have been introduced by system failures or bugs in the platform or by errors made by the researchers themselves. ### What As part of the exploration and/or resolution of a technical issue, OpenSAFELY Platform Developers may need to amend the study code and/or re-run the project pipeline for the Approved Project. 
-They will also need to view the outputs ([level 3 and 4 data](https://docs.opensafely.org/security-levels/)), to check for error resolution. +They will also need to view the outputs ([level 3 and 4 data](security-levels.md)), to check for error resolution. ### Who @@ -238,9 +238,9 @@ Access to backend systems is via secure, authenticated, encrypted channels. This * the OpenSAFELY software running in the backend * logs that the OpenSAFELY software produces -* pseudonymised patient data ([level 1](https://docs.opensafely.org/security-levels/#level-1-gps-are-data-controllers-of-the-data) and [level 2](https://docs.opensafely.org/security-levels/#level-2-nhs-england-are-data-controllers-of-the-data)) -* intermediate outputs (study dataset) ([level 3 data](https://docs.opensafely.org/security-levels/#level-3-nhs-england-are-data-controllers-of-the-data)) -* unchecked, unreleased aggregated study outputs ([level 4 data](https://docs.opensafely.org/security-levels/#level-4-nhs-england-are-data-controllers-of-the-data)). +* pseudonymised patient data ([level 1](security-levels.md#level-1-gps-are-data-controllers-of-the-data) and [level 2](security-levels.md#level-2-nhs-england-are-data-controllers-of-the-data)) +* intermediate outputs (study dataset) ([level 3 data](security-levels.md#level-3-nhs-england-are-data-controllers-of-the-data)) +* unchecked, unreleased aggregated study outputs ([level 4 data](security-levels.md#level-4-nhs-england-are-data-controllers-of-the-data)). This policy does not allow the extraction of pseudonymised patient data or study outputs from the system via this process; the data can only be studied within the system itself. Some operational data may be released under controlled conditions, as detailed in a separate Operational Data Policy. 
@@ -268,7 +268,7 @@ In order to provide datasets to researchers, we need to ensure that we understan Depending on the requirements of the work, Data Development may be done via the standard OpenSAFELY Reproducible Analytical Pipeline or may require different methods of access. -Some aspects of this work require direct access to the pseudonymised patient data ([level 1 and 2](https://docs.opensafely.org/security-levels/)). +Some aspects of this work require direct access to the pseudonymised patient data ([level 1 and 2](security-levels.md)). Access to backend systems is via secure, authenticated, encrypted channels. All access to the system and database operations are logged. Because the OpenSAFELY platform supports projects that may include data from patients who have an active type 1 opt-out, it may be necessary to include these patients’ data in this development work. From f8dba6e659b75b219c69cf644cb1271ce50c37bd Mon Sep 17 00:00:00 2001 From: Becky Smith Date: Tue, 13 Aug 2024 14:44:01 +0100 Subject: [PATCH 09/13] Redirects for the relocated releasing docs --- docs/_redirects | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/_redirects b/docs/_redirects index ff4770fe0..18259ccec 100644 --- a/docs/_redirects +++ b/docs/_redirects @@ -87,3 +87,9 @@ /study-def-variables/ /legacy/study-def-variables/ 301 /study-def /legacy/study-def/ 301 /study-def/ /legacy/study-def/ 301 +/releasing-files-intro/ /releasing/ 301 +/sdc/ /releasing/sdc/ 301 +/requesting-file-release/ /releasing/requesting-file-release/ 301 +/output-checking/ /releasing/output-checking/ 301 +/releasing-files/ /releasing/viewing-released-files/ 301 +/using-opensafely/releasing-research-outputs-from-the-level-4-server/releasing-with-airlock/ /using-opensafely/releasing-research-outputs/releasing-with-airlock/ 301 From 4651492d08566321acf976b14b6bc85aaa6d8b65 Mon Sep 17 00:00:00 2001 From: Becky Smith Date: Tue, 20 Aug 2024 11:28:09 +0100 Subject: [PATCH 10/13] Add airlock docs to the paths 
for snippets --- mkdocs.yml | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/mkdocs.yml b/mkdocs.yml index cf0de3aeb..f15dc5b78 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -192,8 +192,12 @@ markdown_extensions: base_path: # base paths will be checked in order for matching snippets - . # dir containing this config file - imported_docs/ehrql # docs folder for the imported repo + # docs folder for the imported airlock repo + - imported_docs/using-opensafely/releasing-research-outputs/releasing-with-airlock auto_append: - includes/glossary.md + - imported_docs/using-opensafely/releasing-research-outputs/releasing-with-airlock/airlock-includes/glossary.md + - pymdownx.superfences: custom_fences: - name: mermaid From 2b75d07517aa01a54501f8a6976600423c1b0057 Mon Sep 17 00:00:00 2001 From: Becky Smith Date: Thu, 22 Aug 2024 11:57:44 +0100 Subject: [PATCH 11/13] Move old output-checking process to legacy section --- .../requesting-release-offline-process.md | 7 ++++-- docs/releasing/output-checking.md | 22 +++++++------------ docs/releasing/requesting-file-release.md | 2 +- 3 files changed, 14 insertions(+), 17 deletions(-) rename docs/{releasing => legacy}/requesting-release-offline-process.md (82%) diff --git a/docs/releasing/requesting-release-offline-process.md b/docs/legacy/requesting-release-offline-process.md similarity index 82% rename from docs/releasing/requesting-release-offline-process.md rename to docs/legacy/requesting-release-offline-process.md index b814840a7..4945a225c 100644 --- a/docs/releasing/requesting-release-offline-process.md +++ b/docs/legacy/requesting-release-offline-process.md @@ -25,7 +25,7 @@ Each section in the review request form should normally describe a single file, ### Checklist -Please run through [the checklist](requesting-file-release.md#checklist) before making a review request. In addition, check: +Please run through [the checklist](../releasing/requesting-file-release.md#checklist) before making a review request. 
In addition, check:
 1. Are all of the outputs in a [separate release folder](#create-a-folder-for-outputs)?
 1. Are all of the outputs clearly described?
@@ -37,4 +37,7 @@ Please run through [the checklist](requesting-file-release.md#checklist) before
 Once you have completed this form, please send it to ****. The requested outputs will undergo independent review by two OpenSAFELY output checkers who will check that the outputs are within the scope of your original project proposal and that they do not present any disclosure risks. **Please allow up to 5 working days for feedback on your request**.
----8<-- 'includes/glossary.md'
+
+### Responding to requests
+
+Once reviewed, the completed review request will be emailed back to you. We aim to provide a response to review requests within **5 working days**. If all outputs are approved, they will then be released. If one or more outputs are approved subject to change, you will need to address the disclosure issues and submit a new review form detailing the changes you have made.
diff --git a/docs/releasing/output-checking.md b/docs/releasing/output-checking.md
index 171c107e0..4b369c1b2 100644
--- a/docs/releasing/output-checking.md
+++ b/docs/releasing/output-checking.md
@@ -4,20 +4,14 @@ Before any files are released from the secure server, they are checked independe
 * **Request changes** — output is an acceptable type for release, but has outstanding disclosure issues that must be addressed before release
 * **Reject** — output is not an acceptable type for release. An example is the release of practice level data which does not meet the [permitted study results policy](https://www.opensafely.org/policies-for-researchers/#permitted-study-results-policy)
-
-=== "Responding to requests - Airlock"
-
-    Requests submitted via Airlock will also be reviewed by output checkers on Airlock. If
-    the output checkers require changes or have questions about the requested files, they
-    will return the release request to you.
You will receive an email notification when this happens.
-
-    For further information on how to submit and respond to returned requests, please see the
-    documentation on [releasing with Airlock](/using-opensafely/releasing-research-outputs/
-    releasing-with-airlock).
-
-=== "Responding to requests - manual process (deprecated)"
-
-    Once reviewed, the completed review request will be emailed back to you. We aim to provide a response to review requests within **5 working days**. If all outputs are approved, they will then be released. If one or more outputs are approved subject to change, you will need to address the disclosure issues and submit a new review form detailing the changes you have made.
+### Responding to requests
+Requests submitted via Airlock will also be reviewed by output checkers on Airlock. If
+the output checkers require changes or have questions about the requested files, they
+will return the release request to you. You will receive an email notification when this happens.
+
+For further information on how to submit and respond to returned requests, please see the
+documentation on [releasing with Airlock](/using-opensafely/releasing-research-outputs/
+releasing-with-airlock).
 ### Most common problems with output review requests
diff --git a/docs/releasing/requesting-file-release.md b/docs/releasing/requesting-file-release.md
index 1e633134b..3394e094d 100644
--- a/docs/releasing/requesting-file-release.md
+++ b/docs/releasing/requesting-file-release.md
@@ -6,7 +6,7 @@ information you will need to provide in order to request release. For instructio
 to create and submit your release request, please refer to the documentation on [releasing files with Airlock](/using-opensafely/releasing-research-outputs/releasing-with-airlock).
-Note: the [previous manual process for requesting release of files](requesting-release-offline-process.md) is now deprecated. All release requests should be submitted via Airlock wherever possible.
+Note: the [previous manual process for requesting release of files](../legacy/requesting-release-offline-process.md) is now deprecated. All release requests should be submitted via Airlock wherever possible.
 !!! warning
     You **MUST NOT** share any results that have not been released through the official output checking process. This includes:

From 3ac4254c75edc705e98598579fa709dbc1ae685a Mon Sep 17 00:00:00 2001
From: Becky Smith
Date: Thu, 22 Aug 2024 11:58:34 +0100
Subject: [PATCH 12/13] Update jobs-server section with more references to Airlock

---
 docs/jobs-site.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/jobs-site.md b/docs/jobs-site.md
index 5a0d808b6..b048ed480 100644
--- a/docs/jobs-site.md
+++ b/docs/jobs-site.md
@@ -1,4 +1,4 @@
-The [jobs site](https://jobs.opensafely.org/) is where you can run your code on the server against real data, view your analysis outputs on the server and view outputs that have been reviewed, approved and released from the server by our team of output checkers.
+The [jobs site](https://jobs.opensafely.org/) is where you can run your code on the server against real data and view outputs that have been reviewed, approved and released from the server by our team of output checkers.
 ## Jobs site structure
@@ -36,7 +36,7 @@ graph TD
 Once outputs have been produced by running _jobs_ from within a _Workspace_, there are several stages they must go through before being made publicly available:
-1. **Outputs on the [Level 4 server](level-4-server.md)**. These are aggregated patient-data (non patient-level data) outputs marked as `moderately_sensitive` in the `project.yaml` file and are only viewable when logged into the Level 4 server. These outputs have to be [reviewed by our output checking team](releasing/output-checking.md) before they can leave the server.
+1. **Outputs on the [Level 4 server](level-4-server.md)**.
These are aggregated patient-data (non patient-level data) outputs marked as `moderately_sensitive` in the `project.yaml` file and are only viewable (via [Airlock](/using-opensafely/releasing-research-outputs/releasing-with-airlock)) when logged into the Level 4 server. These outputs have to be [reviewed by our output checking team](releasing/output-checking.md) before they can leave the server.
 2. **Released outputs**. These are analysis outputs that have been reviewed for any [disclosivity issues](releasing/sdc.md#primary-vs-secondary-disclosure) and released from the Level 4 server by the output checking team to the relevant _Workspace_ on the Jobs site. These are only viewable if you have the correct permissions for the _Project_ the _Workspace_ belongs to.
 3. **Draft public outputs**. Released outputs can only be shared with close collaborators of your projects ([refer to the examples of who this could include](https://www.opensafely.org/policies-for-researchers/#all-datasets-sharing)). To be shared more widely, they have to first be approved by NHS England. Once approved, and if you have the correct jobs site permissions, you can create draft public outputs for approval.
 4. **Published outputs**. Once approved, draft public outputs are made publicly available to view by anyone through the _Workspace_ they belong to.
@@ -118,7 +118,7 @@ What happens:
 6. The temporary directory is deleted.
-Each job will either succeed or fail. In either case, the output and log files are only visible in the secure environment to avoid disclosure of potentially sensitive information.
+Each job will either succeed or fail. In either case, the output and log files are only visible [via Airlock](/using-opensafely/releasing-research-outputs/releasing-with-airlock) in the secure environment to avoid disclosure of potentially sensitive information.
 ## Viewing analysis outputs on the server

From 8785e252cf603a54a1554f5b8f5b61a974510662 Mon Sep 17 00:00:00 2001
From: Becky Smith
Date: Thu, 22 Aug 2024 11:58:48 +0100
Subject: [PATCH 13/13] Fix some typos and broken links

---
 docs/data-access-policy.md | 2 +-
 docs/releasing/requesting-file-release.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/data-access-policy.md b/docs/data-access-policy.md
index 0deb53a9e..c52352fd2 100644
--- a/docs/data-access-policy.md
+++ b/docs/data-access-policy.md
@@ -113,7 +113,7 @@ While co-piloting activities are largely focused on the first four weeks of trai
 ### Why
 A key element of the OpenSAFELY service, and one of the [Five Safes](https://www.bennett.ox.ac.uk/blog/2023/03/the-five-safes-framework-and-applying-it-to-opensafely/) is “safe outputs”.
-Statistical disclosure control is applied as necessary to research outputs, which then go through an [output checking process](../releasing/releasing-research-outputs.md) to minimise the risk of disclosure of identifiable information before outputs are released from the secure environment.
+Statistical disclosure control is applied as necessary to research outputs, which then go through an [output checking process](../releasing/index.md) to minimise the risk of disclosure of identifiable information before outputs are released from the secure environment.
 ### What
diff --git a/docs/releasing/requesting-file-release.md b/docs/releasing/requesting-file-release.md
index 3394e094d..4685d5efa 100644
--- a/docs/releasing/requesting-file-release.md
+++ b/docs/releasing/requesting-file-release.md
@@ -93,7 +93,7 @@ The maximum file size that can be released is 16MB. Please check your outputs be
 ### Checklist
-Please run through this checklist before submitted a review request.
+Please run through this checklist before submitting a review request.
 1.
Do your results adhere to the [OpenSAFELY permitted study results policy](https://www.opensafely.org/policies-for-researchers/#permitted-study-results-policy)
 1. Are all of the outputs of the [allowed file types](#allowed-file-types)?