From 429294d1895a803cf0093b7f75b5377e0eeb3c10 Mon Sep 17 00:00:00 2001
From: Louis Fisher
Date: Thu, 11 Jan 2024 12:57:17 +0000
Subject: [PATCH 1/3] Add section on most common problems with outputs

---
 docs/releasing-files.md | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/docs/releasing-files.md b/docs/releasing-files.md
index bbe39eb81..e274ddb0b 100644
--- a/docs/releasing-files.md
+++ b/docs/releasing-files.md
@@ -221,7 +221,7 @@ When you are ready to request a release of your aggregated results please [compl
 !!! note
     Each data release entails substantial review work. To retain rapid turnaround times, external data releases should typically only be of results for final submission to a journal or public notebook; or a small number of necessary releases for discussion with external collaborators.
 
-For each output wishing to be released you will need to provide a clear contextual description including: 
+For each output wishing to be released you will need to provide a clear contextual description including:
 
 1. The file path for each output
 2. Variable descriptions
@@ -287,7 +287,6 @@ Please run through this checklist before making a review request.
 4. Have you [redacted any low counts](#redacting-counts-less-than-or-equal-to-7)?
 5. Have you [rounded any counts](#rounding-counts) (including [counts underlying rates](#rounding-rates))?
 6. Have you supplied underlying counts for all of your results?
-7. Are all of the outputs clearly described?
     * Is the filename sensible and is the filepath provided in the request form correct?
     * Have you provided all of the context needed to review each output in isolation in the request form?
     * Have you described the disclosure controls you have applied to each output?
@@ -318,6 +317,24 @@ Before any files are released from the secure server, they are checked independe
 * **Reject** — output is not an acceptable type for release. An example is the release of practice level data which does not meet the [permitted study results policy](https://www.opensafely.org/policies-for-researchers/#permitted-study-results-policy)
 
 Once reviewed, the completed review request will be emailed back to you. If all outputs are approved, they will then be released. If one or more outputs are approved subject to change, you will need to address the disclosure issues and submit a new review form detailing the changes you have made.
+
+### Most common problems with output review requests
+
+Below are the most common problems encountered by output checkers when reviewing output review requests. **Avoiding these issues makes it more likely your files can be released first time round**, saving reviewer time and allowing quicker file release for you and other researchers.
+
+1. **There are unrounded counts in the outputs**. All counts should be [rounded](#rounding-counts). This includes rounding counts before they are used to calculate further statistics, such as percentages or odds ratios. Commonly, raw counts are rounded but downstream statistics are still calculated from the raw counts rather than the rounded counts. Unrounded counts account for **~30%** of rejections.
+2. **Insufficient context is provided for the outputs**. **~25%** of rejected outputs are due to insufficient context. Make sure you have provided all of the context needed to review each output in isolation in the request form. Common errors include:
+    * Stating the incorrect file path. You should check all file paths point to the relevant files within your `release` folder before making a request.
+    * Listing files in the review form that are missing from the `review` folder.
+    * Using unclear column/variable names or poorly describing the presented data. See [here](#context-requirements) for more details on the context requirements.
+    * Not clearly indicating the relationship between different outputs.
+    * Where an output has previously been requested, not indicating how the output differs from the previously reviewed version.
+3. **There are unredacted counts in the outputs**. Prior to rounding counts, [any counts <=7 should be redacted](#redacting-counts-less-than-or-equal-to-7). The redaction approach should be clearly described when making a review request. It is not uncommon for the stated redaction approach to be improperly implemented in the outputs. Inappropriate redaction of low counts accounts for **~20%** or rejected outputs.
+4. **Underlying data is not provided**. To ensure the low number threshold is met, reviewers need to see the underlying data for each output. This includes the data used to generate figures and to calculate summary statistics such as mean or median. **~10%** or rejected outputs are due to underlying data not being provided.
+5. **Unsupported file types being requested**. Files requested for release should be one of the [allowed file types](#allowed-file-types). If you are requesting the release of HTML files, please make sure you have followed the [guidance for HTML files](#allowed-file-types). **~10%** of rejected outputs are due to unsupported file types being requested.
+
+To help avoid these issues, please make sure you have read the [checklist](#checklist) before submitting your review request.
+
 ## 4. Release of reviewed files
 
 All approved OpenSAFELY outputs are released to the workspace they belong to on the [Jobs site](jobs-site.md).

From f74fd2bd463f1b38a53acea30dd0b7d138eb9438 Mon Sep 17 00:00:00 2001
From: Louis Fisher
Date: Thu, 11 Jan 2024 15:57:44 +0000
Subject: [PATCH 2/3] reinstate point 7 on checklist

---
 docs/releasing-files.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/releasing-files.md b/docs/releasing-files.md
index e274ddb0b..fcb5be705 100644
--- a/docs/releasing-files.md
+++ b/docs/releasing-files.md
@@ -287,6 +287,7 @@ Please run through this checklist before making a review request.
 4. Have you [redacted any low counts](#redacting-counts-less-than-or-equal-to-7)?
 5. Have you [rounded any counts](#rounding-counts) (including [counts underlying rates](#rounding-rates))?
 6. Have you supplied underlying counts for all of your results?
+7. Are all of the outputs clearly described?
     * Is the filename sensible and is the filepath provided in the request form correct?
     * Have you provided all of the context needed to review each output in isolation in the request form?
     * Have you described the disclosure controls you have applied to each output?

From a84dba5ff392659065fe814cefd7280b10dd7c64 Mon Sep 17 00:00:00 2001
From: Louis Fisher
Date: Thu, 11 Jan 2024 15:58:27 +0000
Subject: [PATCH 3/3] fix typos

---
 docs/releasing-files.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/releasing-files.md b/docs/releasing-files.md
index fcb5be705..a4d1f0b1d 100644
--- a/docs/releasing-files.md
+++ b/docs/releasing-files.md
@@ -330,8 +330,8 @@ Below are the most common problems encountered by output checkers when reviewing
     * Using unclear column/variable names or poorly describing the presented data. See [here](#context-requirements) for more details on the context requirements.
     * Not clearly indicating the relationship between different outputs.
     * Where an output has previously been requested, not indicating how the output differs from the previously reviewed version.
-3. **There are unredacted counts in the outputs**. Prior to rounding counts, [any counts <=7 should be redacted](#redacting-counts-less-than-or-equal-to-7). The redaction approach should be clearly described when making a review request. It is not uncommon for the stated redaction approach to be improperly implemented in the outputs. Inappropriate redaction of low counts accounts for **~20%** or rejected outputs.
-4. **Underlying data is not provided**. To ensure the low number threshold is met, reviewers need to see the underlying data for each output. This includes the data used to generate figures and to calculate summary statistics such as mean or median. **~10%** or rejected outputs are due to underlying data not being provided.
+3. **There are unredacted counts in the outputs**. Prior to rounding counts, [any counts <=7 should be redacted](#redacting-counts-less-than-or-equal-to-7). The redaction approach should be clearly described when making a review request. It is not uncommon for the stated redaction approach to be improperly implemented in the outputs. Inappropriate redaction of low counts accounts for **~20%** of rejected outputs.
+4. **Underlying data is not provided**. To ensure the low number threshold is met, reviewers need to see the underlying data for each output. This includes the data used to generate figures and to calculate summary statistics such as mean or median. **~10%** of rejected outputs are due to underlying data not being provided.
 5. **Unsupported file types being requested**. Files requested for release should be one of the [allowed file types](#allowed-file-types). If you are requesting the release of HTML files, please make sure you have followed the [guidance for HTML files](#allowed-file-types). **~10%** of rejected outputs are due to unsupported file types being requested.
 
 To help avoid these issues, please make sure you have read the [checklist](#checklist) before submitting your review request.
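
Problems 1 and 3 in the new section both come down to order of operations: redact counts of 7 or fewer, round the counts that remain, and only then derive percentages or other statistics, using the rounded values rather than the raw ones. The Python sketch below illustrates that order. It is not an OpenSAFELY function: `redact_and_round` and `percentage_from_rounded` are hypothetical helpers, and the rounding rule used here (nearest multiple of 5) is an assumption for illustration only, so check the rounding counts guidance for the rule that actually applies.

```python
from typing import Optional

# Hypothetical helpers for illustration; not part of any OpenSAFELY library.
REDACTION_THRESHOLD = 7  # from the guidance: counts <= 7 are redacted
ROUNDING_BASE = 5  # ASSUMPTION: round to the nearest multiple of 5


def redact_and_round(count: int) -> Optional[int]:
    """Redact low counts first, then round whatever remains."""
    if count <= REDACTION_THRESHOLD:
        return None  # redacted: this value must not appear in the output
    # Integer round-half-up to the nearest multiple of ROUNDING_BASE.
    return (count + ROUNDING_BASE // 2) // ROUNDING_BASE * ROUNDING_BASE


def percentage_from_rounded(numerator: int, denominator: int) -> Optional[float]:
    """Derive a percentage from the redacted/rounded counts, never the raw ones."""
    num = redact_and_round(numerator)
    den = redact_and_round(denominator)
    if num is None or den is None:
        return None  # a redacted input means the derived statistic is withheld too
    # den is at least 10 here, because anything <= 7 was redacted above.
    return 100 * num / den


if __name__ == "__main__":
    raw_numerator, raw_denominator = 23, 148
    print(redact_and_round(raw_numerator), redact_and_round(raw_denominator))  # 25 150
    print(round(percentage_from_rounded(raw_numerator, raw_denominator), 1))  # 16.7
```

With raw counts of 23 and 148, the percentage reported is computed from the rounded values (25 and 150 under the assumed rule), giving about 16.7% rather than the unrounded 15.5%. Computing it from the raw counts is exactly the mistake described in problem 1, and the underlying rounded counts should be supplied alongside the percentage (problem 4).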