Integrating OpenSAFELY with a TRE instance #1359

migldasilva · 2023-10-23T14:13:56Z

migldasilva
Oct 23, 2023

Hi, I'm working on implementing a TRE instance on Azure. We are following Microsoft's AzureTRE implementation. We have our own data for researchers, and they are interested in releasing research results in a safe way. I am not a researcher myself and couldn't be sure about the correct anonymization process.

As far as I could see, OpenSAFELY is using GitHub Actions for running studies. One of the actions available is cohort-extractor, and it seems to fetch data from the NHS's anonymized database.

So, my questions are:

1. Is it possible to use OpenSAFELY only for data anonymization? I say "only" because the researchers are going to use other tools for their research, and those tools wouldn't be integrated into GitHub pipelines.
2. More broadly, is `cohort-extractor` the right tool for data anonymization?

Best regards,

bloodearnest · 2023-10-23T14:32:03Z

bloodearnest
Oct 23, 2023
Maintainer

Hi Miguel.

First, a correction:

> As far as I could see, OpenSAFELY is using GitHub Actions for running studies.

Not quite: in GitHub we only run CI tests of research code against randomly generated patient data. Researchers submit the research code (tested locally and in GitHub) to run against the real data via https://jobs.opensafely.org.

Secondly, regarding anonymization: the various tools that OpenSAFELY uses, such as cohort-extractor, do no anonymization at all; they assume the data is already pseudonymized. The data we access in our secure backends is provided pre-pseudonymized by our health care partners. No part of our platform ever has access to the un-anonymized data, by design.

Our position is that pseudonymization alone is not enough to protect patient privacy. See chapter 3 of the Goldacre Report for more information.

The researchers writing analysis code need to take steps to ensure the outputs they wish to release are safe to release. In OpenSAFELY, they do this in their code, as it very much depends on what data they are using and what analyses they are performing.

To validate that this is done correctly, we provide an output-checking service: for every release request, two trained researchers review the files, checking for disclosivity as part of the Five Safes framework. Outputs are released only if this review passes.

Does that answer your question?

alexwalkerepi · 2023-10-23T14:33:00Z

alexwalkerepi
Oct 23, 2023
Maintainer

Hi, cohortextractor and its replacement, ehrQL, are not intended or used for anonymisation. They are used to extract research-ready datasets from the raw electronic health record data.

Following this extraction, researchers write analytic code (which is all integrated into the analytic pipeline) to produce summary results. These summary results are generated with appropriate disclosure controls applied by the researcher, and these results are then checked manually by independent, trained Output Checkers.
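
To make "disclosure controls applied by the researcher" a bit more concrete, here is a minimal Python sketch of one common pattern: redacting small counts and rounding the rest before a summary table is put forward for release. The threshold of 7, the rounding base of 5, and the function name `apply_disclosure_control` are assumptions chosen for illustration, not a statement of any platform's policy; the right controls depend on the data and the analysis, and the outputs would still go through manual output checking.

```python
from typing import Dict, Optional

# Illustrative values only (assumptions, not an official policy).
SUPPRESSION_THRESHOLD = 7  # counts at or below this are redacted
ROUNDING_BASE = 5          # remaining counts are rounded to a multiple of this


def apply_disclosure_control(
    count: int,
    threshold: int = SUPPRESSION_THRESHOLD,
    base: int = ROUNDING_BASE,
) -> Optional[int]:
    """Return None (redacted) for small counts, else the count rounded
    to the nearest multiple of `base` (halves round up)."""
    if count <= threshold:
        return None
    # Integer arithmetic avoids the round-half-to-even behaviour of round().
    return ((count + base // 2) // base) * base


# Hypothetical summary table: number of patients per age group.
raw_counts: Dict[str, int] = {"0-17": 3, "18-39": 124, "40-64": 58, "65+": 8}

safe_counts = {
    group: apply_disclosure_control(n) for group, n in raw_counts.items()
}

for group, n in safe_counts.items():
    print(group, "[REDACTED]" if n is None else n)
```

Only `safe_counts` would be written to the file submitted for release; `raw_counts` stays inside the secure environment.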

migldasilva · 2023-10-23T14:51:53Z

migldasilva
Oct 23, 2023
Author

@bloodearnest @alexwalkerepi many thanks for the clarification. I'm closing this discussion as my questions were more than answered.
