Integrating OpenSAFELY with a TRE instance #1359
Replies: 3 comments
-
Hi Miguel. First, a correction:
Not quite - we only run CI test runs of research code against randomly generated patient data in GitHub. Researchers submit the research code (tested locally and in GitHub) to run against the real data via https://jobs.opensafely.org.
Secondly, regarding anonymization: the various tools that OpenSAFELY uses, like cohort-extractor, do no anonymization at all - they assume the data is already pseudonymized. The data we access in our secure backends is provided pre-pseudonymized by our health care partners. No part of our platform ever has access to the un-anonymized data, by design.
Our position is that pseudonymization alone is not enough to protect patient privacy. See chapter 3 of the Goldacre Report for more information.
The researchers writing analysis code need to take steps to ensure that the outputs they wish to release are safe to release. In OpenSAFELY, they do this in their code, as it very much depends on what data they are using and what analyses they are performing. To validate that this is done correctly, we provide an output-checking service: for every release request, two trained researchers review the files, checking for disclosivity as part of the Five Safes framework. The outputs are only released if this review passes.
Does that answer your question?
-
Hi, cohortextractor and its replacement, ehrQL, are not intended or used for anonymisation. They are used to extract research-ready datasets from the raw electronic health record data. Following this extraction, researchers write analytic code (all of which is integrated into the analytic pipeline) to produce summary results. The researcher applies appropriate disclosure controls when generating these summary results, and the results are then checked manually by independent, trained Output Checkers.
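To make "disclosure controls applied by the researcher" a bit more concrete, here is a minimal sketch of one common pattern: suppressing small counts and rounding the rest before anything is put forward for release. This is an illustration only, not OpenSAFELY code. The function name `redact_and_round`, the threshold of 7, and the rounding base of 5 are assumptions picked for the example; the right values depend on the data and on the output-checking policy that applies to you.

```python
from typing import Dict, Optional


def redact_and_round(count: int, threshold: int = 7, base: int = 5) -> Optional[int]:
    """Suppress small counts, then round the rest to the nearest multiple of `base`.

    `threshold` and `base` are illustrative defaults (assumptions), not fixed rules.
    Returns None when the count is too small to release.
    """
    if count <= threshold:
        return None  # too few patients in this cell: do not release a number
    # Integer arithmetic avoids floating-point and banker's-rounding surprises.
    return base * ((count + base // 2) // base)


# Hypothetical summary table: number of patients per group.
raw_counts: Dict[str, int] = {"group_a": 3, "group_b": 12, "group_c": 48}

safe_counts = {group: redact_and_round(n) for group, n in raw_counts.items()}
print(safe_counts)
```

Note that this only handles primary suppression of individual cells. It does not deal with cases where a suppressed value could be recovered from row or column totals, and it is not a substitute for the manual review by Output Checkers described above.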
-
@bloodearnest @alexwalkerepi many thanks for the clarification. I'm closing this discussion as my questions were more than answered.
-
Hi, I'm working on implementing a TRE instance on Azure. We are following Microsoft's AzureTRE implementation. We have our own data for researchers, and they are interested in releasing research results in a safe way. I am not a researcher myself and am not sure what the correct anonymization process is.
As far as I can see, OpenSAFELY uses GitHub Actions to run studies. One of the available actions is cohort-extractor, and it seems to fetch data from the NHS's anonymized data DB.
So, my questions are:
Best regards,