Commit

Merge pull request #1389 from opensafely/replace-cohort-extractor-in-docs

Replace cohort extractor in docs
inglesp authored Nov 28, 2023
2 parents 3fc2ffe + 5f38ab7 commit 40ba0ac
Showing 21 changed files with 48 additions and 155 deletions.
90 changes: 0 additions & 90 deletions docs/actions-cohortextractor.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/actions-scripts.md
@@ -34,7 +34,7 @@ Scripted actions can read and write output files that are saved in the workspace

Outputs labelled `highly_sensitive` will not be visible to researchers. This is a [deliberate design feature of OpenSAFELY](https://www.opensafely.org/about/), intended to reduce the risk of disclosure of sensitive information. Outputs should **always** be classed as `highly_sensitive` if they are:

-- Pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [study definition](study-def.md) or [dataset_definition](https://docs.opensafely.org/ehrql/).
+- Pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [dataset_definition](https://docs.opensafely.org/ehrql/).
 - Pseudonymised patient-level intermediate outputs for a study derived from queries run against Level 3 data which output pseudonymised patient-level data i.e., a processed study dataset with certain filters/formatting applied.

These types of outputs are considered potentially highly-disclosive, should not be pushed to Level 4, and are never intended for publishing outside the secure environment.
3 changes: 0 additions & 3 deletions docs/contributing.md
@@ -11,9 +11,6 @@ We welcome proposals for longer topic-based guides. Suggest them as an [issue in
## Reusable actions
These are [units of software that solve a problem for several studies](./actions-reusable.md) without the need to copy-and-paste between them. They can be shared between researchers, even between groups that use different programming languages, and are one of the best ways you can make contributions that benefit the community. If you've written a reusable action you'd like to contribute to the actions library, please get in touch at [[email protected]](mailto:[email protected]).

-## Study definition patterns
-[Study definitions](./study-def.md) are intended to be shared under open source licences, and they can be found in the [OpenSAFELY GitHub organisation](https://github.com/opensafely). We welcome all ideas that may be able to help other members of the community, which we can incorporate in our documentation ([for example on this page about study definition tips](study-def-tricks.md)); we're also happy to help publicise blog posts, or host them as guest blog posts on our website. Just submit a ticket to the [documentation repository](https://github.com/opensafely/documentation/issues) with your suggestion.
-
## Peer support

We encourage researchers to post questions in the [Q&A Forum](https://github.com/opensafely/documentation/discussions). We would love more people to chip in and attempt to answer questions!
7 changes: 4 additions & 3 deletions docs/data-sources/intro.md
@@ -1,15 +1,16 @@
-This section provides contextual information on the core primary care EHR systems inside which OpenSAFELY is built (currently TPP and EMIS), as well as all external datasets imported to the secure EHR environment. To query the data, see the [study definition section](../study-def.md).
+This section provides contextual information on the core primary care EHR systems inside which OpenSAFELY is built (currently TPP and EMIS), as well as all external datasets imported to the secure EHR environment.
+To learn about querying the data, refer to the [documentation on ehrQL](/ehrql/).

View information on [available datasets](index.md).


## What is primary care data?
-The core patient-level data used within OpenSAFELY is based on electronic GP records that are collected and securely stored to facilitate patient management by healthcare providers. They capture symptoms, test results, diagnoses, prescriptions, onward referrals, demographic and social characteristics, and so on. Essentially, everything about a patient that is electronically recorded or accessed by GPs.
+The core patient-level data used within OpenSAFELY is based on electronic GP records that are collected and securely stored to facilitate patient management by healthcare providers. They capture symptoms, test results, diagnoses, prescriptions, onward referrals, demographic and social characteristics, and so on. Essentially, everything about a patient that is electronically recorded or accessed by GPs.

GP records, or primary care records, can also be used for conducting health research, which is what OpenSAFELY was built for. We've made a video to help explain primary care data in more detail; essential viewing if you're new to this domain.

<div class="video-wrapper">
<iframe width="1280" height="720" src="https://www.youtube.com/embed/NEwSQ5-dWSg" frameborder="0" allowfullscreen></iframe>
</div>

----8<-- 'includes/glossary.md'
+---8<-- 'includes/glossary.md'
2 changes: 1 addition & 1 deletion docs/how-to-get-help.md
@@ -74,7 +74,7 @@ OpenSAFELY uses GitHub to manage and share code and other platform resources. If

Issues can be submitted for lots of different things &mdash; new variables or other features, bug reports, additional R or Stata packages, documentation updates, and so on. All our core software packages live in the [`opensafely-core`](https://github.com/opensafely-core/) GitHub organisation.

-The most common requests are about library support, and new study definition variables. We have a whole page describing [how to request new libraries](requesting-libraries.md), and another about [how to request new study variables](requesting-variables.md). If you want to report bugs or request features in the `opensafely` command-line tool, you can do so in [its own dedicated issue tracker](https://github.com/opensafely-core/opensafely-cli/issues).
+The most common requests are about library support; this page describes [how to request new libraries](requesting-libraries.md). If you want to report bugs or request features in the `opensafely` command-line tool, you can do so in [its own dedicated issue tracker](https://github.com/opensafely-core/opensafely-cli/issues).

Other than this, you will need to choose the most appropriate repo to submit an issue. If you're not sure where to submit your issue, just ask a question in our [Q&A forum](https://github.com/opensafely/documentation/discussions) and we can point you to the right place.

2 changes: 1 addition & 1 deletion docs/images/c4-container.svg
2 changes: 1 addition & 1 deletion docs/install-python.md
@@ -2,7 +2,7 @@
**Please read even if you already have Python installed**

For security, consistency, and readability, OpenSAFELY provides an API built in [**Python**](https://www.python.org/) for using the platform.
-This API includes script-based functions for specifying the patients and variables that make up a study dataset (using a [study definition](study-def.md)),
+This API includes script-based functions for specifying the patients and variables that make up a study dataset (using [ehrQL](/ehrql/)),
and command line functions for importing codelists, generating dummy data, and testing that the study definition can be run successfully on the server.
**Python version 3.7 or higher** must be installed on your machine to perform these tasks.

11 changes: 5 additions & 6 deletions docs/opensafely-cli.md
@@ -118,7 +118,7 @@ See the [Codelist](codelist-intro.md) section for more information on codelists.

To run your code on your machine, the `opensafely` tool uses the same Docker
images that run in the secure server environments. There is the
-`cohortextractor` image, for processing study definitions, and then the `r`,
+`ehrql` image, for processing dataset definitions, and then the `r`,
`stata-mp`, and `python` images, for running your analysis code. These last
three provide a pre-built environment for their specific language, with
a fixed set of pre-installed libraries.
@@ -212,15 +212,14 @@ Or alternatively go to File -> shutdown in the JupyterLab tab.

### `unzip` - unzipping CSV files

-For performance and storage reasons on the backend, you must use the
-compressed `csv.gz` output format for cohortextractor output files. However,
-you may need to inspect the raw CSV data. You can easily unzip a CSV file with
+If an action produces a compressed CSV file,
+you can view the raw CSV data by unzipping it with

```bash
-opensafely unzip outputs/input.csv.gz
+opensafely unzip outputs/dataset.csv.gz
```

-This will create a decompressed `output/input.csv` file you can view as normal.
+This will create a decompressed `output/dataset.csv` file you can view as normal.


### Managing Resources
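For readers who want to see what decompressing a `.csv.gz` output involves, here is a minimal Python sketch using only the standard library's `gzip` and `shutil` modules. It is an illustration under assumptions, not the implementation of `opensafely unzip`: the helper name `unzip_csv` is hypothetical, and it assumes the decompressed file should sit next to the original with only the `.gz` suffix dropped.

```python
import gzip
import shutil
from pathlib import Path


def unzip_csv(gz_path: str) -> Path:
    """Decompress a gzipped CSV (e.g. 'output/dataset.csv.gz') next to itself.

    Hypothetical helper for illustration; not part of the opensafely CLI.
    """
    src = Path(gz_path)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src}")
    # with_suffix("") drops only the final suffix: dataset.csv.gz -> dataset.csv
    dest = src.with_suffix("")
    # Stream the data in chunks so a large extract never has to fit in memory
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest
```

Calling `unzip_csv("output/dataset.csv.gz")` would write `output/dataset.csv` and return its path. Streaming with `shutil.copyfileobj` keeps memory use flat regardless of file size.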
22 changes: 0 additions & 22 deletions docs/requesting-variables.md

This file was deleted.

6 changes: 3 additions & 3 deletions docs/security-levels.md
@@ -14,7 +14,7 @@ Data is held within the EHR vendor's secure environment.
### Who has access?
Only data processor staff working for the EHR vendor (as well as approved GP practice staff) have access to the identifiable GP data for the purposes of direct patient care.

-A small and restricted number of OpenSAFELY platform developers at the University of Oxford (under contract with NHS England) can also access the pseudonymised GP data for system integration activities. No additional direct access to the pseudonymised GP is available to any other individuals, or OpenSAFELY researchers.
+A small and restricted number of OpenSAFELY platform developers at the University of Oxford (under contract with NHS England) can also access the pseudonymised GP data for system integration activities. No additional direct access to the pseudonymised GP is available to any other individuals, or OpenSAFELY researchers.

Researchers can query this pseudonymised GP data, but only indirectly: they write their study analysis code away from the source data, in GitHub, and the OpenSAFELY service automates the execution of the study code against the GP data. Only the aggregated results of their study are made available back to the researchers in Level 4 (see below).

@@ -50,7 +50,7 @@ Data is held within the EHR vendor's secure environment on the OpenSAFELY server
Data processor staff working at the EHR vendor and a small and restricted number of OpenSAFELY platform developers. Similar to level 1 above, researchers can query this pseudonymised external data, but only indirectly: they write their study analysis code away from the source data, in GitHub, and the OpenSAFELY service automates the execution of the study code against the external data. Only the aggregated results of their study are made available back to the researchers in Level 4 (see below).

## Level 3 [NHS England are data controllers of the data]
-At this level data is typically stored as a pseudonymised patient-level (rather than event level) extract. It includes all pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [study definition](study-def.md) or [dataset_definition](https://docs.opensafely.org/ehrql/). It also includes all of the pseudonymised patient-level intermediate outputs for a study derived from queries run against Level 3 data which output pseudonymised patient-level data i.e., a processed study dataset where certain filters/formatting have been applied.
+At this level data is typically stored as a pseudonymised patient-level (rather than event level) extract. It includes all pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [dataset_definition](https://docs.opensafely.org/ehrql/). It also includes all of the pseudonymised patient-level intermediate outputs for a study derived from queries run against Level 3 data which output pseudonymised patient-level data i.e., a processed study dataset where certain filters/formatting have been applied.

As the data stored at this level is still patient-level, access to this level is restricted to a small number of OpenSAFELY staff to allow data quality assessment and debugging problems.

@@ -74,7 +74,7 @@ Data is held within the EHR vendor's secure environment on a specific server, se
Anyone with Level 2/3 access. In addition, researchers who have an NHS England approved study, and who have signed a Data Access Agreement relevant to level 4 access for the purposes of checking and redacting their aggregated study outputs prior to release.

## Unrestricted data
-Any level 4 files that have been cleared by output-checkers, and therefore considered to have negligible disclosure risk, can be released.
+Any level 4 files that have been cleared by output-checkers, and therefore considered to have negligible disclosure risk, can be released.

## Diagram

2 changes: 2 additions & 0 deletions docs/study-def-codelists.md
@@ -1,3 +1,5 @@
+---8<-- 'includes/cohort-extractor-deprecated.md'
+
A *Codelist* is a collection of clinical codes that classifies patients as having certain conditions or demographic properties. For example, in an clinical system, an asthma diagnosis may be indicated by [any of more than 100 codes](https://www.opencodelists.org/codelist/primis-covid19-vacc-uptake/ast/v1/#full-list).

Codelists must be stored as data within your study repository, from where they can be used in your study definition.
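Since a codelist is just a set of codes stored as a data file in the study repository, the core idea of classifying a patient by it can be sketched in a few lines of plain Python. This is not the cohort-extractor or ehrQL codelist API: `load_codelist`, `has_condition`, and the assumption that codes sit in a CSV column named `code` are all hypothetical, and the codes in the example are made up.

```python
import csv
import io
from typing import Iterable, TextIO


def load_codelist(csv_file: TextIO, column: str = "code") -> frozenset:
    """Read the codes in one column of a codelist CSV into an immutable set.

    The default column name "code" is an assumption; real files may differ.
    """
    codes = set()
    for row in csv.DictReader(csv_file):
        code = (row.get(column) or "").strip()
        if code:  # skip blank cells
            codes.add(code)
    return frozenset(codes)


def has_condition(patient_codes: Iterable[str], codelist: frozenset) -> bool:
    """True if any code recorded for a patient appears in the codelist."""
    return any(code in codelist for code in patient_codes)


# Made-up codes, for illustration only
example_csv = io.StringIO("code,term\nX001,example term one\nX002,example term two\n")
example_codelist = load_codelist(example_csv)
print(has_condition(["Y999", "X002"], example_codelist))  # True
```

Holding the codelist as a set makes each membership check constant-time on average, which matters when a codelist has hundreds of codes and is checked against many records.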
2 changes: 2 additions & 0 deletions docs/study-def-dates.md
@@ -1,3 +1,5 @@
+---8<-- 'includes/cohort-extractor-deprecated.md'
+
In study definitions, dates are described in `"YYYY-MM-DD"` format; so, for example, the 3rd May 1995 would be written `"1995-05-03"`.


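`"YYYY-MM-DD"` is the ISO 8601 calendar date layout, so Python's standard `datetime.date` can both produce and check such strings. A minimal sketch; `is_study_date` is a hypothetical helper, not part of cohort-extractor or ehrQL. The regular expression is included because `date.fromisoformat` in Python 3.11 and later also accepts other ISO layouts (such as `19950503`), so on its own it is not a strict format check.

```python
import re
from datetime import date

# Exactly four digits, hyphen, two digits, hyphen, two digits
_YYYY_MM_DD = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_study_date(text: str) -> bool:
    """True if text is a real calendar date written exactly as "YYYY-MM-DD"."""
    if not _YYYY_MM_DD.fullmatch(text):
        return False
    try:
        date.fromisoformat(text)  # rejects impossible dates such as "1995-02-30"
    except ValueError:
        return False
    return True


print(date(1995, 5, 3).isoformat())  # 1995-05-03
print(is_study_date("1995-05-03"), is_study_date("03/05/1995"))  # True False
```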
2 changes: 2 additions & 0 deletions docs/study-def-expectations.md
@@ -1,3 +1,5 @@
+---8<-- 'includes/cohort-extractor-deprecated.md'
+
Because OpenSAFELY doesn't allow direct access to individual patient records, researchers must use *dummy data* for developing their analytic code on their own computer.

OpenSAFELY requires you to define *expectations* in your study definition: these describe the properties of each variable, and are used to generate random data that match the expectations.
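To make the idea of expectations concrete, here is a minimal Python sketch that turns two properties of a date variable (the proportion of patients who have a value, and the range its dates fall in) into random dummy values. The parameter names `incidence`, `earliest`, and `latest` are assumptions loosely modelled on the kind of properties described above; this is not how cohort-extractor or ehrQL actually generate dummy data.

```python
import random
from datetime import date, timedelta
from typing import List, Optional


def dummy_dates(
    n: int,
    incidence: float,
    earliest: date,
    latest: date,
    seed: Optional[int] = None,
) -> List[Optional[date]]:
    """Generate n dummy values for a date variable (hypothetical helper).

    Each patient gets a uniformly random date in [earliest, latest] with
    probability `incidence`, and None (no recorded event) otherwise.
    """
    if not 0.0 <= incidence <= 1.0:
        raise ValueError("incidence must be between 0 and 1")
    if latest < earliest:
        raise ValueError("latest must not be before earliest")
    rng = random.Random(seed)  # a fixed seed makes the dummy data reproducible
    span_days = (latest - earliest).days
    values: List[Optional[date]] = []
    for _ in range(n):
        if rng.random() < incidence:
            # randint is inclusive at both ends, so `latest` itself can occur
            values.append(earliest + timedelta(days=rng.randint(0, span_days)))
        else:
            values.append(None)
    return values
```

For example, `dummy_dates(1000, incidence=0.3, earliest=date(2020, 1, 1), latest=date(2020, 12, 31), seed=42)` would give roughly 300 dates in 2020 and `None` for the rest. Passing a fixed `seed` regenerates the same dummy dataset on every run, which helps when debugging analysis code against it.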
2 changes: 2 additions & 0 deletions docs/study-def-flowcharts.md
@@ -1,3 +1,5 @@
+---8<-- 'includes/cohort-extractor-deprecated.md'
+
## Flowcharts (temporary workaround)

Many studies will require a flowchart to show inclusion/exclusion of patients in the study. Eventually the numbers of patients excluded/included will be summarised automatically following cohort extract, but for now, a slightly manual approach is required: