Merge pull request #1389 from opensafely/replace-cohort-extractor-in-…

…docs Replace cohort extractor in docs
opensafely · Nov 28, 2023 · 40ba0ac
2 parents 3fc2ffe + 5f38ab7
commit 40ba0ac

Showing 21 changed files with 48 additions and 155 deletions.
diff --git a/docs/actions-cohortextractor.md b/docs/actions-cohortextractor.md
diff --git a/docs/actions-scripts.md b/docs/actions-scripts.md
@@ -34,7 +34,7 @@ Scripted actions can read and write output files that are saved in the workspace
 
 Outputs labelled `highly_sensitive` will not be visible to researchers. This is a [deliberate design feature of OpenSAFELY](https://www.opensafely.org/about/), intended to reduce the risk of disclosure of sensitive information. Outputs should **always** be classed as `highly_sensitive` if they are:
 
-- Pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [study definition](study-def.md) or [dataset_definition](https://docs.opensafely.org/ehrql/).
+- Pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [dataset_definition](https://docs.opensafely.org/ehrql/).
 - Pseudonymised patient-level intermediate outputs for a study derived from queries run against Level 3 data which output pseudonymised patient-level data i.e., a processed study dataset with certain filters/formatting applied.
 
 These types of outputs are considered potentially highly-disclosive, should not be pushed to Level 4, and are never intended for publishing outside the secure environment.

diff --git a/docs/contributing.md b/docs/contributing.md
@@ -11,9 +11,6 @@ We welcome proposals for longer topic-based guides. Suggest them as an [issue in
 ## Reusable actions
 These are [units of software that solve a problem for several studies](./actions-reusable.md) without the need to copy-and-paste between them. They can be shared between researchers, even between groups that use different programming languages, and are one of the best ways you can make contributions that benefit the community. If you've written a reusable action you'd like to contribute to the actions library, please get in touch at [[email protected]](mailto:[email protected]).
 
-## Study definition patterns
-[Study definitions](./study-def.md) are intended to be shared under open source licences, and they can be found in the [OpenSAFELY GitHub organisation](https://github.com/opensafely).  We welcome all ideas that may be able to help other members of the community, which we can incorporate in our documentation ([for example on this page about study definition tips](study-def-tricks.md)); we're also happy to help publicise blog posts, or host them as guest blog posts on our website. Just submit a ticket to the [documentation repository](https://github.com/opensafely/documentation/issues) with your suggestion.
-
 ## Peer support
 
 We encourage researchers to post questions in the [Q&A Forum](https://github.com/opensafely/documentation/discussions). We would love more people to chip in and attempt to answer questions!

diff --git a/docs/data-sources/intro.md b/docs/data-sources/intro.md
@@ -1,15 +1,16 @@
-This section provides contextual information on the core primary care EHR systems inside which OpenSAFELY is built (currently TPP and EMIS), as well as all external datasets imported to the secure EHR environment.  To query the data, see the [study definition section](../study-def.md).
+This section provides contextual information on the core primary care EHR systems inside which OpenSAFELY is built (currently TPP and EMIS), as well as all external datasets imported to the secure EHR environment.
+To learn about querying the data, refer to the [documentation on ehrQL](/ehrql/).
 
 View information on [available datasets](index.md).
 
 
 ## What is primary care data?
-The core patient-level data used within OpenSAFELY is based on electronic GP records that are collected and securely stored to facilitate patient management by healthcare providers. They capture symptoms, test results, diagnoses, prescriptions, onward referrals, demographic and social characteristics, and so on. Essentially, everything about a patient that is electronically recorded or accessed by GPs. 
+The core patient-level data used within OpenSAFELY is based on electronic GP records that are collected and securely stored to facilitate patient management by healthcare providers. They capture symptoms, test results, diagnoses, prescriptions, onward referrals, demographic and social characteristics, and so on. Essentially, everything about a patient that is electronically recorded or accessed by GPs.
 
 GP records, or primary care records, can also be used for conducting health research, which is what OpenSAFELY was built for. We've made a video to help explain primary care data in more detail; essential viewing if you're new to this domain.
 
 <div class="video-wrapper">
   <iframe width="1280" height="720" src="https://www.youtube.com/embed/NEwSQ5-dWSg" frameborder="0" allowfullscreen></iframe>
 </div>
 
----8<-- 'includes/glossary.md'
+---8<-- 'includes/glossary.md'
diff --git a/docs/how-to-get-help.md b/docs/how-to-get-help.md
@@ -74,7 +74,7 @@ OpenSAFELY uses GitHub to manage and share code and other platform resources. If
 
 Issues can be submitted for lots of different things &mdash; new variables or other features, bug reports, additional R or Stata packages, documentation updates, and so on.  All our core software packages live in the [`opensafely-core`](https://github.com/opensafely-core/) GitHub organisation.
 
-The most common requests are about library support, and new study definition variables. We have a whole page describing [how to request new libraries](requesting-libraries.md), and another about [how to request new study variables](requesting-variables.md). If you want to report bugs or request features in the `opensafely` command-line tool, you can do so in [its own dedicated issue tracker](https://github.com/opensafely-core/opensafely-cli/issues).
+The most common requests are about library support; this page describes [how to request new libraries](requesting-libraries.md). If you want to report bugs or request features in the `opensafely` command-line tool, you can do so in [its own dedicated issue tracker](https://github.com/opensafely-core/opensafely-cli/issues).
 
 Other than this, you will need to choose the most appropriate repo to submit an issue. If you're not sure where to submit your issue, just ask a question in our [Q&A forum](https://github.com/opensafely/documentation/discussions) and we can point you to the right place.
 

diff --git a/docs/images/c4-container.svg b/docs/images/c4-container.svg
diff --git a/docs/install-python.md b/docs/install-python.md
@@ -2,7 +2,7 @@
     **Please read even if you already have Python installed**
 
 For security, consistency, and readability, OpenSAFELY provides an API built in [**Python**](https://www.python.org/) for using the platform.
-This API includes script-based functions for specifying the patients and variables that make up a study dataset (using a [study definition](study-def.md)),
+This API includes script-based functions for specifying the patients and variables that make up a study dataset (using [ehrQL](/ehrql/)),
 and command line functions for importing codelists, generating dummy data, and testing that the study definition can be run successfully on the server.
 **Python version 3.7 or higher** must be installed on your machine to perform these tasks.
 

diff --git a/docs/opensafely-cli.md b/docs/opensafely-cli.md
@@ -118,7 +118,7 @@ See the [Codelist](codelist-intro.md) section for more information on codelists.
 
 To run your code on your machine, the `opensafely` tool uses the same Docker
 images that run in the secure server environments. There is the
-`cohortextractor` image, for processing study definitions, and then the `r`,
+`ehrql` image, for processing dataset definitions, and then the `r`,
 `stata-mp`, and `python` images, for running your analysis code. These last
 three provide a pre-built environment for their specific language, with
 a fixed set of pre-installed libraries.
@@ -212,15 +212,14 @@ Or alternatively go to File -> shutdown in the JupyterLab tab.
 
 ### `unzip` - unzipping CSV files
 
-For performance and storage reasons on the backend, you must use the
-compressed `csv.gz` output format for cohortextractor output files. However,
-you may need to inspect the raw CSV data. You can easily unzip a CSV file with
+If an action produces a compressed CSV file,
+you can view the raw CSV data by unzipping it with
 
 ```bash
-opensafely unzip outputs/input.csv.gz
+opensafely unzip outputs/dataset.csv.gz
 ```
 
-This will create a decompressed `output/input.csv` file you can view as normal.
+This will create a decompressed `output/dataset.csv` file you can view as normal.
 
 
 ### Managing Resources

diff --git a/docs/requesting-variables.md b/docs/requesting-variables.md
diff --git a/docs/security-levels.md b/docs/security-levels.md
@@ -14,7 +14,7 @@ Data is held within the EHR vendor's secure environment.
 ### Who has access?
 Only data processor staff working for the EHR vendor (as well as approved GP practice staff) have access to the identifiable GP data for the purposes of direct patient care.
 
-A small and restricted number of OpenSAFELY platform developers at the University of Oxford (under contract with NHS England) can also access the pseudonymised GP data for system integration activities. No additional direct access to the pseudonymised GP is available to any other individuals, or OpenSAFELY researchers. 
+A small and restricted number of OpenSAFELY platform developers at the University of Oxford (under contract with NHS England) can also access the pseudonymised GP data for system integration activities. No additional direct access to the pseudonymised GP is available to any other individuals, or OpenSAFELY researchers.
 
 Researchers can query this pseudonymised GP data, but only indirectly: they write their study analysis code away from the source data, in GitHub, and the OpenSAFELY service automates the execution of the study code against the GP data. Only the aggregated results of their study are made available back to the researchers in Level 4 (see below).
 
@@ -50,7 +50,7 @@ Data is held within the EHR vendor's secure environment on the OpenSAFELY server
 Data processor staff working at the EHR vendor and a small and restricted number of OpenSAFELY platform developers. Similar to level 1 above, researchers can query this pseudonymised external data, but only indirectly: they write their study analysis code away from the source data, in GitHub, and the OpenSAFELY service automates the execution of the study code against the external data. Only the aggregated results of their study are made available back to the researchers in Level 4 (see below).
 
 ## Level 3 [NHS England are data controllers of the data]
-At this level data is typically stored as a pseudonymised patient-level (rather than event level) extract. It includes all pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [study definition](study-def.md) or [dataset_definition](https://docs.opensafely.org/ehrql/). It also includes all of the pseudonymised patient-level intermediate outputs for a study derived from queries run against Level 3 data which output pseudonymised patient-level data i.e., a processed study dataset where certain filters/formatting have been applied.
+At this level data is typically stored as a pseudonymised patient-level (rather than event level) extract. It includes all pseudonymised patient-level outputs derived from queries run against Level 1 and 2 data, i.e., a specific study dataset generated by a [dataset_definition](https://docs.opensafely.org/ehrql/). It also includes all of the pseudonymised patient-level intermediate outputs for a study derived from queries run against Level 3 data which output pseudonymised patient-level data i.e., a processed study dataset where certain filters/formatting have been applied.
 
 As the data stored at this level is still patient-level, access to this level is restricted to a small number of OpenSAFELY staff to allow data quality assessment and debugging problems.
 
@@ -74,7 +74,7 @@ Data is held within the EHR vendor's secure environment on a specific server, se
 Anyone with Level 2/3 access. In addition, researchers who have an NHS England approved study, and who have signed a Data Access Agreement relevant to level 4 access for the purposes of checking and redacting their aggregated study outputs prior to release.
 
 ## Unrestricted data
-Any level 4 files that have been cleared by output-checkers, and therefore considered to have negligible disclosure risk, can be released. 
+Any level 4 files that have been cleared by output-checkers, and therefore considered to have negligible disclosure risk, can be released.
 
 ## Diagram
 

diff --git a/docs/study-def-codelists.md b/docs/study-def-codelists.md
@@ -1,3 +1,5 @@
+---8<-- 'includes/cohort-extractor-deprecated.md'
+
 A *Codelist* is a collection of clinical codes that classifies patients as having certain conditions or demographic properties. For example, in an clinical system, an asthma diagnosis may be indicated by [any of more than 100 codes](https://www.opencodelists.org/codelist/primis-covid19-vacc-uptake/ast/v1/#full-list).
 
 Codelists must be stored as data within your study repository, from where they can be used in your study definition.

diff --git a/docs/study-def-dates.md b/docs/study-def-dates.md
@@ -1,3 +1,5 @@
+---8<-- 'includes/cohort-extractor-deprecated.md'
+
 In study definitions, dates are described in `"YYYY-MM-DD"` format; so, for example, the 3rd May 1995 would be written `"1995-05-03"`.
 
 

diff --git a/docs/study-def-expectations.md b/docs/study-def-expectations.md
@@ -1,3 +1,5 @@
+---8<-- 'includes/cohort-extractor-deprecated.md'
+
 Because OpenSAFELY doesn't allow direct access to individual patient records, researchers must use *dummy data* for developing their analytic code on their own computer.
 
 OpenSAFELY requires you to define *expectations* in your study definition: these describe the properties of each variable, and are used to generate random data that match the expectations.

diff --git a/docs/study-def-flowcharts.md b/docs/study-def-flowcharts.md
@@ -1,3 +1,5 @@
+---8<-- 'includes/cohort-extractor-deprecated.md'
+
 ## Flowcharts (temporary workaround)
 
 Many studies will require a flowchart to show inclusion/exclusion of patients in the study. Eventually the numbers of patients excluded/included will be summarised automatically following cohort extract, but for now, a slightly manual approach is required: