From 126537a37a024d51af2f840d55e452a9f3b99e0e Mon Sep 17 00:00:00 2001
From: Andrea Schaffer <37759997+alschaffer@users.noreply.github.com>
Date: Fri, 4 Oct 2024 13:32:42 +0100
Subject: [PATCH 1/6] Update workflow.md

---
 docs/workflow.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/workflow.md b/docs/workflow.md
index d12f924a..659ca030 100644
--- a/docs/workflow.md
+++ b/docs/workflow.md
@@ -10,10 +10,10 @@ This repo will contain all the code relating to your project, and a history of i
 2. **Write a [dataset definition](/ehrql/)** that specifies what data you want to extract from the database:
     - specify the patient population (dataset rows) and variables (dataset columns)
     - specify the expected distributions of these variables for use in dummy data
-    - specify (or create) the [codelists](codelist-intro.md) required by the study definition, hosted by [OpenCodelists](https://www.opencodelists.org), and import them to the repo.
+    - specify (or create) the [codelists](codelist-intro.md) required by the dataset definition, hosted by [OpenCodelists](https://www.opencodelists.org), and import them to the repo.
 3. **Generate [dummy data](/ehrql/how-to/dummy-data)** based on the dataset definition, for writing and testing code.
 4. **Develop analysis scripts** using the dummy data in R, Stata, or Python. This will include:
-    - importing and processing the dataset(s) created by the cohort extractor
+    - importing and processing the dataset(s) created by the [dataset definition](/ehrql/)
     - importing any other external files needed for analysis
     - generating analysis outputs like tables and figures
     - generating log files to debug the scripts when they run on the real data.
@@ -29,4 +29,4 @@ It is possible to automatically test that the analytical pipeline defined in ste
 This pipeline is also [automatically tested](actions-pipelines.md#running-your-code-with-github-actions) against dummy data every time a new version of the study repository is saved ("pushed") to GitHub.
 
 As well as your own Python, R or Stata scripts, other non-standard actions are available.
-For example, it's possible to run a matching routine that extracts a matched control population to the population defined in the study definition, without having to extract all candidate matches into a dataset first.
+For example, it's possible to run a matching routine that extracts a matched control population to the population defined in the dataset definition, without having to extract all candidate matches into a dataset first.

From 25227d0745b31cdc9547fa708d8ba35eba9a4623 Mon Sep 17 00:00:00 2001
From: Andrea Schaffer <37759997+alschaffer@users.noreply.github.com>
Date: Fri, 4 Oct 2024 13:35:48 +0100
Subject: [PATCH 2/6] Update repositories.md

---
 docs/repositories.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/repositories.md b/docs/repositories.md
index fe3c0aac..ff3cc5a1 100644
--- a/docs/repositories.md
+++ b/docs/repositories.md
@@ -74,7 +74,7 @@ This is an important folder, used internally by GitHub, that you can happily ign
 
 By convention, this folder contains:
 
-* Any `study_definition.py` script that defines the study definition
+* Any `dataset_definition.py` script that defines the dataset definition
 * Analysis scripts in R, Python or Stata
 
 ### `codelists/`

From d2816f976b95e925148717a983fa2041a773d820 Mon Sep 17 00:00:00 2001
From: Andrea Schaffer <37759997+alschaffer@users.noreply.github.com>
Date: Fri, 4 Oct 2024 13:39:25 +0100
Subject: [PATCH 3/6] Update codelist-intro.md

---
 docs/codelist-intro.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/codelist-intro.md b/docs/codelist-intro.md
index c2a6cb6f..11d08745 100644
--- a/docs/codelist-intro.md
+++ b/docs/codelist-intro.md
@@ -19,8 +19,8 @@ that to find all the patients with Type 1 diabetes, you may have to search for
 We built a system for building, reviewing and maintaining codelists at [OpenCodelists](https://www.opencodelists.org/). We've made an introductory video to help explain OpenCodelists in more detail. Codelists
-that are hosted on this website can be used directly in the Study Definition. This means
-there is no need to download or alter these codelists in the study definition, and
+that are hosted on this website can be used directly in the Dataset Definition. This means
+there is no need to download or alter these codelists in the dataset definition, and
 they can be reused.