done
3mmaRand committed Oct 2, 2024
1 parent ad55f5e commit 0765613
Showing 2 changed files with 145 additions and 44 deletions.
188 changes: 145 additions & 43 deletions core/week-2/workshop.qmd
@@ -25,47 +25,112 @@ engine: knitr
In this workshop we will discuss why reproducibility matters and how to
organise your work to make it reproducible. We will cover:

What is reproducibility
How to achieve reproducibility
Rationale for scripting
Project-oriented workflow
::: {style="font-size: 70%;"}

- What is reproducibility
- How to achieve reproducibility
- Rationale for scripting
- Project-oriented workflow
- Code formatting and style
- Coding algorithmically
- Naming things
- And some handy workflow tips

:::

# Slide navigation


# Reproducibility


## What is reproducibility?

- **Reproducible: Same data + same analysis = identical results**.
*"... obtaining consistent results using the same input data;
computational steps, methods, and code; and conditions of analysis.
This definition is synonymous with "computational reproducibility"*
[@nationalacademiesofsciences2019]
[![The Turing Way\'s definitions of reproducible research
](images/reproducible-matrix.jpg){fig-alt="Two by Two cell matrix. Columns are Data, either same or different. Rows are Analysis either same or different. Each of cells contain one of the definitions for reproducibility"}](https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions#rr-overview-definitions)

## Definitions


- Replicable: Different data + same analysis = qualitatively similar
results. The work is not dependent on the specificities of the data.
[![The Turing Way\'s definitions of reproducible research
](images/reproducible-matrix.jpg){fig-alt="Two by Two cell matrix. Columns are Data, either same or different. Rows are Analysis either same or different. Each of cells contain one of the definitions for reproducibility"}](https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions#rr-overview-definitions)

- Robust: Same data + different analysis = qualitatively similar or
identical results. The work is not dependent on the specificities of
the analysis.

- Generalisable: Different data + different analysis = qualitatively
similar results and same conclusions.
::: {style="font-size: 70%;"}

**Reproducible: Same data + same analysis = identical results**.
*"... obtaining consistent results using the same input data;
computational steps, methods, and code; and conditions of analysis.
This definition is synonymous with "computational reproducibility"*
[@nationalacademiesofsciences2019]. This is what we are concentrating
on in the Supporting Information.


:::
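
A minimal sketch of "same data + same analysis = identical results" in R. The simulated `eggs` values, the seed and the `analyse()` helper are all assumptions made for illustration; they are not taken from the workshop materials.

```r
# Simulated counts standing in for a real dataset (illustrative values only)
set.seed(2024)                      # fixing the seed makes the data regenerable
eggs <- rpois(20, lambda = 8)

# Regenerating with the same seed gives the same data
set.seed(2024)
eggs_again <- rpois(20, lambda = 8)

# Hypothetical analysis step: the same code applied to the same data
analyse <- function(x) c(mean = mean(x), sd = sd(x))
run_1 <- analyse(eggs)
run_2 <- analyse(eggs_again)

# Reproducible: both runs agree exactly
identical(run_1, run_2)
```

Because every step is scripted, anyone with the script can rerun it and compare their output with yours.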


## Definitions

## What is reproducibility?

[![The Turing Way\'s definitions of reproducible research
](images/reproducible-matrix.jpg){fig-alt="Two by Two cell matrix. Columns are Data, either same or different. Rows are Analysis either same or different. Each of cells contain one of the definitions for reproducibility"}](https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions#rr-overview-definitions)


::: {style="font-size: 70%;"}


Replicable: Different data + same analysis = qualitatively similar
results. The work is not dependent on the specificities of the data.


:::

## Definitions


[![The Turing Way\'s definitions of reproducible research
](images/reproducible-matrix.jpg){fig-alt="Two by Two cell matrix. Columns are Data, either same or different. Rows are Analysis either same or different. Each of cells contain one of the definitions for reproducibility"}](https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions#rr-overview-definitions)


::: {style="font-size: 70%;"}


Robust: Same data + different analysis = qualitatively similar or
identical results. The work is not dependent on the specificities of
the analysis.

:::

## Definitions


[![The Turing Way\'s definitions of reproducible research
](images/reproducible-matrix.jpg){fig-alt="Two by Two cell matrix. Columns are Data, either same or different. Rows are Analysis either same or different. Each of cells contain one of the definitions for reproducibility"}](https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions#rr-overview-definitions)


::: {style="font-size: 70%;"}

Generalisable: Different data + different analysis = qualitatively
similar results and same conclusions.

:::





## Why does it matter?

::: incremental

- Five selfish reasons to work reproducibly [@markowetz2015].
Alternatively, see the very entertaining
[talk](https://youtu.be/yVT07Sukv9Q)

- Many high profile cases of work which did not reproduce e.g. Anil
Potti unravelled by @baggerly2009

- Five selfish reasons to work reproducibly [@markowetz2015].
Alternatively, see the very entertaining
[talk](https://youtu.be/yVT07Sukv9Q)

- **Will** become standard in Science and publishing e.g. OECD Global
Science Forum Building digital workforce capacity and skills for
data-intensive science [@oecdglobalscienceforum2020]
@@ -85,8 +150,10 @@ Project-oriented workflow
- Code: follow a consistent style, organise into sections and scripts
(be modular), Code algorithmically

- Documentation: Readme files, code comments, metadata, version
control, continuous integration
- Documentation: Readme files, code comments, metadata,

- More advanced: version control, continuous integration and testing
(not required for SI)

# Scripting

@@ -119,6 +186,7 @@ Project-oriented workflow
## Example: SI itself is an RSP

```{bash}
#| eval: false
-- stem_cell_rna
|__stem_cell_rna.Rproj
|__raw_data/
@@ -138,7 +206,7 @@ Project-oriented workflow
## Example: SI includes an RSP

```{bash}
#| eval: false
-- stem_cell_rna
|__data_processing/
|__01_data_processing.py
@@ -194,15 +262,11 @@ Project-oriented workflow
:::

::: {.column width="40%"}
The project directory is the folder at the top [^1]
The project directory is the folder at the top
:::
:::

[^1]: Thanks to [Mine
Çetinkaya-Rundel](https://mastodon.social/@[email protected]) who
helped me work out how to highlight a line
<https://gist.github.com/mine-cetinkaya-rundel/3af3415eab70a65be3791c3dcff6e2e3>.
Note to futureself: the `engine: knitr` matters.


## RStudio Projects

@@ -226,10 +290,18 @@ The project directory is the folder at the top [^1]
:::

::: {.column width="40%"}
The `.Rproj` file is directly under the project folder. Its presence is what makes the folder an RStudio Project
The `.Rproj` file is directly under the project folder[^1]. Its presence is
what makes the folder an RStudio Project
:::
:::
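
The idea that an `.Rproj` file marks the project folder can be sketched in R as below. The project is built in a temporary folder, and `counts.csv` is a made-up file name; both are assumptions for illustration only.

```r
# Hypothetical project built in a temporary folder purely for illustration
project_dir <- file.path(tempdir(), "stem_cell_rna")
dir.create(file.path(project_dir, "raw_data"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(project_dir, "stem_cell_rna.Rproj"))

# The folder counts as an RStudio Project if an .Rproj file
# sits directly under it
rproj_files <- list.files(project_dir, pattern = "\\.Rproj$")
is_rstudio_project <- length(rproj_files) > 0

# Inside a Project, paths are written relative to the project folder,
# so the same code keeps working wherever the folder is moved
relative_path <- file.path("raw_data", "counts.csv")  # made-up file name
```

Writing `relative_path` rather than a full path starting at your home directory is what makes the project portable between machines.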

[^1]: Thanks to [Mine
Çetinkaya-Rundel](https://mastodon.social/@[email protected]) who
helped me work out how to highlight a line
<https://gist.github.com/mine-cetinkaya-rundel/3af3415eab70a65be3791c3dcff6e2e3>.
Note to futureself: the `engine: knitr` matters.


## RStudio Projects

::: incremental
@@ -263,13 +335,16 @@ There are two menus options:

They both do the same thing.

In both cases you choose: New Project \| New Directory \| New Project
## Creating an RStudio Project

Then Choose: New Project \| New Directory \| New Project

Make sure you "Browse" to the folder in which you want to create the project.


❔ Is your working directory a good place to create a Project folder?

## Creating an RStudio Project

When you create a new RStudio Project

@@ -285,15 +360,15 @@ When you create a new RStudio Project

## Opening and closing

You can close an RStudio Project with ONE of:
You can **close** an RStudio Project with ONE of:

1. File \| Close Project
2. Using the drop-down option on the far right of the tool bar where
you see the Project name

. . .
## Opening and closing

You can open an RStudio Project with ONE of:
You can **open** an RStudio Project with ONE of:

1. File \| Open Project or File \| Recent Projects\
2. Using the drop-down option on the far right of the tool bar where
@@ -309,10 +384,16 @@ When you open project, a new R session starts.

## Code formatting and style

> "Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread."
> "Good coding style is like correct punctuation: you can manage without it butitsuremakesthingseasiertoread."
[The tidyverse style guide](https://style.tidyverse.org/)

. . .

Code is not write only.

Code is communication!

## Code formatting and style

We have all written code which is hard to read!
@@ -330,19 +411,22 @@ tweetrmd::include_tweet("https://twitter.com/hadleywickham/status/58906868766924

Some key points:

::: {style="font-size: 70%;"}

- be consistent, emulate experienced coders
- use snake_case for variable names (not CamelCase, dot.case)
- use `<-` not `=` for assignment
- use `<-` (not `=`) for assignment
- use spacing around most operators and after commas
- use indentation
- avoid long lines, break up code blocks with new lines
- use `"` for quoting text (not `'`) unless the text contains double quotes


- use `"` for quoting text (not `'`) unless the text contains
double quotes
- space after `#` for comments
:::

## 😩 Ugly code 😩

::: {style="font-size: 50%;"}
::: {style="font-size: 70%;"}

```{r}
#| eval: false
@@ -371,7 +455,7 @@ data<-data|>mutate(id=str_extract(accession,"1::[^;]+")|>str_replace("1::",""))

## 😎 Cool code 😎

::: {style="font-size: 50%;"}
::: {style="font-size: 70%;"}

```{r}
#| eval: false
@@ -527,9 +611,16 @@ sum((eggs - mean_eggs)^2)

## Naming things

::: columns
::: {.column width="50%"}

![documents, CC-BY-NC,
https://xkcd.com/1459/](images/xkcd-comic-file-names.png){fig-alt="A comic figure is looking over the shoulder of another and is shocked by a list of files with names like 'Untitled 138 copy.docx' and 'Untitled 243.doc'. Caption: 'Protip: Never look in someone else's documents folder'"}

:::


::: {.column width="50%"}
Guiding principle - Have a convention! Good file names are:

- machine readable
@@ -538,6 +629,10 @@ Guiding principle - Have a convention! Good file names are:

- play nicely with sorting

:::

:::
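
One way to see what "play nicely with sorting" means is to compare numbered file names with and without zero-padding. The script names below are hypothetical, and `method = "radix"` is used only so the ordering does not depend on your locale.

```r
# Hypothetical script names numbered without zero-padding
unpadded <- c("10_figures.R", "2_analysis.R", "1_import.R")
sorted_unpadded <- sort(unpadded, method = "radix")
# "10_figures.R" sorts before "2_analysis.R": not the order the work is done in

# The same names with two-digit, zero-padded prefixes
step <- c(10L, 2L, 1L)
task <- c("figures", "analysis", "import")
padded <- sprintf("%02d_%s.R", step, task)
sorted_padded <- sort(padded, method = "radix")
# now the files list in the order they are meant to be run

# Machine readable: no spaces anywhere in the names
has_spaces <- any(grepl(" ", padded, fixed = TRUE))
```

Two digits are enough for up to 99 files; a longer series would need a wider prefix such as `%03d`.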

## Naming suggestions

- no spaces in names
@@ -558,7 +653,7 @@ Guiding principle - Have a convention! Good file names are:

# Workflow tips

::: {style="font-size: 50%;"}
::: {style="font-size: 60%;"}
- multiple cursors

- open a file/function or find a variable CONTROL+.
@@ -582,6 +677,10 @@ Guiding principle - Have a convention! Good file names are:
- [GitHub Copilot in RStudio, it's finally
here!](https://colorado.posit.co/rsc/rstudio-copilot/#/TitleSlide)

- It's all gone wrong, Restart R CONTROL+SHIFT+F10

- [Fira Code](https://github.com/tonsky/FiraCode?tab=readme-ov-file)

:::

## Summary
@@ -603,6 +702,7 @@ Guiding principle - Have a convention! Good file names are:

Completely optional suggestions for further reading

::: {style="font-size: 70%;"}
- [Project-oriented workflow \| What They Forgot to Teach You About
R](https://rstats.wtf/projects) [@bryan]. Recommended if you still
need convincing to use RStudio Projects
@@ -612,6 +712,8 @@ Completely optional suggestions for further reading
- Excuse Me, Do You Have a Moment to Talk About Version Control?
[@bryan2018]

Pages made with R [@R-core], Quarto [@Allaire_Quarto_2024], `knitr` [@knitr1; knitr2; knitr3], `kableExtra` [@kableExtra]
Pages made with R [@R-core], Quarto [@Allaire_Quarto_2024], `knitr` [@knitr1; @knitr2; @knitr3], `kableExtra` [@kableExtra]

:::

## References
1 change: 0 additions & 1 deletion references.bib
@@ -233,7 +233,6 @@ @article{baggerly2009
pages = {1309--1334},
volume = {3},
number = {4},
doi = {10.2307/27801549},
url = {http://www.jstor.org/stable/27801549},
note = {Publisher: Institute of Mathematical Statistics}
}
