Skip to content

Commit

Permalink
Add FAQ #26 (#51) [no CI]
Browse files Browse the repository at this point in the history
* Add FAQ #26

* Fix FAQ as per review [no CI]
  • Loading branch information
chainsawriot authored Feb 15, 2023
1 parent 49230c0 commit 614943c
Show file tree
Hide file tree
Showing 3 changed files with 226 additions and 0 deletions.
3 changes: 3 additions & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@ Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Suggests:
knitr,
rmarkdown,
testthat (>= 3.0.0)
Config/testthat/edition: 3
Imports:
Expand All @@ -25,3 +27,4 @@ Imports:
gh
Depends:
R (>= 3.5.0)
VignetteBuilder: knitr
2 changes: 2 additions & 0 deletions vignettes/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
*.html
*.R
221 changes: 221 additions & 0 deletions vignettes/faq.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,221 @@
---
title: "FAQ"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{FAQ}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

# General questions

**GQ1: Tell me in 10 lines how to use this package.**

**GA1:** Get the dependency graph of several R packages on CRAN or Github at a specific snapshot date(time)

```r
graph <- resolve(c("crsh/papaja", "rio"), snapshot_date = "2019-07-21")
```

Dockerize the dependency graph to a directory

```r
dockerize(graph, output_dir = "rangtest")
```

You can build the Docker image either by the R package `stevedore` or Docker CLI client. We use the CLI client.

```sh
cd rangtest
docker build -t rangtest . ## might need sudo
```

Launch the container with the built image

```sh
docker run --rm --name "rangcontainer" -ti rangtest
```

And the tenth line is not needed.

**GQ2: For running `resolve()`, how do I know which packages are used in a project?**

**GA2:** We recommend `renv::dependencies()`.

Suppose in a directory called "project" there are two R files:

```r
here::here()
```

```r
library(rio)
x <- import("hello.csv")
```

Running this reveals

```{r include = FALSE}
## x <- tempfile()
## dir.create(x)
## writeLines(c("library(rio)", "here::here()"), file.path(x, "fake.R"))
## y <- renv::dependencies(x)
## y$Source <- c("myproject/1.R", "myproject/2.R")
## y
```

<div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true"></a>renv<span class="op">::</span><span class="kw">dependencies</span>(<span class="st">&quot;project&quot;</span>)</span></code></pre></div>
<pre><code>#&gt; Finding R package dependencies ... Done!
#&gt; Source Package Require Version Dev
#&gt; 1 myproject/1.R here FALSE
#&gt; 2 myproject/2.R rio FALSE</code></pre>

You may still need to manually review which packages are from Github.

**GQ3: Why is the R script generated by `dockerize()` and `export_rang()` so strange/unidiomatic/inefficient/did you guys read `fortunes::fortune("answer is parse")`?**

**GA3:** It is because we optimize the R code in `rang.R` for backward compatibility. We need to make sure that the code runs well in vanilla R environments since 2.1.0.

**GQ4: Why doesn't `rang` support reconstructing computational environments with R < 2.1.0 yet?**

**GA4:** It is because installing source packages from within R was introduced in R 2.1.0. Before that one needed to install source packages with `R CMD INSTALL`. But we are working on supporting R in the 1.x series.

**GQ5: Does `rang.R` (generated by `export_rang()` or `dockerize()`) run on non-Linux OSes?**

**GA5:** Theoretically speaking, yes. But strongly not recommended. If the system requirements are fulfilled, `rang.R` should probably run fine on OS X if the R packages do not contain compiled code. C and Fortran compilers are needed if it is the case. See [this entry](https://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#Installation-of-source-packages) in R Mac OS X FAQ. On Windows, installing Github packages requires properly set up PATH and `tar`. Similarly, R packages with compiled code require C / Fortran compilers. See [this entry](https://cran.r-project.org/bin/windows/base/rw-FAQ.html#Can-I-install-packages-into-libraries-in-this-version_003f) in R for Windows FAQ.

**GQ6: What are the caveats of using rang?**

**GA6:** Many

* `rang` does not support reconstructing computational environments with R < 2.1.0 (i.e. `snapshot_date` < "2005-04-19 09:01") yet
* `dockerize()` can only generate Debian/Ubuntu-based Docker images
* `dockerize(cache = TRUE)` does not cache [R source code](https://cran.r-project.org/src/base/) (yet) and System Requirements (in `deb` packages)
* `query_sysreqs()` (as well as `resolve(query_sysreqs = TRUE)`) queries for System Requirements based on the latest version of the packages on CRAN / Github. Therefore:
* Removed CRAN packages are assumed to have no System Requirements
* R Packages with changed System Requirements between `snapshot_date` and the date of running `resolve()` might produce incorrect System Requirements
* A result from `resolve()` with R version < 3.1 and has at least one Github package must be dockerized with caching (i.e. `dockerize(cache = TRUE)`). It is because the outdated version of Debian cannot communicate with the Github API
* R packages on Github or CRAN might not be available in the near future (Github: likely; CRAN: very unlikely). But one can cache the packages (`dockerize(cache = TRUE)`).
* The Rocker project and its host Docker Hub might not be available in the near future (unlikely)
* Ubuntu / Debian archives (for System Requirements) might not be available in the future (super unlikely)

**GQ7: `rang` depends on R >= 3.5.0. Several of the dependencies depend on many modern R packages. How dare you claiming your package supports R >= 2.1.0?**

**GA7:** To clarify, it is true that `resolve()` and `dockerize()` depend on many factors, including a modern version of R. But the reconstruction process (if with caching of R packages) depends only on the availability of Docker images from Docker Hub, availability of R source code on CRAN (R < 3.1.0), and `deb` packages from Ubuntu and Debian in the future. If you don't believe in all of these, see also: DQ4.

**GQ8: What are the data sources of `resolve()`?**

**GA8:** Several

* Dependencies / R version / System Requirements: r-hub APIs [pkgsearch](https://r-hub.github.io/pkgsearch/) [r-versions](https://api.r-hub.io/rversions) [sysreqs](https://sysreqs.r-hub.io/)
* Github: [Github API](https://docs.github.com/en/rest)
* Dependencies of Github packages: [Package manager from Posit](https://github.com/rstudio/r-system-requirements#operating-systems)

**GQ9: I am not convinced by this package. What are the alternatives?**

**GA9:** If you don't consider the Dockerization part of `rang`, the date-based pinning of R packages can be done by:

* Using [Package Manager](https://packagemanager.rstudio.com/)

```r
library(pak)
options(repos = c(REPO_NAME = "https://packagemanager.rstudio.com/cran/2019-07-21"))
pkg_install("rio")
pkg_install("crsh/papaja")
```

* Using [groundhog](https://groundhogr.com/)

```r
library(groundhog)
pkgs <- c("rio","crsh/papaja")
groundhog.library(pkgs, "2019-07-21")
```

If you don't consider the date-based pinning of R packages, the Dockerization can be done by:

* Using [containerit](https://github.com/o2r-project/containerit) [not on CRAN]

```r
library(containerit)
## combine with Package Manager to pin packages by date
install.packages("rio")
remotes::install_github("crsh/papaja")
library(rio)
library(papaja)
print(containerit::dockerfile(from = utils::sessionInfo()))
```

* Using [dockerfiler](https://cran.r-project.org/web/packages/dockerfiler/index.html)

```r
library(dockerfiler)
my_dock <- Dockerfile$new()
## combine with Package Manager to pin packages by date
my_dock$RUN(r(install.packages(c("remotes", "rio"))))
my_dock$RUN(r(remotes::install_github("crsh/papaja")))
my_dock
```

# Docker questions

**DQ1: Is Docker an overkill to simply ensure that a few lines of R code are reproducible?**

**DA1:** It might be the case for recent R code, e.g. R >= 3.0 (or `snapshot_date` > "2013-04-03 09:10"). But we position `rang` as an archaeological tool to run really old R code (`snapshot_date` >= "2005-04-19 09:01", but see GQ4). For this, Docker is essential because R in the 2.x (or even 1.x in future) series might not be installable anymore in a non-virtualized environment.

According to [The Turing Way](https://the-turing-way.netlify.app/reproducible-research/compendia.html), a research compendium that aids computational reproducibility should contain a complete description of the computational environment. The directory exported by `dockerize()`, especially when `materials_dir` and `cache` were used, can be directly shared as a research compendium.

**DQ2: How do I access bash instead of R?**

**DA2:** By default, containers launched with the images generated by `rang` goes to R. One can override this by launching the container with an alternative entry point.

Suppose an image was built as per GA1.

```sh
docker run --rm --name "rangcontainer" --entrypoint bash -ti rangtest
```

**DQ3: How do I copy files from and to a launched container?**

**DA3:** Again an image was built as per GA1 and launched as below

```sh
docker run --rm --name "rangcontainer" -ti rangtest
```

```sh
# probably you need to run this from another terminal
docker cp rangcontainer:/rang.R rang2.R
docker cp rang2.R rangcontainer:/rang2.R
```

We want to emphasize here that launching a container with `--name` is useful because the name of the container is randomly generated when `--name` was not used to launch it. It is also important to remind you that a relaunched container goes back to the initial state. Any file generated inside the container previously will be removed. So use `docker cp` to copy any artifact if one wants to preserve any artifact.

**DQ4: How do I back up an image?**

**DA4:** If you don't believe Docker Hub / Debian archives / Ubuntu archives would be available forever, you may back up the generated image.

```sh
docker save rangtest | gzip > rangtest.tar.gz
```

You can also share the back up gzipped tarball file (usually < 1G, depending on the size of `materials_dir`, thus sharable on Zenodo).

To restore the backup image:

```sh
docker load < rangtest.tar.gz
```

And launch a container the same way

```sh
docker run --rm --name "rangcontainer" -ti rangtest
```

0 comments on commit 614943c

Please sign in to comment.