Managing R dependencies #90

Open
abkfenris opened this issue Oct 1, 2024 · 5 comments

Comments

@abkfenris
Collaborator

There may be a better solution to managing R dependencies reliably these days:

https://rstudio.github.io/renv/articles/docker.html

@emiliom
Member

emiliom commented Nov 11, 2024

I missed this issue when I was preparing the images for last month's OHW-espanol event. But now that I'm preparing for the next one (Nov 25-29), I'd like to try to make R env management easier and more reliable. I described problems and needs to 2i2c just a couple of days ago (pasted below); don't know yet if they'll be able to help -- their initial reply was "are you using repo2docker? If so, then let me know the GitHub repo and I can take a quick look at your configuration." I'll follow up with them today.

@abkfenris do you have time to work with me (and maybe 2i2c) over the next week or so to try to improve this?

PS. The old PR #73 is also relevant.

(challenges with) Environment management for R image.

  • The setup we've been using in OHW is complicated, relying on conda when an R package is available on conda-forge and Docker for packages from CRAN and GitHub. For the last two, we can't specify that dependencies be installed or upgraded automatically, so it takes a lot of extra legwork. Can you recommend improvements?
  • We're using pretty old versions of RStudio and (in the conda env for R) Python and pangeo-notebook, due to previous issues. Can you provide guidance on recent versions that are known to work?
  • We ran into a problem where some packages that were already installed and used successfully in 2023 were not recognized (could not be loaded) in RStudio. We need help tracking this down.

@abkfenris
Collaborator Author

I believe we had a few different issues with repo2docker that really made it unusable for us on the Python side:

  • Very long solve times for conda environment, or failure to solve at all
  • No ability to lock environments, which limits reproducibility
  • Some environment tweaks we needed were very hard to do

If I remember right, we really hit this wall before we started adding R.

  • From there we found that if we didn't install at least the core geospatial R dependencies with conda, they would often mangle each other's versions of underlying libraries. Telling the packages that we specify directly in the Dockerfile not to install or upgrade dependencies automatically is part of keeping them from mangling each other.
  • RStudio had a licensing issue that kept it from being packaged on conda-forge for a few years. I believe that's fixed; @ocefpaf might know more. I didn't end up updating it this year, as the R folks said the environment was fine for their tutorials. Python and pangeo-notebook are probably fine to update.
  • I know just enough R to get this process to work, so you might need to find someone who knows how RStudio finds things. I know we found that it often ignored environment variables that would otherwise be used to set paths.
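
For readers wondering what "not install or upgrade dependencies automatically" looks like in practice, here is a minimal sketch of commands that a Dockerfile `RUN` step could call. The thread does not show the actual Dockerfile lines, so the helper names, the `RSCRIPT` override, and the CRAN mirror URL are assumptions for illustration. The `dependencies = FALSE` argument of `install.packages()` and the `dependencies = FALSE, upgrade = "never"` arguments of `remotes::install_github()` are the standard R switches for this.

```shell
# Sketch only: helper names and the RSCRIPT override are illustrative assumptions,
# not taken from the OHW Dockerfile.
RSCRIPT="${RSCRIPT:-Rscript}"

# Install one CRAN package without pulling in or upgrading its dependencies,
# so the conda-provided geospatial libraries are left alone.
install_cran_nodeps() {
    "$RSCRIPT" -e "install.packages('$1', repos = 'https://cloud.r-project.org', dependencies = FALSE)"
}

# Same idea for a GitHub package given as "owner/repo".
# Requires the remotes package to be installed already.
install_github_nodeps() {
    "$RSCRIPT" -e "remotes::install_github('$1', dependencies = FALSE, upgrade = 'never')"
}

# Example calls (hypothetical package names):
#   install_cran_nodeps somepkg
#   install_github_nodeps someowner/somerepo
```

The cost of this approach is the extra legwork mentioned earlier in the thread: every dependency has to be provided explicitly, either through conda or through an earlier install step.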

@ocefpaf
Member

ocefpaf commented Nov 13, 2024

  • RStudio had a licensing issue that kept it from being packaged on conda-forge for a few years. I believe that's fixed; @ocefpaf might know more. I didn't end up updating it this year, as the R folks said the environment was fine for their tutorials. Python and pangeo-notebook are probably fine to update.

I guess only the package name throws people off. It is rstudio-desktop and not rstudio. The updates there are slow though b/c it is notoriously hard to build it. Also, it is super heavy and brings tons of large dependencies.

TL;DR if you really need it, it works. If not, I don't think it is worth adding to the env.

@emiliom
Member

emiliom commented Nov 13, 2024

Thanks for the background and updates, @abkfenris and @ocefpaf! Very helpful. I haven't heard back from 2i2c yet (pinged them again today), but I've been reading up on these topics. One step I can -- and will -- take now is to make these core updates to the R image:

  • conda (environment.yml)
    • Python: 3.12
    • pangeo-notebook: I'll set it to the same one in the current pixi.toml, 2024.08.07, just for consistency
  • r/Dockerfile
    • RStudio. This is where RStudio is installed, via the rstudio-server Debian package rather than a conda package. It's currently set to rstudio-server-2022.07.1-554-amd64.deb on bionic. It looks like (cat /etc/os-release) the hub currently uses jammy (Ubuntu 22.04), and after some digging I found the latest version on jammy: jammy/amd64/rstudio-server-2024.09.1-394-amd64.deb
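
The codename lookup described in the last bullet can be scripted. This is a minimal sketch, assuming the os-release file defines `VERSION_CODENAME` (Ubuntu's does) and that the package path follows the `<codename>/amd64/rstudio-server-<version>-amd64.deb` pattern quoted above. The function names are made up for illustration, and the download host is left out because the thread does not give it.

```shell
# Sketch only: function names are illustrative assumptions.

# Print the distro codename (e.g. "jammy") from an os-release style file.
# The optional path argument defaults to /etc/os-release.
os_codename() {
    # Source the file in a subshell so its variables do not leak into the caller.
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$VERSION_CODENAME" )
}

# Print the relative path of the rstudio-server .deb for a given version,
# e.g. version "2024.09.1-394" on jammy.
rstudio_deb_path() {
    version="$1"
    codename="$(os_codename "${2:-/etc/os-release}")"
    printf '%s/amd64/rstudio-server-%s-amd64.deb\n' "$codename" "$version"
}

# Example call:
#   rstudio_deb_path 2024.09.1-394
```

Deriving the codename at build time keeps the .deb in step with the base image, which avoids the mismatch noted above, where a bionic build was pinned while the hub runs jammy.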

I'll update the R image today with these changes, cross my fingers, then test it when it's rebuilt -- assuming it does build 😅

The Dockerfile has the comment "Newer one has bug that doesn't work with jupyter-rsession-proxy" that goes way back and is still found in this Dockerfile on the main 2i2c hub image repo. But from what I've read, the incompatibility was resolved back in Dec. 2021 or so, with jupyter-rsession-proxy 2.0.

@abkfenris I also read the article about renv + Docker, https://rstudio.github.io/renv/articles/docker.html. That looks promising for improving dependency management, but I think it's out of my depth for now, at least as of this week. Among the OHW-espanol organizers and instructors we have several heavy R users; I doubt they have experience with renv in Docker, but I'll ask. We'll see also if that's something 2i2c can help us with.
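
For orientation, the workflow in that article comes down to two steps: snapshot package versions into `renv.lock` during development, then restore from that lockfile while building the image. This is a rough sketch, assuming a `renv.lock` file sits in the build's working directory. The helper names and the `RSCRIPT` override are illustrative, and nothing here reflects the current OHW Dockerfile.

```shell
# Sketch only: helper names and the RSCRIPT override are illustrative assumptions.
RSCRIPT="${RSCRIPT:-Rscript}"

# On a development machine: record the project's package versions in renv.lock.
renv_snapshot() {
    "$RSCRIPT" -e 'renv::snapshot(prompt = FALSE)'
}

# During the image build: install renv itself, then install exactly
# the package versions pinned in renv.lock.
renv_restore() {
    "$RSCRIPT" -e 'install.packages("renv", repos = "https://cloud.r-project.org")' &&
    "$RSCRIPT" -e 'renv::restore(lockfile = "renv.lock", prompt = FALSE)'
}
```

Whether this plays well with the conda-provided geospatial libraries discussed earlier is an open question that would need testing.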

@emiliom
Member

emiliom commented Nov 15, 2024

2i2c has shared this. Very relevant comparison.

I can point you in the direction of our community CryoCloud's RStudio configuration that follows our recommended repo2docker action: https://github.com/CryoInTheCloud/hub-Rstudio-image
