Propagate environment variables into RStudio #145

Open
cboettig opened this issue Feb 16, 2024 · 22 comments

@cboettig

Bug description

Annoyingly, RStudio (though not R itself) ignores global system environment variables and only recognizes environment variables declared in an Renviron file (i.e. either $R_HOME/etc/Renviron.site, for all users, or a .Renviron in the user's home directory). For instance, the client ID required by the awesome gh-scoped-creds Python module would typically be passed in as an environment variable.

In the rocker project, we propagate most environment variables into $R_HOME/etc/Renviron.site before bringing up rserver, using a script run by the s6 init system: https://github.com/rocker-org/rocker-versioned2/blob/master/scripts/init_set_env.sh . That script obviously isn't used in a jupyterhub + jupyter-rsession-proxy setup. Would it be possible to have jupyter-rsession-proxy do something similar?

@cboettig cboettig added the bug label Feb 16, 2024

welcome bot commented Feb 16, 2024

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@ryanlovett
Collaborator

It makes sense to try to get RStudio to be aware of those environment variables. It is also logical for the rocker images to modify one of the Renviron files, because the docker images are in full control of the environment. However, I worry about this extension making changes to those files because it could be run outside of docker, for example on a shared HPC node. If we change a file, it'd need to be per-user and we'd have to unset the changes somehow afterwards. Using environment variables would normally be the right choice to influence app behavior, and being forced to use config files is, yes, annoying.

I'll look into how RStudio prepares the environment a bit more. In this case, it might be best to alter the files for the gh-scoped-creds env vars in the Docker images. (which I know you need in https://github.com/berkeley-dsep-infra/datahub)

@cboettig
Author

Thanks, makes sense, I think I followed most of this.

> If we change a file, it'd need to be per-user and we'd have to unset the changes somehow afterwards.

I was wondering if it would be possible for jupyter-rsession-proxy to echo-append the env vars into ~/.Renviron in the user's home dir? This wouldn't conflict with the Renviron.site coming from the docker image, and would automatically be applied per-user for RStudio. Not sure about the unsetting part.

and yeah, it's quite annoying RStudio makes us do this.
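
The echo-append idea, together with the open question about unsetting, could be sketched as a managed block inside the per-user file. This is only a sketch and not something jupyter-rsession-proxy does; the function name `write_renviron_block` and the marker strings are hypothetical. Each run removes the block left by the previous run before appending a fresh one, so repeated launches don't pile up duplicates, and calling it with no variable names clears the block again.

```shell
#!/bin/sh
# Hypothetical helper (not part of jupyter-rsession-proxy): keep a managed
# block of NAME=value lines in a per-user Renviron file.
BEGIN_MARK='# >>> rsession-proxy env >>>'
END_MARK='# <<< rsession-proxy env <<<'

write_renviron_block() {
  # usage: write_renviron_block FILE [VAR_NAME...]
  file=$1
  shift
  tmp=$(mktemp) || return 1

  # Copy everything except a block left behind by a previous run.
  if [ -f "$file" ]; then
    awk -v b="$BEGIN_MARK" -v e="$END_MARK" '
      $0 == b { skip = 1; next }
      $0 == e { skip = 0; next }
      !skip   { print }
    ' "$file" > "$tmp"
  fi

  # Append a fresh block holding the variables that are currently set.
  if [ "$#" -gt 0 ]; then
    {
      printf '%s\n' "$BEGIN_MARK"
      for name in "$@"; do
        if value=$(printenv "$name"); then
          printf '%s=%s\n' "$name" "$value"
        fi
      done
      printf '%s\n' "$END_MARK"
    } >> "$tmp"
  fi

  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

A call such as `write_renviron_block "$HOME/.Renviron" SOME_VAR OTHER_VAR` would be run before the session starts. Values containing newlines or characters that Renviron parsing treats specially are not handled here.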

@benz0li

benz0li commented Mar 17, 2024

With an image based on the Jupyter Docker Stacks, this could be done with a Startup Hook.

E.g. a rstudio.sh script containing

echo "LANG=$LANG" >>"$(R RHOME)/etc/Renviron.site"
echo "TZ=$TZ" >>"$(R RHOME)/etc/Renviron.site"
...

This would also require a change of ownership and permission of $(R RHOME)/etc/*.site when building the image:

chown :"$NB_GID" "$(R RHOME)/etc" "$(R RHOME)/etc/"*.site
chmod g+w "$(R RHOME)/etc" "$(R RHOME)/etc/"*.site

@benz0li

benz0li commented Mar 17, 2024

@ryanlovett Is there something like Startup Hooks for binder, too?

@manics
Member

manics commented Mar 17, 2024

There are postBuild and start files.

@yuvipanda
Contributor

I think RStudio does this primarily to protect itself from weird env vars in people's desktops, and that causes issues when running serverside.

jupyter-rsession-proxy is a good place to do this!

I think the primary thing to be determined is which file to modify. Ideally, this should be:

  1. Per-user, and not systemwide. This accounts for both permission issues, as well as the issues of running outside containerized environments.
  2. At least inside containerized environments, does not usually persist past restarts. This avoids possible issues with staleness.

@cboettig putting anything under $HOME helps with (1) but not (2). Does Renviron allow us to include other files? That way, another solution would be to add a line under $HOME that includes a file from somewhere else (like /tmp or wherever).

@yuvipanda yuvipanda changed the title from "Propagate env vars into Renviron.site?" to "Propagate environment variables into RStudio" on Apr 26, 2024
@yuvipanda
Contributor

Another important use case for this is with AWS credentials (or GCP credentials). These are dynamically set at runtime on the pod, and need to be propagated for automatic detection of credentials when accessing APIs to work. So while we can set gh-scoped-creds at image build time, we can't do that for these.
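
For runtime credentials like these, whatever ends up writing the file has to pick the variables when the session starts rather than at image build time. A minimal sketch of that selection step, assuming the credentials share known name prefixes; the function name `env_lines_with_prefix` and the choice of prefixes are illustrative and not taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print NAME=value lines for every environment
# variable whose name starts with one of the given prefixes.
env_lines_with_prefix() {
  # usage: env_lines_with_prefix PREFIX [PREFIX...]
  for prefix in "$@"; do
    # awk exposes the process environment as the ENVIRON array, which
    # avoids parsing the output of `env`.
    awk -v p="$prefix" 'BEGIN {
      for (name in ENVIRON)
        if (index(name, p) == 1)
          print name "=" ENVIRON[name]
    }'
  done | sort -u
}
```

Something like `env_lines_with_prefix AWS_ GOOGLE_` would then produce lines in the `NAME=value` form that Renviron files use. Values that contain newlines would be split across lines, so they would need extra handling.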

@cmd-ntrf

Side note: the Compute Canada / Digital Research Alliance of Canada solution to this problem was to patch RStudio.

Patch is available here: https://github.com/ComputeCanada/easybuild-easyconfigs-installed-avx2/blob/main/2023/RStudio-Server/rstudio-1.2.1335.patch. It is quite simple, but I have never tried to have it merged upstream.

@ryanlovett
Collaborator

@yuvipanda What we ended up doing was modifying Rprofile to support sourcing files in an Rprofile.d directory, and then using extraFiles to set the env vars there. I imagine one could add a script in Rprofile.d to source something from HOME. This is all in the user environment and outside of jupyter-rsession-proxy however.

@yuvipanda
Contributor

@ryanlovett oooh, that's interesting. Is there a per-user Rprofile? If there is, perhaps we can dynamically modify that in rsession-proxy?

  1. Modify Rprofile to load env vars from a specific location (if it exists)
  2. have rsession-proxy dump out env vars in this specific location

Alternatively, we could start with putting this kinda code in Rprofile for just the rocker/binder image, where it can simply read from /proc/1/environ. Thoughts on that, @cboettig?
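
For reference, /proc/1/environ on Linux stores the environment of PID 1 as NUL-separated NAME=value entries, so it needs converting before anything line-oriented can read it. A minimal sketch in shell, assuming the file is readable by the current user (typically only the case when PID 1 runs as that user); the function name `environ_to_lines` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn a NUL-separated environ file into one
# NAME=value entry per line.
environ_to_lines() {
  # usage: environ_to_lines [FILE]   (defaults to /proc/1/environ)
  # Entries whose values contain newlines will be split across lines.
  tr '\0' '\n' < "${1:-/proc/1/environ}"
}
```

The output could be redirected to a temporary file and loaded from R with readRenviron().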

@ryanlovett
Collaborator

> Is there a per-user Rprofile? If there is, perhaps we can dynamically modify that in rsession-proxy?

Yes, ~/.Rprofile (and also ~/.Renviron). The modification process would have to be idempotent. Would you also want to leave the vars behind? Or assume whatever file is included gets overwritten each session?

@yuvipanda
Contributor

@ryanlovett yes, we'd need the code to be idempotent, which is reasonably doable I'd think (conda init does something like that, for example).

The vars should be put on something like /tmp, so that gets cleaned up as appropriate. So we assume it gets overwritten each session.
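
The two steps discussed above (write the variables somewhere temporary, and have a per-user Rprofile load them if present) might look roughly like this. It is only a sketch under assumptions: the function names, the marker comment, and passing the env file path explicitly are all hypothetical, and nothing here is existing jupyter-rsession-proxy behavior. The hook line is added only when it isn't already present, which keeps the edit idempotent.

```shell
#!/bin/sh
# Hypothetical sketch of the two steps; neither function is part of
# jupyter-rsession-proxy.

# Overwrite ENV_FILE with the current values of the named variables.
dump_env_file() {
  # usage: dump_env_file ENV_FILE VAR_NAME...
  env_file=$1
  shift
  : > "$env_file"          # truncate so stale values never survive
  chmod 600 "$env_file"    # the file may hold credentials
  for name in "$@"; do
    if value=$(printenv "$name"); then
      printf '%s=%s\n' "$name" "$value" >> "$env_file"
    fi
  done
}

# Make sure RPROFILE contains one line of R that loads ENV_FILE when it
# exists. Re-running with the same arguments does not add a second copy.
ensure_rprofile_hook() {
  # usage: ensure_rprofile_hook RPROFILE ENV_FILE
  rprofile=$1
  hook="if (file.exists(\"$2\")) readRenviron(\"$2\")  # rsession-proxy env hook"
  if grep -qxF -- "$hook" "$rprofile" 2>/dev/null; then
    return 0
  fi
  # Start on a fresh line if the file does not end with a newline.
  if [ -s "$rprofile" ] && [ -n "$(tail -c 1 "$rprofile")" ]; then
    printf '\n' >> "$rprofile"
  fi
  printf '%s\n' "$hook" >> "$rprofile"
}
```

The hook only matches when the env file path is the same on every run, so the path would need to be stable (for example derived from the user ID) for the idempotence to hold. A predictable path under a shared /tmp also needs care on multi-user machines.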

@cboettig
Author

cboettig commented Apr 26, 2024

I think having something like:

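# /proc/1/environ holds NUL-separated NAME=value entries; readBin() returns one string per entry
env <- readBin("/proc/1/environ", "character", n = 500)
tmp <- tempfile()
writeLines(env, tmp)
readRenviron(tmp)  # load the entries into this R session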

in ${R_HOME}/etc/Rprofile.site would populate the env var list when R session starts up for any user? Or if this is handled by jupyter-rsession-proxy, presumably it would append this to user-specific $HOME/.Rprofile ?

@benz0li

benz0li commented Apr 26, 2024

Why not use a Startup Hook? e.g.

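# "_" is listed explicitly; the earlier regex test only skipped it by accident
exclude_vars="HOME LD_LIBRARY_PATH OLDPWD PATH PWD RSTUDIO_VERSION SHLVL _"
for var in $(compgen -e); do
  # Compare whole names, so a variable is not skipped just for being a substring of e.g. PATH
  [[ " $exclude_vars " != *" $var "* ]] && echo "$var=${!var}" \
    >> "$(R RHOME)/etc/Renviron.site"
done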

@yuvipanda
Contributor

@benz0li that is specific to jupyter/docker-stacks. I'd like a more general solution here.

my ideal solution is to get #145 (comment) upstreamed so rstudio will also act like most other applications :D But I don't think that's going to happen.

@eeholmes

eeholmes commented Jun 3, 2024

@cboettig I think this Codespace devcontainer PWD behavior is related to this.

The following devcontainer.json spins up a devcontainer with JupyterLab and RStudio.

{
  "name": "test",
  "workspaceFolder": "/home/jovyan",
  "image": "ghcr.io/nmfs-opensci/container-images/py-rocket-base:latest",
  "forwardPorts": [ 8889 ],
  "portsAttributes": { "8889": { "label": "Jupyter Lab" } },
  "postCreateCommand": "jupyter lab --ip=0.0.0.0 --port=8889 --allow-root --no-browser --NotebookApp.token='' --NotebookApp.password=''"
}

In the codespace from a terminal:
${PWD} is /home/jovyan

In JupyterLab from a terminal:
${PWD} is /home/jovyan

In RStudio from the terminal tab:
${PWD} is /workspaces/<reponame>

In RStudio in the file panel:
what is shown is /home/jovyan, which is ${HOME}

If we change the workspaceFolder to something different than HOME

  "workspaceFolder": "/home/jovyan/codespace",

In the codespace from a terminal:
${PWD} is /home/jovyan/codespace

In JupyterLab from a terminal:
${PWD} is /home/jovyan/codespace

In RStudio from the terminal tab:
${PWD} is /workspaces/<reponame>

In RStudio in the file panel:
what is shown is /home/jovyan, which is ${HOME}

@eeholmes

eeholmes commented Jun 3, 2024

Sadly,

echo "PWD=/home/jovyan" >> ~/.Renviron

seems to be ignored. Setting other env vars works fine, but PWD is ignored. I restarted R and restarted the terminal tab. Even setting PWD=~ only works for that terminal. As soon as I start a new one, it goes back to the /workspaces/<reponame> PWD.

Also

echo PWD=/home/jovyan >> ~/.bashrc

didn't help (unless I typed bash).

@benz0li

benz0li commented Jun 3, 2024

@eeholmes Set --notebook-dir=/home/jovyan in the postCreateCommand.

Cross reference: https://github.com/b-data/data-science-devcontainers#usage

@benz0li

benz0li commented Jun 3, 2024

ℹ️ Opening your codespace in JupyterLab according to the GitHub Docs sets the default path to /workspaces/<repository-name>, which you cannot escape.

@eeholmes

eeholmes commented Jun 3, 2024

@benz0li Sadly, that has no effect on the terminal that RStudio opens with /usr/bin/env bash -l. The PWD is fine in JupyterLab. The issue is in the terminal opened by RStudio, which is itself opened via the launcher.

The devcontainer.json file is working close to what I want now.
https://github.com/nmfs-opensci/container-images/blob/main/.devcontainer/test/devcontainer.json
I think I can fix the PWD issue with

echo -e 'PWD=/home/jovyan\ncd $PWD' >> ~/.bash_login

in the postCreate command.

7 participants