diff --git a/content/blog/2024/gesis-binderhub-profiles/index.md b/content/blog/2024/gesis-binderhub-profiles/index.md index 07e9e5d17..bcd9184b8 100644 --- a/content/blog/2024/gesis-binderhub-profiles/index.md +++ b/content/blog/2024/gesis-binderhub-profiles/index.md @@ -17,7 +17,7 @@ draft: true [mybinder.org](https://mybinder.org) is a very popular service that allows end users to build the environment (languages, packages, etc) needed for their notebooks to run correctly, and share that with others with just a simple link. While not without its own set of challenges, this is extremely powerful because it puts control of the *environment* in the hands of the people who write the code that has to run in the environment. They can customize the environment to fit the needs of their code, instead of having to fit their code into the environment that admins have made available. -But, mybinder.org (and the [binderhub](https://github.com/jupyterhub/binderhub/) software that powers it) is built for *sharing* your work after you are done with it, *not* for actively doing work. [JupyterHub](https://jupyter.org/hub) is more commonly used for this, but doesn't currently posess the ability for users to easily build their own environments. Admins who are *running* the JupyterHub can make [multiple environments](https://z2jh.jupyter.org/en/stable/jupyterhub/customizing/user-environment.html#using-multiple-profiles-to-let-users-select-their-environment) available for users to choose from, but this still puts admins in the critical path for environment customization. +But, mybinder.org (and the [BinderHub](https://github.com/jupyterhub/binderhub/) software that powers it) is built for *sharing* your work after you are done with it, *not* for actively doing work. [JupyterHub](https://jupyter.org/hub) is more commonly used for this, but doesn't currently posess the ability for users to easily build their own environments. Admins who are *running* the JupyterHub can make [multiple environments](https://z2jh.jupyter.org/en/stable/jupyterhub/customizing/user-environment.html#using-multiple-profiles-to-let-users-select-their-environment) available for users to choose from, but this still puts admins in the critical path for environment customization. Our [collaboration](https://2i2c.org/blog/2022/gesis-2i2c-collaboration-update/) with [GESIS](http://gesis.org), [NFDI4DS](https://www.nfdi4datascience.de), and [CESSDA](https://www.cessda.eu), aims to bring this flexibility to JupyterHub directly. We aim to empower users to decide for themselves which applications and dependencies are installed on a per-project basis. Our work enables communities with heterogeneous requirements to share a single Hub. Our approach work frees administrators from being overwhelmed by installation requests and transforms the JupyterHub platform into a platform for computational reproducibility, a role previously reserved for BinderHub. In this update, we report on our progress and upcoming steps in this project. @@ -51,11 +51,11 @@ So what all did we need to do to accomplish this, in a way that's very upstream ## Standalone `binderhub-service` helm chart -The default upstream [binderhub helm-chart](https://github.com/jupyterhub/binderhub/tree/main/helm-chart) *includes* a JupyterHub as a dependency, and configures itself to be used primarily in a manner similar to [mybinder.org](https://mybinder.org). As the person who helped make that choice early on, I can tell you why it was made - for convenience! And it *was* very convenient, as it allowed us to get mybinder.org going fast. However, it makes it difficult to install a binderhub service *alongside* an existing JupyterHub, which we would need for our dynamic image building integration. To this end, we have created a standalone [binderhub helm chart](https://github.com/2i2c-org/binderhub-service/), designed to be installed *alongside* an existing JupyterHub! This allows the BinderHub instance to be used as a [JupyterHub Service](https://jupyterhub.readthedocs.io/en/stable/reference/services.html), which is what we want. +The default upstream [BinderHub helm-chart](https://github.com/jupyterhub/binderhub/tree/main/helm-chart) *includes* a JupyterHub as a dependency, and configures itself to be used primarily in a manner similar to [mybinder.org](https://mybinder.org). As the person who helped make that choice early on, I can tell you why it was made - for convenience! And it *was* very convenient, as it allowed us to get mybinder.org going fast. However, it makes it difficult to install a BinderHub service *alongside* an existing JupyterHub, which we would need for our dynamic image building integration. To this end, we have created a standalone [BinderHub helm chart](https://github.com/2i2c-org/binderhub-service/), designed to be installed *alongside* an existing JupyterHub! This allows the BinderHub instance to be used as a [JupyterHub Service](https://jupyterhub.readthedocs.io/en/stable/reference/services.html), which is what we want. -As part of this, we also added a way for BinderHub to run in [API only mode](https://github.com/jupyterhub/binderhub/pull/1647), so we can fully turn off the UI *and* launching ability of binderhub, and use it purely for its building API. +As part of this, we also added a way for BinderHub to run in [API only mode](https://github.com/jupyterhub/binderhub/pull/1647), so we can fully turn off the UI *and* launching ability of BinderHub, and use it purely for its building API. -While this helm chart is currently under the 2i2c GitHub org, the hope is that it can eventually migrate to a [jupyterhub-contrib](https://github.com/jupyterhub/team-compass/issues/519) organization (once it is created), or it can become the upstream helm chart for binderhub if enough work can be done in binderhub to allow it to serve use cases like mybinder.org. +While this helm chart is currently under the 2i2c GitHub org, the hope is that it can eventually migrate to a [jupyterhub-contrib](https://github.com/jupyterhub/team-compass/issues/519) organization (once it is created), or it can become the upstream helm chart for BinderHub if enough work can be done in BinderHub to allow it to serve use cases like mybinder.org. ## Sustainably extending KubeSpawner's `profileList` @@ -81,13 +81,13 @@ This is ongoing now at the [jupyterhub-fancy-profiles](https://github.com/yuvipa ## [`jupyterhub/@binderhub-client`](https://www.npmjs.com/package/@jupyterhub/binderhub-client) npm package -While building `jupyterhub-fancy-profiles`, we want to use the *same* javascript code used by BinderHub frontend to interact with the BinderHub API, instead of re-implementing it. However, the existing binderhub javascript code was not easily consumable by external projects. We refactored this, adding tests, migrating to use modern JS practices and published the [`jupyterhub/@binderhub-client` NPM package](https://www.npmjs.com/package/@jupyterhub/binderhub-client) that can be used not just by `jupyerhub-fancy-profiles` but any external project for talking to the binderhub API. +While building `jupyterhub-fancy-profiles`, we want to use the *same* javascript code used by BinderHub frontend to interact with the BinderHub API, instead of re-implementing it. However, the existing BinderHub javascript code was not easily consumable by external projects. We refactored this, adding tests, migrating to use modern JS practices and published the [`jupyterhub/@binderhub-client` NPM package](https://www.npmjs.com/package/@jupyterhub/binderhub-client) that can be used not just by `jupyerhub-fancy-profiles` but any external project for talking to the BinderHub API. -This had to be done in such a way that current binderhub installations (such as mybinder.org) do not break. That took quite a few pull requests: [1](https://github.com/jupyterhub/binderhub/pull/1689), [2](https://github.com/jupyterhub/binderhub/pull/1693), [3](https://github.com/jupyterhub/binderhub/pull/1694), [4](https://github.com/jupyterhub/binderhub/pull/1741), [5](https://github.com/jupyterhub/binderhub/pull/1742), [6](https://github.com/jupyterhub/binderhub/pull/1758), [7](https://github.com/jupyterhub/binderhub/pull/1761), [8](https://github.com/jupyterhub/binderhub/pull/1771), [9](https://github.com/jupyterhub/binderhub/pull/1773), [10](https://github.com/jupyterhub/binderhub/pull/1775), [11](https://github.com/jupyterhub/binderhub/pull/1778), [12](https://github.com/jupyterhub/binderhub/pull/1779), [13](https://github.com/jupyterhub/binderhub/pull/1781), [14](https://github.com/jupyterhub/binderhub/pull/1782), [15](https://github.com/jupyterhub/binderhub/pull/1783) +This had to be done in such a way that current BinderHub installations (such as mybinder.org) do not break. That took quite a few pull requests: [1](https://github.com/jupyterhub/binderhub/pull/1689), [2](https://github.com/jupyterhub/binderhub/pull/1693), [3](https://github.com/jupyterhub/binderhub/pull/1694), [4](https://github.com/jupyterhub/binderhub/pull/1741), [5](https://github.com/jupyterhub/binderhub/pull/1742), [6](https://github.com/jupyterhub/binderhub/pull/1758), [7](https://github.com/jupyterhub/binderhub/pull/1761), [8](https://github.com/jupyterhub/binderhub/pull/1771), [9](https://github.com/jupyterhub/binderhub/pull/1773), [10](https://github.com/jupyterhub/binderhub/pull/1775), [11](https://github.com/jupyterhub/binderhub/pull/1778), [12](https://github.com/jupyterhub/binderhub/pull/1779), [13](https://github.com/jupyterhub/binderhub/pull/1781), [14](https://github.com/jupyterhub/binderhub/pull/1782), [15](https://github.com/jupyterhub/binderhub/pull/1783) ## `cryptnono` anti-abuse features -For Open Science to flourish, we need to allow access without login / paywalls wherever possible. A new menace against this has been [cryptojacking](https://www.interpol.int/en/Crimes/Cybercrime/Cryptojacking) - where attackers use up any and all available free compute to mine cryptocurrencies. This has affected *many* folks on the internet, including [GitHub Actions](https://www.bleepingcomputer.com/news/security/github-actions-being-actively-abused-to-mine-cryptocurrency-on-github-servers/) and mybinder.org, the primary public binderhub installation. mybinder.org has some extra protections against cryptojacking that aren't easily usable elsewhere, and this has unfortunately meant that the imagebuilding demos have been behind a login wall. I personally believe login walls are long term antithetical to open science, and so this was an important problem to solve. +For Open Science to flourish, we need to allow access without login / paywalls wherever possible. A new menace against this has been [cryptojacking](https://www.interpol.int/en/Crimes/Cybercrime/Cryptojacking) - where attackers use up any and all available free compute to mine cryptocurrencies. This has affected *many* folks on the internet, including [GitHub Actions](https://www.bleepingcomputer.com/news/security/github-actions-being-actively-abused-to-mine-cryptocurrency-on-github-servers/) and mybinder.org, the primary public BinderHub installation. mybinder.org has some extra protections against cryptojacking that aren't easily usable elsewhere, and this has unfortunately meant that the imagebuilding demos have been behind a login wall. I personally believe login walls are long term antithetical to open science, and so this was an important problem to solve. [cryptnono](https://github.com/yuvipanda/cryptnono) is an open source project designed to help fight cryptojacking, and as part of this grant we ported some of this functionality out of mybinder.org specific code into cryptnono, so other deployments may also benefit from it! As part of the port, we also migrated to using the super efficient [ebpf](https://ebpf.io/) Linux Kernel feature, allowing for more complex heuristics to catch a much broader range of cryptomining activity. We have been slowly tweaking the config here on mybinder.org, and it has proven to be very effective! This will be very helpful for *anyone* who wants to provide a JupyterHub (or any other computational service) without a login wall. If you are interested in using cryptnono in this fashion, please reach out to us so we can work together! @@ -95,7 +95,7 @@ For Open Science to flourish, we need to allow access without login / paywalls w List of things that were tried and then decided as not good pathways: -- [repo2docker-service](https://github.com/consideRatio/repo2docker-service), a separate JupyterHub service that could *only* build images. As we worked on it, we realized that it was replicating a lot of features that binderhub already has, so we pivoted to working on binderhub directly instead. +- [repo2docker-service](https://github.com/consideRatio/repo2docker-service), a separate JupyterHub service that could *only* build images. As we worked on it, we realized that it was replicating a lot of features that BinderHub already has, so we pivoted to working on BinderHub directly instead. - Building off of [tljh-repo2docker](https://github.com/plasmabio/tljh-repo2docker). While this already had a nice UI, it would be hard to port it to run on a distributed kubernetes environment without it becoming a 'hard fork'. While these did slow down the implementation of the project, it has allowed us to be very confident that the methods we have chosen are long term sustainable.