Skip to content

Commit

Permalink
Merge pull request #272 from jnywong/hhmi-mysql-post
Browse files Browse the repository at this point in the history
Add HHMI Spyglass MySQL post
  • Loading branch information
jnywong authored Jul 5, 2024
2 parents 57bb25b + 703a957 commit a711e5d
Show file tree
Hide file tree
Showing 2 changed files with 87 additions and 0 deletions.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
87 changes: 87 additions & 0 deletions content/blog/2024/hhmi-spyglass-mysql/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
---
title: Enabling neuroscience in the cloud with HHMI Spyglass and MySQL on JupyterHub
date: "2024-07-05"
authors: ["Yuvi Panda", "James Munroe", "Jenny Wong"]
tags: [bioscience, open source]
categories: [impact]
featured: false
draft: false
---

![HHMI Spyglass tutorial](featured.png "The [HHMI Spyglass tutorial](https://spyglass.hhmi.2i2c.cloud/)")

## Spyglass

[Spyglass](https://github.com/LorenFrankLab/spyglass) is a framework for reproducible and shareable neuroscience research produced by [Loren Frank’s lab](https://github.com/LorenFrankLab) at the University of California, San Francisco. Check out our [blog post about the release of their preprint](../hhmi-spyglass/index.md) to read more about the methods.

This post focuses on the complex data storage needed for the project, which can be difficult to set up locally or at scale in the cloud. In particular, the analysis needed a MySQL database for reproducibility. This is a fairly common task across many fields. The aim of 2i2c is to enable researchers to focus on the essential complexity of what they were doing, i.e. the science, without managing the accidental complexity of how to do it -- in this case, setting up databases.

We describe how you can do this too for your own JupyterHubs. Since 2i2c commits to running our infrastructure in line with open-source values as much as possible, you can also directly see the [configuration for the hub](https://github.com/2i2c-org/infrastructure/blob/99071c38712ef8e6bed6609117ca4b894b89ae5c/config/clusters/hhmi/spyglass.values.yaml#L76) referenced in the paper.

## What is a "sidecar container"?

The Kubernetes definition of a [sidecar container](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) is

> Sidecar containers are the secondary containers that run along with the main application container within the same Pod. These containers are used to enhance or to extend the functionality of the primary app container by providing additional services, or functionality such as logging, monitoring, security, or data synchronization, without directly altering the primary application code.
In this case, the *primary* app container is the JupyterLab instance where people are interactively running code and doing science. We want to provide a MySQL database as a sidecar so that each user server gets their own independent MySQL server instance (that is not accessible to anyone else). We can then run code such as

```
%%bash
mysql -h 127.0.0.1 -u root --password=tutorial < path-to-sql-file-with-data
```

to load data into the database. Note the IP address `127.0.0.1` - the MySQL server is listening on localhost, even though it is not running in the *same container*! Thanks to the magic of [Linux Network Namespaces](https://lwn.net/Articles/580893/), the sidecar and main app container can share `127.0.0.1`. This allows you to write code that works **in the exact same way** on a user's local computers as on the JupyterHub, making transitions and replication easier.

## Setting up sidecars in JupyterHub on Kubernetes

We're leveraging multiple tools from the open-source ecosystem - JupyterHub, Kubernetes, Linux as well as MySQL itself.

Since this is a *Kubernetes* feature, we can pass through config to it. There are
two layers here, which are

1. [singleuser.extraContainers](https://z2jh.jupyter.org/en/latest/resources/reference.html#singleuser-extracontainers) in [z2jh](https://z2jh.jupyter.org/en/stable/) configuration
2. [KubeSpawner.extra_containers](https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html#kubespawner.KubeSpawner.extra_containers) in [KubeSpawner](https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html) configuration

The hub configuration looks like

```yaml
singleuser:
extraContainers:
- name: mysql
image: datajoint/mysql:8.0 # following the spyglass tutorial at https://lorenfranklab.github.io/spyglass/latest/notebooks/00_Setup/#existing-database
ports:
- name: mysql
containerPort: 3306
resources:
limits:
# Best effort only. No more than 1 CPU, and if mysql uses more than 4G, restart it
memory: 4Gi
cpu: 1.0
requests:
# If we don't set requests, k8s sets requests == limits!
# So we set something tiny
memory: 64Mi
cpu: 0.01
env:
# Configured using the env vars documented in https://lorenfranklab.github.io/spyglass/latest/notebooks/00_Setup/#existing-database
- name: MYSQL_ROOT_PASSWORD
value: "tutorial"
```
By setting this up, we allow users to insert the code snippet above
```
%%bash
mysql -h 127.0.0.1 -u root --password=tutorial < path-to-sql-file-with-data
```

into their [Jupyter Notebooks](https://github.com/LorenFrankLab/spyglass-demo/blob/main/notebooks/00_HubQuickStart.ipynb), which gives access to their MySQL database in the hub!

However, this configuration does not include permanently store the database itself between hub server sessions. Thanks to a pilot in a prior collaboration with University of Texas, Austin, we do have [some documentation](https://github.com/2i2c-org/infrastructure/blob/main/docs/howto/features/per-user-db.md) on how you can enable that as well!

## Acknowledgements

- National Institute of Mental Health (NIMH), grant number RF1MH130623
- [kubespawner](https://github.com/jupyterhub/kubespawner)
- [zero-to-jupyterhub-k8s](https://github.com/jupyterhub/zero-to-jupyterhub-k8s/)

0 comments on commit a711e5d

Please sign in to comment.