Contrib notebook refresh (kubeflow#3423)
Pete MacKinnon authored and k8s-ci-robot committed Jun 7, 2019
1 parent 32e4190 commit 69fb4ec
Showing 4 changed files with 8 additions and 72 deletions.
24 changes: 2 additions & 22 deletions components/contrib/kaggle-notebook-image/Dockerfile
@@ -1,6 +1,3 @@
# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.

# use basic syntax for now
FROM gcr.io/kaggle-images/python:latest

@@ -56,25 +53,8 @@ RUN cd /tmp && \

RUN chown -R ${NB_USER}:users $HOME

ENV GITHUB_REF https://raw.githubusercontent.com/kubeflow/kubeflow/master/components/tensorflow-notebook-image

ADD --chown=jovyan:users $GITHUB_REF/jupyter_notebook_config.py /tmp

# Wipe $HOME for PVC detection later
WORKDIR $HOME
RUN rm -fr $(ls -A $HOME)

# Get init scripts from kubeflow
ADD --chown=jovyan:users \
$GITHUB_REF/start-singleuser.sh \
$GITHUB_REF/start-notebook.sh \
$GITHUB_REF/start.sh \
$GITHUB_REF/pvc-check.sh \
/usr/local/bin/

RUN chmod a+rx /usr/local/bin/*

# Configure container startup
EXPOSE 8888
USER jovyan
ENTRYPOINT ["tini", "--"]
CMD ["start-notebook.sh"]
CMD ["sh","-c", "jupyter notebook --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]
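The new CMD above launches Jupyter directly instead of relying on the JupyterHub start scripts, with `NB_PREFIX` supplying the base URL. A minimal local smoke test might look like the following (the image tag is a placeholder, and `NB_PREFIX` is normally injected by the notebook controller):

```shell
# Run the image standalone, setting NB_PREFIX by hand to mimic the
# notebook controller. The tag kaggle-notebook:latest is illustrative.
docker run --rm -p 8888:8888 \
  -e NB_PREFIX=/notebook/test/kaggle \
  kaggle-notebook:latest
# Jupyter should then answer at http://localhost:8888/notebook/test/kaggle
```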
6 changes: 3 additions & 3 deletions components/contrib/kaggle-notebook-image/README.md
@@ -1,13 +1,13 @@
# Kaggle notebook

This Dockerfile builds an image derived from the latest [Kaggle python image](https://github.com/Kaggle/docker-python) that can be launched from the Kubeflow JupyterHub. [Kaggle](https://www.kaggle.com/) is the home of data science collaboration and competition.
This Dockerfile builds an image derived from the latest [Kaggle python image](https://github.com/Kaggle/docker-python) that can be launched from the Kubeflow notebook controller. [Kaggle](https://www.kaggle.com/) is the home of data science collaboration and competition.

Important notes:
* this notebook is not curated by the Kubeflow project and is not regularly tested
* the versions of TensorFlow, PyTorch, and the other libraries included may change at any time
* this is a very large notebook, over 21 GB in size. Since our notebook uses the latest Kaggle image, docker pulls (and notebook launches) can take a long time.
* this is a large notebook image, over 15 GB in size. Since our notebook uses the latest Kaggle image, docker pulls (and notebook launches) can take a long time.
* the base image size for docker devicemapper is 10 GB, which is not large enough to run this image. Your docker daemon must be configured for at least 30 GB (`--storage-opt dm.basesize=30G`) or use a storage driver like overlay2.
* the Kaggle image includes TensorFlow 1.9 or greater built with AVX2 support, so the image may not run on older CPUs
* the Kaggle image includes TensorFlow 1.13.1 or greater built with AVX2 support, so the image may not run on older CPUs
* the other Kubeflow curated notebooks allow the jovyan user to install new packages from pip or conda. Unfortunately this is not possible with the Kaggle image, because the additional ownership layer would significantly increase the image size.
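The devicemapper sizing note above can be applied either with the daemon flag or via `daemon.json`. A sketch, staged in `/tmp` here for illustration (the real file lives at `/etc/docker/daemon.json` and requires a daemon restart to take effect):

```shell
# Stage a daemon.json that switches devicemapper to a 30 GB base image size.
cat > /tmp/daemon.json <<'EOF'
{
  "storage-driver": "devicemapper",
  "storage-opts": ["dm.basesize=30G"]
}
EOF
# Confirm the option landed in the staged config.
grep 'dm.basesize' /tmp/daemon.json
```

On most distributions the daemon is then restarted with `systemctl restart docker`; alternatively, switching to overlay2 avoids the base-size limit entirely.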

To build the image run:
29 changes: 2 additions & 27 deletions components/contrib/rapidsai-notebook-image/Dockerfile
@@ -12,8 +12,6 @@ ENV HOME /home/$NB_USER
ENV CONDA_DIR=/conda
ENV PATH $CONDA_DIR/bin:$PATH
ENV PATH $CONDA_DIR/envs/rapids/bin:$PATH
# anticipate a default GKE nvidia mount
ENV PATH /usr/local/nvidia/bin:$PATH

# Use bash instead of sh
SHELL ["/bin/bash", "-c"]
@@ -57,31 +55,8 @@ RUN cd /tmp && \

RUN chown -R ${NB_USER}:users $HOME

ENV GITHUB_REF https://raw.githubusercontent.com/kubeflow/kubeflow/master/components/tensorflow-notebook-image

ADD --chown=jovyan:users $GITHUB_REF/jupyter_notebook_config.py /tmp

# Wipe $HOME for PVC detection later
WORKDIR $HOME
RUN rm -fr $(ls -A $HOME)

# Get init scripts from kubeflow
ADD --chown=jovyan:users \
$GITHUB_REF/start-notebook.sh \
$GITHUB_REF/start-singleuser.sh \
$GITHUB_REF/start.sh \
$GITHUB_REF/pvc-check.sh \
/usr/local/bin/

RUN chmod a+rx /usr/local/bin/*

# HACK: GKE late-binding of NVIDIA driver mount
# seems to leave us with a stale cache for the CUDA libs;
# cudf librmm.so can't find libcuda* from LD_LIBRARY_PATH
RUN chown -R ${NB_USER}:users /etc
RUN sed -i '/JUPYTERHUB_API_TOKEN/i\ldconfig' /usr/local/bin/start-notebook.sh

# Configure container startup
EXPOSE 8888
USER jovyan
ENTRYPOINT ["tini", "--"]
CMD ["start-notebook.sh"]
CMD ["sh","-c", "jupyter notebook --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]
21 changes: 1 addition & 20 deletions components/contrib/rapidsai-notebook-image/README.md
@@ -1,6 +1,6 @@
# Rapids AI notebook

This Dockerfile builds an image derived from the current [RAPIDS image](https://ngc.nvidia.com/catalog/containers) that can be launched from the Kubeflow JupyterHub. [RAPIDS](https://devblogs.nvidia.com/gpu-accelerated-analytics-rapids/) uses NVIDIA CUDA for high-performance GPU execution, exposing that GPU parallelism and high memory bandwidth through user-friendly Python interfaces. RAPIDS provides several Python APIs, including cuDF, a GPU DataFrame library with a pandas-like API, and cuML, a GPU-accelerated library of machine learning algorithms.
This Dockerfile builds an image derived from the current [RAPIDS image](https://ngc.nvidia.com/catalog/containers) that can be launched from the Kubeflow notebook controller. [RAPIDS](https://devblogs.nvidia.com/gpu-accelerated-analytics-rapids/) uses NVIDIA CUDA for high-performance GPU execution, exposing that GPU parallelism and high memory bandwidth through user-friendly Python interfaces. RAPIDS provides several Python APIs, including cuDF, a GPU DataFrame library with a pandas-like API, and cuML, a GPU-accelerated library of machine learning algorithms.

Requirements:

@@ -21,22 +21,3 @@ Specify whatever repo and image tag you need for your purposes.
## cuDF E2E notebook

The demonstration of the cuDF API in `E2E.ipynb` performs intensive ETL of Fannie Mae mortgage data that can be downloaded into your notebook. The defaults in the notebook use 8 dask workers (for 8 GPUs) and assume mortgage data for 16 years partitioned across 16 files. This configuration may exceed the GPU count and GPU RAM you actually have available. Simply change the defaults in notebook cells 2 and 4 to match the worker count to your available GPUs and to use a reduced span of years and partitions from the mortgage dataset.

## GKE deployment notes

### Driver

It is possible to run this image with the latest NVIDIA drivers, even if they are not yet installed in GKE. Log in to your Kubeflow GKE cluster and apply this [daemonset](https://github.com/GoogleCloudPlatform/container-engine-accelerators/blob/master/daemonset.yaml) to install the desired NVIDIA drivers. You must add the following environment variables to the init container in the daemonset YAML:

```yaml
- name: NVIDIA_DRIVER_VERSION
value: "410.79"
- name: IGNORE_MISSING_MODULE_SYMVERS
value: "1"
```
Once you have applied the new daemonset, you must recycle the GPU nodes in your cluster. Note this is a temporary workaround until newer NVIDIA drivers are added to GKE, and it may no longer be applicable at some point in the future.
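
The apply-and-recycle flow described above might be sketched as follows (the node label selector and drain flags are assumptions about a typical GKE GPU node pool, not taken from this repo):

```shell
# Apply the patched driver-installer daemonset.
kubectl apply -f daemonset.yaml

# "Recycle" each GPU node: drain it, then delete it so the managed
# instance group replaces it with a fresh VM running the new daemonset.
for node in $(kubectl get nodes -l cloud.google.com/gke-accelerator -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-local-data
  kubectl delete "$node"
done
```
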
### Linker
When deploying this Kubeflow image to GKE, you will need to ensure that `/etc/ld.so.cache` is updated after launch in order to use the cuDF Python API. Simply open a terminal in JupyterHub and run `ldconfig`. Permissions in this image have been modified so that the `jovyan` user can update `ld.so.cache`. cuDF ships shared libraries that currently cannot resolve their CUDA dependencies from `LD_LIBRARY_PATH` alone.
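
Concretely, from a terminal inside the running notebook pod (the `grep` check is an optional sanity step we suggest, not part of the documented fix):

```shell
# Rebuild the linker cache so cuDF's librmm.so can resolve the
# late-mounted CUDA driver libraries.
ldconfig

# Optionally confirm the driver libs are now in the cache.
ldconfig -p | grep libcuda
```
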
