Contrib notebook refresh (kubeflow#3423)
Pete MacKinnon authored and k8s-ci-robot committed Jun 7, 2019
1 parent 32e4190 commit 69fb4ec
Showing 4 changed files with 8 additions and 72 deletions.
24 changes: 2 additions & 22 deletions components/contrib/kaggle-notebook-image/Dockerfile
@@ -1,6 +1,3 @@
# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.

# use basic syntax for now
FROM gcr.io/kaggle-images/python:latest

@@ -56,25 +53,8 @@ RUN cd /tmp && \

RUN chown -R ${NB_USER}:users $HOME

ENV GITHUB_REF https://raw.githubusercontent.com/kubeflow/kubeflow/master/components/tensorflow-notebook-image

ADD --chown=jovyan:users $GITHUB_REF/jupyter_notebook_config.py /tmp

# Wipe $HOME for PVC detection later
WORKDIR $HOME
RUN rm -fr $(ls -A $HOME)

# Get init scripts from kubeflow
ADD --chown=jovyan:users \
$GITHUB_REF/start-singleuser.sh \
$GITHUB_REF/start-notebook.sh \
$GITHUB_REF/start.sh \
$GITHUB_REF/pvc-check.sh \
/usr/local/bin/

RUN chmod a+rx /usr/local/bin/*

# Configure container startup
EXPOSE 8888
USER jovyan
ENTRYPOINT ["tini", "--"]
CMD ["start-notebook.sh"]
CMD ["sh","-c", "jupyter notebook --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]
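The new CMD above launches Jupyter directly instead of relying on the JupyterHub start scripts, with `NB_PREFIX` supplying the base URL. A minimal local smoke test might look like the following (the image tag is a placeholder, and `NB_PREFIX` is normally injected by the notebook controller):

```shell
# Run the image standalone, setting NB_PREFIX by hand to mimic the
# notebook controller. The tag kaggle-notebook:latest is illustrative.
docker run --rm -p 8888:8888 \
  -e NB_PREFIX=/notebook/test/kaggle \
  kaggle-notebook:latest
# Jupyter should then answer at http://localhost:8888/notebook/test/kaggle
```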
6 changes: 3 additions & 3 deletions components/contrib/kaggle-notebook-image/README.md
@@ -1,13 +1,13 @@
# Kaggle notebook

This Dockerfile builds an image derived from the latest [Kaggle python image](https://github.com/Kaggle/docker-python) that can be launched from the Kubeflow JupyterHub. [Kaggle](https://www.kaggle.com/) is the home of data science collaboration and competition.
This Dockerfile builds an image derived from the latest [Kaggle python image](https://github.com/Kaggle/docker-python) that can be launched from the Kubeflow notebook controller. [Kaggle](https://www.kaggle.com/) is the home of data science collaboration and competition.

Important notes:
* this notebook is not curated by the Kubeflow project and is not regularly tested
* the versions of TensorFlow, PyTorch, and the other libraries included may change at any time
* this is a very large notebook, over 21 GB in size. Since our notebook uses the latest Kaggle image, docker pulls (and notebook launches) can take a long time.
* this is a large notebook image, over 15 GB in size. Since our notebook uses the latest Kaggle image, docker pulls (and notebook launches) can take a long time.
* the base image size for docker devicemapper is 10 GB, which is not large enough to run this image. Your docker daemon must be configured for at least 30 GB (`--storage-opt dm.basesize=30G`) or use a storage driver like overlay2.
* the Kaggle image includes TensorFlow 1.9 or greater built with AVX2 support, so the image may not run on older CPUs
* the Kaggle image includes TensorFlow 1.13.1 or greater built with AVX2 support, so the image may not run on older CPUs
* the other Kubeflow curated notebooks allow the jovyan user to install new packages from pip or conda. Unfortunately this is not possible with the Kaggle image, because the additional ownership layer would significantly increase the image size.
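The devicemapper sizing note above can be applied either with the daemon flag or via `daemon.json`. A sketch, staged in `/tmp` here for illustration (the real file lives at `/etc/docker/daemon.json` and requires a daemon restart to take effect):

```shell
# Stage a daemon.json that switches devicemapper to a 30 GB base image size.
cat > /tmp/daemon.json <<'EOF'
{
  "storage-driver": "devicemapper",
  "storage-opts": ["dm.basesize=30G"]
}
EOF
# Confirm the option landed in the staged config.
grep 'dm.basesize' /tmp/daemon.json
```

On most distributions the daemon is then restarted with `systemctl restart docker`; alternatively, switching to overlay2 avoids the base-size limit entirely.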

To build the image run:
29 changes: 2 additions & 27 deletions components/contrib/rapidsai-notebook-image/Dockerfile
@@ -12,8 +12,6 @@ ENV HOME /home/$NB_USER
ENV CONDA_DIR=/conda
ENV PATH $CONDA_DIR/bin:$PATH
ENV PATH $CONDA_DIR/envs/rapids/bin:$PATH
# anticipate a default GKE nvidia mount
ENV PATH /usr/local/nvidia/bin:$PATH

# Use bash instead of sh
SHELL ["/bin/bash", "-c"]
@@ -57,31 +55,8 @@ RUN cd /tmp && \

RUN chown -R ${NB_USER}:users $HOME

ENV GITHUB_REF https://raw.githubusercontent.com/kubeflow/kubeflow/master/components/tensorflow-notebook-image

ADD --chown=jovyan:users $GITHUB_REF/jupyter_notebook_config.py /tmp

# Wipe $HOME for PVC detection later
WORKDIR $HOME
RUN rm -fr $(ls -A $HOME)

# Get init scripts from kubeflow
ADD --chown=jovyan:users \
$GITHUB_REF/start-notebook.sh \
$GITHUB_REF/start-singleuser.sh \
$GITHUB_REF/start.sh \
$GITHUB_REF/pvc-check.sh \
/usr/local/bin/

RUN chmod a+rx /usr/local/bin/*

# HACK: GKE late-binding of NVIDIA driver mount
# seems to leave us with a stale cache for the CUDA libs;
# cudf librmm.so can't find libcuda* from LD_LIBRARY_PATH
RUN chown -R ${NB_USER}:users /etc
RUN sed -i '/JUPYTERHUB_API_TOKEN/i\ldconfig' /usr/local/bin/start-notebook.sh

# Configure container startup
EXPOSE 8888
USER jovyan
ENTRYPOINT ["tini", "--"]
CMD ["start-notebook.sh"]
CMD ["sh","-c", "jupyter notebook --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]
21 changes: 1 addition & 20 deletions components/contrib/rapidsai-notebook-image/README.md
@@ -1,6 +1,6 @@
# Rapids AI notebook

This Dockerfile builds an image derived from the current [RAPIDS image](https://ngc.nvidia.com/catalog/containers) that can be launched from the Kubeflow JupyterHub. [RAPIDS](https://devblogs.nvidia.com/gpu-accelerated-analytics-rapids/) uses NVIDIA CUDA for high-performance GPU execution, exposing that GPU parallelism and high memory bandwidth through user-friendly Python interfaces. RAPIDS provides several Python APIs, including cuDF, a GPU DataFrame library with a pandas-like API, and cuML, a GPU-accelerated library of machine learning algorithms.
This Dockerfile builds an image derived from the current [RAPIDS image](https://ngc.nvidia.com/catalog/containers) that can be launched from the Kubeflow notebook controller. [RAPIDS](https://devblogs.nvidia.com/gpu-accelerated-analytics-rapids/) uses NVIDIA CUDA for high-performance GPU execution, exposing that GPU parallelism and high memory bandwidth through user-friendly Python interfaces. RAPIDS provides several Python APIs, including cuDF, a GPU DataFrame library with a pandas-like API, and cuML, a GPU-accelerated library of machine learning algorithms.

Requirements:

@@ -21,22 +21,3 @@ Specify whatever repo and image tag you need for your purposes.
## cuDF E2E notebook

The demonstration of the cuDF API in `E2E.ipynb` performs intensive ETL of Fannie Mae mortgage data that can be downloaded into your notebook. The defaults in the notebook use 8 dask workers (for 8 GPUs) and assume mortgage data for 16 years partitioned across 16 files. This configuration may exceed the GPU count and GPU RAM you actually have available. Simply change the defaults in notebook cells 2 and 4 to match the worker count to your available GPUs and to use a reduced span of years and partitions from the mortgage dataset.

## GKE deployment notes

### Driver

It is possible to run this image with the latest NVIDIA drivers, even if they are not yet installed in GKE. Log in to your Kubeflow GKE cluster and apply this [daemonset](https://github.com/GoogleCloudPlatform/container-engine-accelerators/blob/master/daemonset.yaml) to install the desired NVIDIA drivers. You must add the following environment variables to the init container in the daemonset YAML:

```yaml
- name: NVIDIA_DRIVER_VERSION
value: "410.79"
- name: IGNORE_MISSING_MODULE_SYMVERS
value: "1"
```
Once you have applied the new daemonset, you must recycle the GPU nodes in your cluster. Note this is a temporary workaround until newer NVIDIA drivers are added to GKE, and it may no longer be applicable at some point in the future.
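
The apply-and-recycle flow described above might be sketched as follows (the node label selector and drain flags are assumptions about a typical GKE GPU node pool, not taken from this repo):

```shell
# Apply the patched driver-installer daemonset.
kubectl apply -f daemonset.yaml

# "Recycle" each GPU node: drain it, then delete it so the managed
# instance group replaces it with a fresh VM running the new daemonset.
for node in $(kubectl get nodes -l cloud.google.com/gke-accelerator -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-local-data
  kubectl delete "$node"
done
```
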
### Linker
When deploying this Kubeflow image to GKE, you will need to ensure that `/etc/ld.so.cache` is updated after launch in order to use the cuDF Python API. Simply open a terminal in JupyterHub and run `ldconfig`. Permissions in this image have been modified so that the `jovyan` user can update `ld.so.cache`. cuDF ships shared libraries that currently cannot resolve their CUDA dependencies from `LD_LIBRARY_PATH` alone.
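
Concretely, from a terminal inside the running notebook pod (the `grep` check is an optional sanity step we suggest, not part of the documented fix):

```shell
# Rebuild the linker cache so cuDF's librmm.so can resolve the
# late-mounted CUDA driver libraries.
ldconfig

# Optionally confirm the driver libs are now in the cache.
ldconfig -p | grep libcuda
```
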
