Make AzureML examples more self-contained #484
@@ -10,6 +10,8 @@ select = [
"F",
# isort
"I",
# numpy
"NPY",
Deployment of the `examples/rapids-azureml-hpo/train_rapids.py` script failed like this:
Traceback (most recent call last):
File "/mnt/azureml/cr/j/525ffdb43cda47b1bd9386f0b02d17ae/exe/wd/train_rapids.py", line 172, in <module>
main()
File "/mnt/azureml/cr/j/525ffdb43cda47b1bd9386f0b02d17ae/exe/wd/train_rapids.py", line 65, in main
mlflow.log_param("n_estimators", np.int(args.n_estimators))
^^^^^^
File "/opt/conda/lib/python3.12/site-packages/numpy/__init__.py", line 394, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations. Did you mean: 'inf'?
That's because it uses things like `np.int()`, an alias that was deprecated in NumPy 1.20 and removed in NumPy 1.24, so it is unavailable in the NumPy 2.x that is making it into the environment.
Adding the `NPY` ruff rules catches and auto-fixes such things.
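As a minimal sketch of the kind of fix those rules point toward, the cast on line 65 of the traceback can use the builtin `int`, as the NumPy error message itself suggests. The `parse_n_estimators` helper and its default value are hypothetical, and the `mlflow.log_param` call appears only in a comment so that the snippet runs without MLflow installed.

```python
import argparse


def parse_n_estimators(argv):
    """Parse --n_estimators and return it as a builtin int."""
    parser = argparse.ArgumentParser()
    # Hypothetical default; the real script may define this differently.
    parser.add_argument("--n_estimators", type=int, default=100)
    args = parser.parse_args(argv)
    # Before: np.int(args.n_estimators)  -> AttributeError on current NumPy
    # After:  the builtin int, which the NumPy error message recommends
    return int(args.n_estimators)


n_estimators = parse_n_estimators(["--n_estimators", "250"])
# In the training script this value would then be logged, e.g.
# mlflow.log_param("n_estimators", n_estimators)
print(n_estimators)
```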
# Activate rapids conda environment
RUN /bin/bash -c "source activate rapids && pip install azureml-mlflow"
RUN conda install --yes -c conda-forge 'dask-ml>=2024.4.4' \
&& pip install azureml-mlflow
Unfortunately, `azureml-mlflow` is not available as a conda package. There is a tracking request to get it on conda-forge, if you want to subscribe: conda-forge/staged-recipes#23432
rm -rf /var/lib/apt/lists/*

# Activate rapids conda environment
RUN /bin/bash -c "source activate rapids && pip install azureml-mlflow"
There is not a `rapids` conda env in the RAPIDS images any more.
**3. Config.** Within the Workspace, download the `config.json` file, as you will load the details to initialize workspace for running ML training jobs from within your notebook.

![Screenshot of download config file](../../images/azureml-download-config-file.png)
AzureML puts this config file into JupyterLab's filesystem at `${JUPYTER_SERVER_ROOT}/config.json`, so we can remove this manual step to simplify things a bit 🎉
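A rough sketch of how a notebook could pick that file up without the manual download, assuming only that `JUPYTER_SERVER_ROOT` is set in the environment as described above. The `load_workspace_config` helper is hypothetical, and falling back to the current directory when the variable is missing is an assumption of this sketch, not something AzureML documents.

```python
import json
import os
from pathlib import Path


def load_workspace_config(root=None):
    """Read the AzureML workspace config from the Jupyter server root."""
    # JUPYTER_SERVER_ROOT is the variable named in the comment above;
    # the "." fallback is an assumption made for this sketch.
    root = root or os.environ.get("JUPYTER_SERVER_ROOT", ".")
    config_path = Path(root) / "config.json"
    with config_path.open() as f:
        return json.load(f)
```

The returned dictionary can then be handed to whichever AzureML client the notebook uses to initialize the workspace.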
Select **Compute** > **+ New** (Create compute instance) > choose a [RAPIDS compatible GPU](https://medium.com/dropout-analytics/which-gpus-work-with-rapids-ai-f562ef29c75f) VM size (e.g., `Standard_NC12s_v3`)
Select **New** > **Compute instance** (Create compute instance) > choose a [RAPIDS compatible GPU](https://docs.rapids.ai/install/#system-req) VM size (e.g., `Standard_NC12s_v3`)
This blogpost is from 2019 and not from an NVIDIA or RAPIDS account... let's point to the RAPIDS install selector instead for information about what GPUs RAPIDS is compatible with.
"\n",
"RUN conda install --yes -c conda-forge 'dask-ml>=2024.4.4' \\\n",
" && pip install azureml-mlflow\n",
"EOF"
This Dockerfile is so minimal... proposing here that we just inline the creation of it in the notebook, to remove the friction of having to create a file and populate it.
This also has the nice side benefit of putting it through the templating we do when rendering the docs, so the base image will now get updates (notice that the deleted file `rapids-azureml-hpo/Dockerfile` is using RAPIDS 23.02 😱).
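To illustrate the idea of generating the file from inside the notebook, here is a Python sketch rather than the shell heredoc the notebook itself uses. The `write_dockerfile` helper is hypothetical, and the base image string is a deliberate placeholder: in the docs it would be filled in by the templating mentioned above. Only the `RUN` step is taken from the diff.

```python
import tempfile
from pathlib import Path


def write_dockerfile(base_image, directory="."):
    """Write a minimal Dockerfile for the training environment."""
    contents = (
        f"FROM {base_image}\n"
        "\n"
        # The RUN step below mirrors the one shown in the diff.
        "RUN conda install --yes -c conda-forge 'dask-ml>=2024.4.4' \\\n"
        " && pip install azureml-mlflow\n"
    )
    path = Path(directory) / "Dockerfile"
    path.write_text(contents)
    return path


# Placeholder image name, not a real RAPIDS tag.
with tempfile.TemporaryDirectory() as build_dir:
    dockerfile = write_dockerfile("example.invalid/rapids-base:tag", build_dir)
    print(dockerfile.read_text())
```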
Thanks for taking the time to go through this so thoroughly @jameslamb.
Contributes to:
I worked through these AzureML examples:
This proposes fixes and simplifications to those examples. Confirmed that they run successfully end-to-end with these changes (and produce some fun plots).