Merge pull request #11 from montgomerymt-NIH/master
Introduction to DS Update to use Instance instead of User Managed notebook
kyleoconnell-NIH authored Oct 29, 2024
2 parents 46aeadb + e6863f1 commit 46c0571
Showing 4 changed files with 14 additions and 13 deletions.
6 changes: 3 additions & 3 deletions GoogleCloud/1- Intro to Machine Learning Decision Trees.ipynb
@@ -1293,7 +1293,7 @@
"!conda install -c anaconda graphviz -y\n",
"import graphviz\n",
"# Quantitative metrics of Model performance\n",
-"from sklearn.metrics import mean_squared_error"
+"from sklearn.metrics import root_mean_squared_error"
]
},
{
@@ -2041,7 +2041,7 @@
"outputs": [],
"source": [
"# This prints the RMSE value for the performance of the model using 2020 Data\n",
-"print(f\"RMSE on 2020 test set: {mean_squared_error(pred_vs_test_2020['cases_per_100000'], pred_vs_test_2020['Predicted'], squared=False)}\")"
+"print(f\"RMSE on 2020 test set: {root_mean_squared_error(pred_vs_test_2020['cases_per_100000'], pred_vs_test_2020['Predicted'])}\")"
]
},
{
@@ -2054,7 +2054,7 @@
"outputs": [],
"source": [
"# This prints the RMSE value for the performance of the model using 2020 Data\n",
-"print(f\"RMSE on 2021 test set: {mean_squared_error(pred_vs_test_2021['cases_per_100000'], pred_vs_test_2020['Predicted'], squared=False)}\")"
+"print(f\"RMSE on 2021 test set: {root_mean_squared_error(pred_vs_test_2021['cases_per_100000'], pred_vs_test_2020['Predicted'])}\")"
]
},
{
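
Note on the RMSE change in the hunks above: to my knowledge, `root_mean_squared_error` was added in scikit-learn 1.4, and the `squared=False` argument of `mean_squared_error` was deprecated in that release and later removed. The sketch below is not part of the commit. It assumes a notebook that may run on either an older or a newer scikit-learn, uses made-up values in place of the `pred_vs_test_2020` columns, and falls back to a plain-Python RMSE when the newer function cannot be imported.

```python
import math

try:
    # Present in scikit-learn 1.4 and later (assumption: older installs lack it).
    from sklearn.metrics import root_mean_squared_error
except ImportError:
    def root_mean_squared_error(y_true, y_pred):
        """Plain-Python fallback: square root of the mean squared error."""
        squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
        return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical observed and predicted values, standing in for the
# 'cases_per_100000' and 'Predicted' columns used in the notebooks.
observed = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

rmse = root_mean_squared_error(observed, predicted)

# The same quantity computed by hand, to show what the function returns.
manual = math.sqrt(
    sum((t - p) ** 2 for t, p in zip(observed, predicted)) / len(observed)
)

print(f"RMSE: {rmse:.4f}")
```

Both paths return the same number, so the printed RMSE values in the notebooks should not change with this commit; only the function used to compute them does.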
Original file line number Diff line number Diff line change
@@ -115,7 +115,7 @@
"import seaborn as sns\n",
"\n",
"# Quantitative metrics of Model performance\n",
-"from sklearn.metrics import mean_squared_error"
+"from sklearn.metrics import root_mean_squared_error"
]
},
{
@@ -191,7 +191,7 @@
"outputs": [],
"source": [
"# This prints the RMSE value for the performance of the model using 2020 data\n",
-"print(f\"RMSE on 2020 test set: {mean_squared_error(pred_vs_test_2020['cases_per_100000'], pred_vs_test_2020['Predicted'], squared=False)}\")"
+"print(f\"RMSE on 2020 test set: {root_mean_squared_error(pred_vs_test_2020['cases_per_100000'], pred_vs_test_2020['Predicted'])}\")"
]
},
{
@@ -202,7 +202,7 @@
"outputs": [],
"source": [
"# This prints the RMSE value for the performance of the model using 2021 data\n",
-"print(f\"RMSE on 2021 test set: {mean_squared_error(pred_vs_test_2021['cases_per_100000'], pred_vs_test_2021['Predicted'], squared=False)}\")"
+"print(f\"RMSE on 2021 test set: {root_mean_squared_error(pred_vs_test_2021['cases_per_100000'], pred_vs_test_2021['Predicted'])}\")"
]
},
{
@@ -342,9 +342,9 @@
"outputs": [],
"source": [
"# This code calculates the correlations within 2020 and 2021 data variables\n",
-"correlations_2020 = S2020_training.corr().round(2)\n",
+"correlations_2020 = S2020_training.iloc[:,1:].corr().round(2)\n",
"matrix2020 = np.triu(np.ones_like(correlations_2020))\n",
-"correlations_2021 = S2021_training.corr().round(2)\n",
+"correlations_2021 = S2021_training.iloc[:,1:].corr().round(2)\n",
"matrix2021 = np.triu(np.ones_like(correlations_2021))\n",
"matrix2021 = np.triu(np.ones_like(correlations_2021))\n",
"\n",
"# The following creates a composite graph showcasing the correlation charts between both years\n",
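
Note on the `.iloc[:,1:]` change in the correlation hunk above: the commit does not say why the first column is dropped. A plausible reason, offered here only as an assumption, is that the first column of the training frames holds non-numeric labels; since pandas 2.0, `DataFrame.corr()` defaults to `numeric_only=False` and raises an error on such a column. The sketch below uses a small hypothetical frame (the `county` and `median_income` columns are invented for illustration) to show the positional approach from the diff next to a dtype-based alternative.

```python
import pandas as pd

# Hypothetical stand-in for S2020_training: a label column first,
# followed by numeric columns. Values are made up for illustration.
training = pd.DataFrame(
    {
        "county": ["A", "B", "C", "D"],
        "cases_per_100000": [10.0, 20.0, 30.0, 40.0],
        "median_income": [40.0, 30.0, 20.0, 10.0],
    }
)

# Approach used in the diff: skip the first column by position.
correlations = training.iloc[:, 1:].corr().round(2)

# Alternative that does not depend on column order: keep numeric dtypes only.
correlations_by_dtype = training.select_dtypes(include="number").corr().round(2)

print(correlations)
```

The positional slice is shorter but silently depends on the label column being first; selecting by dtype is a little more robust if the column order ever changes.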
6 changes: 3 additions & 3 deletions GoogleCloud/4- Practice - Answer Key.ipynb
@@ -149,7 +149,7 @@
"import graphviz\n",
"\n",
"# Quantitative metrics of Model performance\n",
-"from sklearn.metrics import mean_squared_error"
+"from sklearn.metrics import root_mean_squared_error"
]
},
{
@@ -421,7 +421,7 @@
"outputs": [],
"source": [
"# This calculates the RMSE for Model 2020 (OLD MODEL)\n",
-"print(f\"RMSE for Model 2020: {mean_squared_error(Old_model['cases_per_100000'], Old_model['Predicted'], squared=False)}\")"
+"print(f\"RMSE for Model 2020: {root_mean_squared_error(Old_model['cases_per_100000'], Old_model['Predicted'])}\")"
]
},
{
@@ -432,7 +432,7 @@
"outputs": [],
"source": [
"# This calculates the RMSE for Model 2021 (NEW MODEL)\n",
-"print(f\"RMSE for Model 2021: {mean_squared_error(New_model['cases_per_100000'], New_model['Predicted'], squared=False)}\")"
+"print(f\"RMSE for Model 2021: {root_mean_squared_error(New_model['cases_per_100000'], New_model['Predicted'])}\")"
]
},
{
5 changes: 3 additions & 2 deletions GoogleCloud/README.md
@@ -61,15 +61,16 @@ Included is a tutorial in the form of Jupyter notebooks. The main purpose of the

**3)** Now you will need to download the tutorial files from GitHub. The easiest way to do this would be to clone the repository from NIGMS into your Vertex AI notebook. This can be done by using the `Git` menu in JupyterLab, and selecting the clone option. To clone this repository, use the Git command `git clone https://github.com/NIGMS/Introduction-to-Data-Science-for-Biology.git` in the dropdown menu option in Jupyter notebook. Please make sure you only enter the link for the repository that you want to clone. There are other bioinformatics related learning modules available in the [NIGMS Repository](https://github.com/NIGMS). This will download our tutorial files into a folder called `Introduction-to-Data-Science-for-Biology`.

**3.1)** (Alternative Method) If any menus do not work as described in step 3 above, this is an alternative method to achieve the same result: [New Terminal How-to](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToCreateNewTerminalConsoleJupyterLab.md)

**IMPORTANT NOTE**

-Make sure that after you are done with the module, close the tab that appeared when you clicked **OPEN JUPYTERLAB**, then check the box next to the name of the notebook you created in [step 3](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/docs/vertexai.md#:~:text=Click%20Create%20New-,Select,-Advanced%20Options%20at). Then click on **STOP** at the top of the Workbench menu. Wait and make sure that the icon next to your notebook is grayed out.
+Make sure that after you are done with the module, close the tab that appeared when you clicked **OPEN JUPYTERLAB**, then check the box next to the name of the notebook you created in [step 3](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToCreateVertexAINotebooks.md#:~:text=Select%20Advanced%20Options%20at%20the%20bottom%20of%20the%20New%20Instance). Then click on **STOP** at the top of the Workbench menu. Wait and make sure that the icon next to your notebook is grayed out.

## **Software Requirements**

Software requirements are satisfied by using a pre-made Google Cloud Platform environment Workbench Notebook. The notebook environment used is named **"Python 3 with Intel® MKL"** ; and it is listed during Step 3 for accessing our module. Software requirements are described in notebook **"Intro to Machine Learning Decision Trees"** step 1.


## **Architecture Design**

Submodule 1 and Submodule 3 will download CSV files stored in a Google Cloud Storage bucket to the Workbench notebook, then it will output additional CSV files that will be used optionally if students want to work on the (optional) Submodule 2. Below is a diagram that illustrates our workflow: