Merge pull request #11 from montgomerymt-NIH/master

Introduction to DS Update to use Instance instead of User Managed notebook
NIGMS · Oct 29, 2024 · 46c0571
2 parents 46aeadb + e6863f1
commit 46c0571

Showing 4 changed files with 14 additions and 13 deletions.
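
The notebook changes in this commit replace `mean_squared_error(..., squared=False)` with `root_mean_squared_error(...)`. As far as the scikit-learn release notes go, `root_mean_squared_error` was added in version 1.4, where the `squared` argument was also deprecated. Both calls are meant to return the same quantity. The sketch below is not code from the notebooks: it is a dependency-free illustration of what that quantity is, and the helper name `rmse` and the sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length numeric sequences.

    Hypothetical helper for illustration only; the notebooks call
    sklearn.metrics.root_mean_squared_error instead.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values standing in for an observed column and a 'Predicted' column.
observed = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
example_rmse = rmse(observed, predicted)
print(f"RMSE on example data: {example_rmse}")
```

Because the result is the square root of the mean squared error, it is expressed in the same units as the target column, which is why the notebooks report RMSE rather than MSE.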
diff --git a/GoogleCloud/1- Intro to Machine Learning Decision Trees.ipynb b/GoogleCloud/1- Intro to Machine Learning Decision Trees.ipynb
@@ -1293,7 +1293,7 @@
     "!conda install -c anaconda graphviz -y\n",
     "import graphviz\n",
     "# Quantitative metrics of Model performance\n",
-    "from sklearn.metrics import mean_squared_error"
+    "from sklearn.metrics import root_mean_squared_error"
    ]
   },
   {
@@ -2041,7 +2041,7 @@
    "outputs": [],
    "source": [
     "# This prints the RMSE value for the performance of the model using 2020 Data\n",
-    "print(f\"RMSE on 2020 test set: {mean_squared_error(pred_vs_test_2020['cases_per_100000'], pred_vs_test_2020['Predicted'], squared=False)}\")"
+    "print(f\"RMSE on 2020 test set: {root_mean_squared_error(pred_vs_test_2020['cases_per_100000'], pred_vs_test_2020['Predicted'])}\")"
    ]
   },
   {
@@ -2054,7 +2054,7 @@
    "outputs": [],
    "source": [
     "# This prints the RMSE value for the performance of the model using 2020 Data\n",
-    "print(f\"RMSE on 2021 test set: {mean_squared_error(pred_vs_test_2021['cases_per_100000'], pred_vs_test_2020['Predicted'], squared=False)}\")"
+    "print(f\"RMSE on 2021 test set: {root_mean_squared_error(pred_vs_test_2021['cases_per_100000'], pred_vs_test_2020['Predicted'])}\")"
    ]
   },
   {

diff --git a/... (Optional) Quant. Comparison of 2020 DT Model Performance for (2020 vs 2021) Data .ipynb b/... (Optional) Quant. Comparison of 2020 DT Model Performance for (2020 vs 2021) Data .ipynb
@@ -115,7 +115,7 @@
     "import seaborn as sns\n",
     "\n",
     "# Quantitative metrics of Model performance\n",
-    "from sklearn.metrics import mean_squared_error"
+    "from sklearn.metrics import root_mean_squared_error"
    ]
   },
   {
@@ -191,7 +191,7 @@
    "outputs": [],
    "source": [
     "# This prints the RMSE value for the performance of the model using 2020 data\n",
-    "print(f\"RMSE on 2020 test set: {mean_squared_error(pred_vs_test_2020['cases_per_100000'], pred_vs_test_2020['Predicted'], squared=False)}\")"
+    "print(f\"RMSE on 2020 test set: {root_mean_squared_error(pred_vs_test_2020['cases_per_100000'], pred_vs_test_2020['Predicted'])}\")"
    ]
   },
   {
@@ -202,7 +202,7 @@
    "outputs": [],
    "source": [
     "# This prints the RMSE value for the performance of the model using 2021 data\n",
-    "print(f\"RMSE on 2021 test set: {mean_squared_error(pred_vs_test_2021['cases_per_100000'], pred_vs_test_2021['Predicted'], squared=False)}\")"
+    "print(f\"RMSE on 2021 test set: {root_mean_squared_error(pred_vs_test_2021['cases_per_100000'], pred_vs_test_2021['Predicted'])}\")"
    ]
   },
   {
@@ -342,9 +342,9 @@
    "outputs": [],
    "source": [
     "# This code calculates the correlations within 2020 and 2021 data variables\n",
-    "correlations_2020 = S2020_training.corr().round(2)\n",
+    "correlations_2020 = S2020_training.iloc[:,1:].corr().round(2)\n",
     "matrix2020 = np.triu(np.ones_like(correlations_2020))\n",
-    "correlations_2021 = S2021_training.corr().round(2)\n",
+    "correlations_2021 = S2021_training.iloc[:,1:].corr().round(2)\n",
     "matrix2021 = np.triu(np.ones_like(correlations_2021))\n",
     "\n",
     "# The following creates a composite graph showcasing the correlation charts between both years\n",

diff --git a/GoogleCloud/4- Practice - Answer Key.ipynb b/GoogleCloud/4- Practice - Answer Key.ipynb
@@ -149,7 +149,7 @@
     "import graphviz\n",
     "\n",
     "# Quantitative metrics of Model performance\n",
-    "from sklearn.metrics import mean_squared_error"
+    "from sklearn.metrics import root_mean_squared_error"
    ]
   },
   {
@@ -421,7 +421,7 @@
    "outputs": [],
    "source": [
     "# This calculates the RMSE for Model 2020 (OLD MODEL)\n",
-    "print(f\"RMSE for Model 2020: {mean_squared_error(Old_model['cases_per_100000'], Old_model['Predicted'], squared=False)}\")"
+    "print(f\"RMSE for Model 2020: {root_mean_squared_error(Old_model['cases_per_100000'], Old_model['Predicted'])}\")"
    ]
   },
   {
@@ -432,7 +432,7 @@
    "outputs": [],
    "source": [
     "# This calculates the RMSE for Model 2021 (NEW MODEL)\n",
-    "print(f\"RMSE for Model 2021: {mean_squared_error(New_model['cases_per_100000'], New_model['Predicted'], squared=False)}\")"
+    "print(f\"RMSE for Model 2021: {root_mean_squared_error(New_model['cases_per_100000'], New_model['Predicted'])}\")"
    ]
   },
   {

diff --git a/GoogleCloud/README.md b/GoogleCloud/README.md
@@ -61,15 +61,16 @@ Included is a tutorial in the form of Jupyter notebooks. The main purpose of the
 
 **3)** Now you will need to download the tutorial files from GitHub. The easiest way to do this would be to clone the repository from NIGMS into your Vertex AI notebook. This can be done by using the `Git` menu in JupyterLab, and selecting the clone option. To clone this repository, use the Git command `git clone https://github.com/NIGMS/Introduction-to-Data-Science-for-Biology.git` in the dropdown menu option in Jupyter notebook. Please make sure you only enter the link for the repository that you want to clone. There are other bioinformatics related learning modules available in the [NIGMS Repository](https://github.com/NIGMS). This will download our tutorial files into a folder called `Introduction-to-Data-Science-for-Biology`.
 
+**3.1)** (Alternative Method) If any menus do not work as described in step 3 above, this is an alternative method to achieve the same result: [New Terminal How-to](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToCreateNewTerminalConsoleJupyterLab.md)
+
 **IMPORTANT NOTE** 
 
-Make sure that after you are done with the module, close the tab that appeared when you clicked **OPEN JUPYTERLAB**, then check the box next to the name of the notebook you created in [step 3](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/docs/vertexai.md#:~:text=Click%20Create%20New-,Select,-Advanced%20Options%20at). Then click on **STOP** at the top of the Workbench menu. Wait and make sure that the icon next to your notebook is grayed out.
+Make sure that after you are done with the module, close the tab that appeared when you clicked **OPEN JUPYTERLAB**, then check the box next to the name of the notebook you created in [step 3](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToCreateVertexAINotebooks.md#:~:text=Select%20Advanced%20Options%20at%20the%20bottom%20of%20the%20New%20Instance). Then click on **STOP** at the top of the Workbench menu. Wait and make sure that the icon next to your notebook is grayed out.
 
 ## **Software Requirements**
 
 Software requirements are satisfied by using a pre-made Google Cloud Platform environment Workbench Notebook. The notebook environment used is named **"Python 3 with Intel® MKL"** ; and it is listed during Step 3 for accessing our module. Software requirements are described in notebook **"Intro to Machine Learning Decision Trees"** step 1. 
 
-
 ## **Architecture Design**
 
 Submodule 1 and Submodule 3 will download CSV files stored in a Google Cloud Storage bucket to the Workbench notebook, then it will output additional CSV files that will be used optionally if students want to work on the (optional) Submodule 2. Below is a diagram that illustrates our workflow: