Azure Machine Learning designer (preview) gives you a cloud-based, interactive, visual workspace that you can use to quickly prepare data and to train and deploy machine learning models. It supports Azure Machine Learning compute, either GPU or CPU. The designer also supports publishing models as web services on Azure Kubernetes Service, where other applications can consume them.
In this lab, we will use a subset of the NYC Taxi & Limousine Commission green taxi trip records available from Azure Open Datasets. The data is enriched with holiday and weather data. Using the enriched dataset, we will learn to use the Azure Machine Learning graphical interface to process data and to build, train, score, and evaluate a regression model that predicts NYC taxi fares. To train the model, we will create an Azure Machine Learning compute resource. We will do all of this from the Azure Machine Learning designer without writing a single line of code.
- In the Azure portal, open the available machine learning workspace.
- Select Launch now under the Try the new Azure Machine Learning studio message.
- When you first launch the studio, you may need to set the directory and subscription. If prompted, select Udacity for the directory and Azure Sponsorship for the subscription. For the machine learning workspace, you may see multiple options listed. Select any of them (it doesn't matter which) and then click Get started.
- From the studio, select Datasets, + Create dataset, From web files. This opens the Create dataset from web files dialog on the right.
- In the Web URL field, provide the following URL for the training data file: https://introtomlsampledata.blob.core.windows.net/data/nyc-taxi/nyc-taxi-sample-data.csv
- Provide nyc-taxi-sample-data as the Name, leave the remaining values at their defaults, and select Next.
- On the Settings and preview panel, set the column headers drop-down to All files have same headers.
- Scroll the data preview to the right to observe the target column, totalAmount. After you are done reviewing the data, select Next.
- Select columns from the dataset to include as part of your training data. Leave the default selections and select Next.
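The lab itself needs no code, but for readers curious about what the Settings and preview step does, here is a minimal Python sketch of reading a CSV whose first row holds the column headers and pulling out the target column. The two sample rows and every column name other than totalAmount are made up for illustration; the real file has more rows and columns.

```python
import csv
import io

# Illustrative stand-in for the first lines of the CSV file.
# Only the totalAmount column name comes from the lab; the rest is assumed.
SAMPLE_CSV = """passengerCount,tripDistance,totalAmount
1,2.5,12.3
2,0.9,6.8
"""


def preview(csv_text, target="totalAmount"):
    """Parse CSV text whose first row is the header.

    Returns the list of column names and the target column as floats.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = list(reader.fieldnames)
    if target not in columns:
        raise KeyError(f"target column {target!r} not found")
    return columns, [float(row[target]) for row in rows]


columns, targets = preview(SAMPLE_CSV)
print(columns)
print(targets)
```

The same function could be pointed at the downloaded file's text instead of the inline sample.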
- In the settings panel on the right, select Select compute target.
- In the Set up compute target editor, select the available compute, and then select Save.
Note: If you are facing difficulties in accessing pop-up windows or buttons in the user interface, please refer to the Help section in the lab environment.
- Select the Datasets section in the left navigation. Next, select My Datasets, then nyc-taxi-sample-data, and drag and drop the selected dataset onto the canvas.
- Select the Data Transformation section in the left navigation. Follow the steps outlined below:
  - Select the Split Data prebuilt module.
  - Drag and drop the selected module onto the canvas.
  - Set Fraction of rows in the first output dataset to 0.7.
  - Connect the Dataset to the Split Data module.
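Conceptually, a 0.7 fraction sends roughly 70% of the rows to the first output and the rest to the second. The sketch below illustrates that idea in plain Python, assuming a randomized row split. The function name, the seed, and the stand-in rows are hypothetical, and the designer module's own options and implementation may differ.

```python
import random


def split_rows(rows, fraction=0.7, seed=0):
    """Randomly split rows into two lists.

    The first list receives about `fraction` of the rows and the second
    list receives the remainder.
    """
    shuffled = list(rows)                      # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)      # seeded for repeatability
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))                         # stand-in for dataset rows
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))         # 7 3
```

Every row ends up in exactly one of the two outputs, which is what lets the second output serve as unseen data later in the pipeline.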
- Note that you can submit the pipeline at any point to peek at the outputs and activities. Running the pipeline also generates metadata that is available for downstream activities, such as selecting column names from a list in selection dialogs.
- Select the Machine Learning Algorithms section in the left navigation. Follow the steps outlined below:
  - Select the Linear Regression prebuilt module.
  - Drag and drop the selected module onto the canvas.
- Select the Model Training section in the left navigation. Follow the steps outlined below:
  - Select the Train Model prebuilt module.
  - Drag and drop the selected module onto the canvas.
  - Connect the Linear Regression module to the first input of the Train Model module.
  - Connect the first output of the Split Data module to the second input of the Train Model module.
  - Select the Edit column link to open the Label column editor.
  - The Label column editor allows you to specify your label or target column. Type in the label column name totalAmount and then select Save.
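To give a feel for what training a linear regression against the totalAmount label involves, here is a minimal sketch of an ordinary least squares fit with a single feature. It is not the designer's implementation: the module trains on all selected columns, and its solver settings are not covered in this lab. The feature (trip distance) and the four data points are made up.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training rows: trip distance (feature) and totalAmount (label).
distances = [1.0, 2.0, 3.0, 4.0]
total_amounts = [5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(distances, total_amounts)
print(slope, intercept)  # 2.0 3.0
```

The label column plays the role of `ys` here: it is the value the model learns to predict from the other columns.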
- Select the Model Scoring & Evaluation section in the left navigation. Follow the steps outlined below:
  - Select the Score Model prebuilt module.
  - Drag and drop the selected module onto the canvas.
  - Connect the Train Model module to the first input of the Score Model module.
  - Connect the second output of the Split Data module to the second input of the Score Model module.
  - Note that the Split Data module feeds data to both model training and model scoring. The first output (0.7 fraction) connects to the Train Model module, and the second output (0.3 fraction) connects to the Score Model module.
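Scoring amounts to applying the trained model to the held-out rows and attaching a prediction to each one. The sketch below shows the idea with a hypothetical one-feature linear model. The coefficients, the tripDistance feature name, and the rows are assumptions for illustration; only the Scored Labels and totalAmount column names come from the lab.

```python
def score(rows, slope, intercept, feature="tripDistance"):
    """Return copies of rows with a 'Scored Labels' prediction added."""
    return [
        {**row, "Scored Labels": slope * row[feature] + intercept}
        for row in rows
    ]


# Made-up held-out rows and hypothetical model coefficients.
held_out = [
    {"tripDistance": 1.5, "totalAmount": 6.5},
    {"tripDistance": 3.0, "totalAmount": 9.4},
]

scored = score(held_out, slope=2.0, intercept=3.0)
for row in scored:
    # Actual value next to the predicted value.
    print(row["totalAmount"], row["Scored Labels"])
```

Because the held-out rows were not used for training, comparing the two columns gives an honest picture of how the model behaves on new data.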
- Select the Model Scoring & Evaluation section in the left navigation. Follow the steps outlined below:
  - Select the Evaluate Model prebuilt module.
  - Drag and drop the selected module onto the canvas.
  - Connect the Score Model module to the first input of the Evaluate Model module.
- Select Submit to open the Setup pipeline run editor. Please note that the button name in the UI has changed from Run to Submit.
- In the Setup pipeline run editor, select Experiment, Create new, and provide the New experiment name designer-run. Then select Submit.
- Wait for the pipeline run to complete. It takes around 8 minutes.
- While you wait for the model training to complete, you can learn more about the training algorithm used in this lab by selecting the Linear Regression module.
- Select Score Model, Outputs, Visualize to open the Score Model result visualization dialog.
- Observe the predicted values under the column Scored Labels. You can compare the predicted values (Scored Labels) with the actual values (totalAmount).
- Select Evaluate Model, Outputs, Visualize to open the Evaluate Model result visualization dialog.
- Evaluate the model performance by reviewing the various evaluation metrics, such as Mean Absolute Error and Root Mean Squared Error.
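For reference, the two metrics named above can be computed from the actual and predicted values as in the following sketch. The numbers are made up for illustration and are not results from this lab.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up totalAmount values and Scored Labels.
actual = [10.0, 12.0, 8.0, 15.0]
predicted = [11.0, 10.0, 8.0, 19.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae)             # 1.75
print(round(rmse, 4))  # 2.2913
```

Both metrics are in the same units as totalAmount. Root Mean Squared Error penalizes large misses more heavily, which is why it comes out higher than Mean Absolute Error in this example.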
Congratulations! You have trained and evaluated your first machine learning model. You can continue to experiment in the environment but are free to close the lab environment tab and return to the Udacity portal to continue with the lesson.