The main aim of a recommendation system is to recommend one or more items to users of the system. An item to be recommended might be a movie, restaurant, book, or song. In general, the user is an entity with item preferences, such as a person, a group of persons, or any other type of entity you can imagine.
There are two principal approaches to recommender systems:
- The content-based approach, which makes use of features for both users and items. Users can be described by properties such as age or gender. Items can be described by properties such as the author or the manufacturer. Typical examples of content-based recommendation systems can be found on social matchmaking sites.
- The collaborative filtering approach, which uses only identifiers of the users and the items. It is based on a matrix of ratings given by the users to the items. The main source of information about a user is the list of items they've rated and their similarity with other users who have rated the same items.
The SVD recommender module in Azure Machine Learning designer is based on the Singular Value Decomposition algorithm. It uses identifiers of the users and the items, and a matrix of ratings given by the users to the items. It's a typical example of a collaborative recommender.
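To make the collaborative filtering input concrete, here is a minimal sketch in plain Python. It builds a sparse ratings matrix from (user, item, rating) triples and compares users by cosine similarity over the items they have both rated. The identifiers and rating values are made up, and the neighborhood-style similarity is only an illustration of the data being used; it is not how the SVD recommender module works internally.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (user, item, rating) triples; identifiers and values are made up.
ratings = [
    ("u1", "m1", 5.0), ("u1", "m2", 3.0),
    ("u2", "m1", 4.0), ("u2", "m2", 2.0), ("u2", "m3", 5.0),
    ("u3", "m1", 1.0), ("u3", "m2", 5.0),
]


def build_matrix(triples):
    """Return a sparse user -> {item: rating} mapping."""
    matrix = defaultdict(dict)
    for user, item, rating in triples:
        matrix[user][item] = rating
    return dict(matrix)


def cosine_similarity(matrix, a, b):
    """Cosine similarity of two users over the items both have rated."""
    shared = set(matrix[a]) & set(matrix[b])
    if not shared:
        return 0.0
    dot = sum(matrix[a][i] * matrix[b][i] for i in shared)
    norm_a = sqrt(sum(matrix[a][i] ** 2 for i in shared))
    norm_b = sqrt(sum(matrix[b][i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


matrix = build_matrix(ratings)
sim_12 = cosine_similarity(matrix, "u1", "u2")  # u1 and u2 rate m1 and m2 alike
sim_13 = cosine_similarity(matrix, "u1", "u3")  # u1 and u3 disagree on m1 and m2
```

In this toy data, u1 comes out closer to u2 than to u3, because u1 and u2 rank the two shared movies the same way.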
In this lab, we use the Train SVD Recommender module available in Azure Machine Learning designer (preview) to train a movie recommender engine. We use the collaborative filtering approach: the model learns from a collection of ratings made by users on a subset of a catalog of movies. Two open datasets available in Azure Machine Learning designer are used: the IMDB Movie Titles dataset, joined on the movie identifier with the Movie Ratings dataset.
The Movie Ratings data consists of approximately 225,000 ratings for 15,742 movies by 26,770 users, extracted from Twitter using techniques described in the original paper by Dooms, De Pessemier and Martens. The paper and data can be found on GitHub.
We will both train the engine and score new data, to demonstrate the different modes in which a recommender can be used and evaluated. The trained model will predict what rating a user will give to unseen movies, so we'll be able to recommend movies that the user is most likely to enjoy. We will do all of this from the Azure Machine Learning designer, writing only a three-line Python script along the way.
- In Azure portal, open the available machine learning workspace.
- Select Launch now under the Try the new Azure Machine Learning studio message.
- When you first launch the studio, you may need to set the directory and subscription. If so, a setup screen appears. For the directory, select Udacity and for the subscription, select Azure Sponsorship. For the machine learning workspace, you may see multiple options listed. Select any of these (it doesn't matter which) and then click Get started.
- From the studio, select Designer, +. This opens a visual pipeline authoring editor.
- In the settings panel on the right, select Select compute target.
- In the Set up compute target editor, select the available compute, and then select Save.

Note: If you are facing difficulties in accessing pop-up windows or buttons in the user interface, please refer to the Help section in the lab environment.
- Select the Datasets section in the left navigation. Next, select Samples, Movie Ratings and drag and drop the selected dataset onto the canvas.
- Select the Datasets section in the left navigation. Next, select Samples, IMDB Movie Titles and drag and drop the selected dataset onto the canvas.
- Select the Data Transformation section in the left navigation. Follow the steps outlined below:
  - Select the Join Data prebuilt module.
  - Drag and drop the selected module onto the canvas.
  - Connect the output of the Movie Ratings module to the first input of the Join Data module.
  - Connect the output of the IMDB Movie Titles module to the second input of the Join Data module.
- Select the Join Data module.
  - Select the Edit column link to open the Join key columns for left dataset editor. Select the MovieId column in the Enter column name field.
  - Select the Edit column link to open the Join key columns for right dataset editor. Select the Movie ID column in the Enter column name field.

Note that you can submit the pipeline at any point to peek at the outputs and activities. Running the pipeline also generates metadata that is available for downstream activities, such as selecting column names from a list in selection dialogs.
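For readers who think in code, the join configured above is roughly equivalent to the following pandas sketch. The two small data frames are made up, and the inner join type is an assumption about how the Join Data module is configured; only the key column names MovieId and Movie ID come from the steps above.

```python
import pandas as pd

# Made-up stand-ins for the Movie Ratings and IMDB Movie Titles sample datasets.
ratings = pd.DataFrame({
    "UserId": [1, 1, 2],
    "MovieId": [10, 20, 10],
    "Rating": [8, 6, 9],
})
titles = pd.DataFrame({
    "Movie ID": [10, 20, 30],
    "Movie Name": ["Movie A", "Movie B", "Movie C"],
})

# Left key: MovieId (ratings); right key: Movie ID (titles).
# how="inner" is an assumption: only movies present in both datasets are kept.
joined = ratings.merge(titles, how="inner", left_on="MovieId", right_on="Movie ID")
```

Movie C has no ratings in this toy data, so it does not appear in the joined result.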
- Select the Python Language section in the left navigation. Follow the steps outlined below:
  - Select the Execute Python Script prebuilt module.
  - Drag and drop the selected module onto the canvas.
  - Connect the Join Data output to the input of the Execute Python Script module.
- Select Edit code to open the Python script editor, clear the existing code and then enter the following lines of code to select the UserId, Movie Name, Rating columns from the joined dataset. Please ensure that there is no indentation for the first line and that the second and third lines are indented.

```python
def azureml_main(dataframe1 = None, dataframe2 = None):
    df1 = dataframe1[['UserId','Movie Name','Rating']]
    return df1,
```

Note: In other pipelines, for selecting a list of columns from a dataset, we could have used the Select Columns in Dataset prebuilt module. This one returns the columns in the same order as in the input dataset. This time we need the output dataset to be in the format: user id, movie name, rating. This column order is required at the input of the Train SVD Recommender module.
- Select the Data Transformation section in the left navigation. Follow the steps outlined below:
  - Select the Remove Duplicate Rows prebuilt module.
  - Drag and drop the selected module onto the canvas.
  - Connect the first output of the Execute Python Script to the input of the Remove Duplicate Rows module.
  - Select the Edit columns link to open the Select columns editor and then enter the following list of columns to be included in the output dataset: Movie Name, UserId.
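The deduplication step keeps at most one rating per (Movie Name, UserId) pair. A rough pandas equivalent, with made-up data, is sketched below; keeping the first occurrence of each duplicate is an assumption about the module's settings.

```python
import pandas as pd

# Made-up ratings: user 1 has rated "Movie A" twice.
ratings = pd.DataFrame({
    "UserId": [1, 1, 2],
    "Movie Name": ["Movie A", "Movie A", "Movie A"],
    "Rating": [8, 6, 9],
})

# Rows are duplicates when they share both key columns; keep="first" is assumed.
deduplicated = ratings.drop_duplicates(subset=["Movie Name", "UserId"], keep="first")
```

After this step each user contributes a single rating per movie, which is what a ratings matrix expects.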
- Select the Data Transformation section in the left navigation. Follow the steps outlined below:
  - Select the Split Data prebuilt module.
  - Drag and drop the selected module onto the canvas.
  - Fraction of rows in the first output dataset: 0.5
  - Connect the output dataset of the Remove Duplicate Rows module to the input of the Split Data module.
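A 0.5 split sends half of the rows to the first output (used for training) and the rest to the second output (used for scoring and evaluation). The sketch below shows one way to reproduce this with pandas on made-up data; the random sampling and the seed value are assumptions, not settings taken from the lab.

```python
import pandas as pd

# Ten made-up rating rows.
ratings = pd.DataFrame({
    "UserId": list(range(10)),
    "Movie Name": [f"Movie {i % 3}" for i in range(10)],
    "Rating": [5 + i % 5 for i in range(10)],
})

# First output: a random 50% of the rows (the seed is arbitrary).
train = ratings.sample(frac=0.5, random_state=42)
# Second output: every row that was not sampled into the first output.
test = ratings.drop(train.index)
```

The two outputs never share a row, so the test set only contains ratings the model has not seen during training.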
- Select the Recommendation section in the left navigation. Follow the steps outlined below:
  - Select the Train SVD Recommender prebuilt module.
  - Drag and drop the selected module onto the canvas.
  - Connect the first output of the Split Data module to the input of the Train SVD Recommender module.
  - Number of factors: 200. This option specifies the number of factors to use with the recommender. As the number of users and items increases, it's better to set a larger number of factors. But if the number is too large, performance might drop.
  - Number of recommendation algorithm iterations: 30. This number indicates how many times the algorithm should process the input data. The higher this number is, the more accurate the predictions generally are. However, a higher number means slower training. The default value is 30.
  - Learning rate: 0.001. The learning rate defines the step size for learning.
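To give a feel for what the three training parameters control, here is a deliberately simplified matrix factorization trained with stochastic gradient descent in plain Python. It is a sketch under stated assumptions, not the module's actual implementation: it omits bias terms and regularization, and the function names `train_factors` and `squared_error` as well as the sample triples are hypothetical.

```python
import random


def train_factors(triples, n_factors=2, n_iterations=30, learning_rate=0.001, seed=0):
    """Learn one factor vector per user and per item from (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, i, _ in triples})
    # Number of factors: the length of every user and item vector.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    # Number of iterations: how many passes are made over the training data.
    for _ in range(n_iterations):
        for u, i, r in triples:
            err = r - sum(a * b for a, b in zip(p[u], q[i]))
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Learning rate: the step size of each gradient update.
                p[u][f] += learning_rate * err * qi
                q[i][f] += learning_rate * err * pu
    return p, q


def squared_error(triples, p, q):
    """Sum of squared differences between known and reconstructed ratings."""
    return sum((r - sum(a * b for a, b in zip(p[u], q[i]))) ** 2 for u, i, r in triples)


# Made-up training data.
sample = [
    ("u1", "m1", 5.0), ("u1", "m2", 3.0),
    ("u2", "m1", 4.0), ("u2", "m3", 1.0),
    ("u3", "m2", 4.0), ("u3", "m3", 2.0),
]

p0, q0 = train_factors(sample, n_iterations=0)
before = squared_error(sample, p0, q0)
p1, q1 = train_factors(sample, n_iterations=500, learning_rate=0.02)
after = squared_error(sample, p1, q1)
```

The toy run uses far fewer factors, more iterations, and a larger learning rate than the lab settings, because the dataset has only six ratings. The training error after the passes (`after`) should be lower than the error of the untrained random factors (`before`).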
- Select the Data Transformation section in the left navigation. Follow the steps outlined below:
  - Select the Select Columns in Dataset prebuilt module.
  - Drag and drop the selected module onto the canvas.
  - Connect the Split Data second output to the input of the Select Columns in Dataset module.
  - Select the Edit columns link to open the Select columns editor and then enter the following list of columns to be included in the output dataset: UserId, Movie Name.
- Select the Recommendation section in the left navigation. Follow the steps outlined below:
  - Select the Score SVD Recommender prebuilt module.
  - Drag and drop the selected module onto the canvas.
  - Connect the output of the Train SVD Recommender module to the first input of the Score SVD Recommender module, which is the Trained SVD Recommendation input.
  - Connect the output of the Select Columns in Dataset module to the second input of the Score SVD Recommender module, which is the Dataset to score input.
  - Select the Score SVD Recommender module on the canvas.
  - Recommender prediction kind: Rating Prediction. For this option, no other parameters are required. When you predict ratings, the model calculates how a user will react to a particular item, given the training data. The input data for scoring must provide both a user and the item to rate.
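Conceptually, rating prediction with a factorization model combines a user's factor vector with an item's factor vector. The sketch below assumes the simplest form, a plain dot product without bias terms; the factor values and the helper name `predict_rating` are hypothetical and only illustrate why the input to scoring must contain both a user and an item.

```python
# Hypothetical learned factors; in the lab these live inside the trained model.
user_factors = {"u1": [1.0, 0.5], "u2": [0.2, 1.5]}
item_factors = {"Movie A": [4.0, 2.0], "Movie B": [1.0, 3.0]}


def predict_rating(user, item):
    """Predicted rating as the dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(user_factors[user], item_factors[item]))


# The dataset to score: (user, item) pairs without ratings.
pairs = [("u1", "Movie A"), ("u2", "Movie B")]
scored = [(user, item, predict_rating(user, item)) for user, item in pairs]
```

Each scored row carries the user, the item, and the predicted rating, which mirrors the three columns of the training data.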
- Select the Recommendation section in the left navigation. Follow the steps outlined below:
  - Select the Evaluate Recommender prebuilt module.
  - Drag and drop the selected module onto the canvas.
  - Connect the output of the Score SVD Recommender module to the second input of the Evaluate Recommender module, which is the Scored dataset input.
  - Connect the second output of the Split Data module (test set) to the first input of the Evaluate Recommender module, which is the Test dataset input.
- Select Submit on the right corner of the canvas to open the Setup pipeline run editor. Please note that the button name in the UI has changed from Run to Submit.
- In the Setup pipeline run editor, select Experiment, Create new and provide New experiment name: movie-recommender, and then select Submit.
- Wait for the pipeline run to complete. It will take around 20 minutes to complete the run.
- While you wait for the model training to complete, you can learn more about the SVD algorithm used in this lab by selecting Train SVD Recommender.
- Select Score SVD Recommender, Outputs, Visualize to open the Score SVD Recommender result visualization dialog, or simply right-click the Score SVD Recommender module and select Visualize Scored dataset.
- Observe the predicted values under the column Rating.
- Select Evaluate Recommender, Outputs, Visualize to open the Evaluate Recommender result visualization dialog, or simply right-click the Evaluate Recommender module and select Visualize Evaluation Results.
- Evaluate the model performance by reviewing the evaluation metrics, such as Mean Absolute Error and Root Mean Squared Error.
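As a reminder of what the two named metrics measure, here is a short sketch in plain Python. The actual and predicted ratings are made up; the formulas are the standard definitions of Mean Absolute Error and Root Mean Squared Error.

```python
from math import sqrt

# Made-up test ratings and the model's predictions for the same user-movie pairs.
actual = [8.0, 6.0, 9.0]
predicted = [7.0, 6.0, 11.0]

errors = [a - p for a, p in zip(actual, predicted)]

# MAE: the average size of the error, ignoring its sign.
mae = sum(abs(e) for e in errors) / len(errors)
# RMSE: like MAE, but squaring penalizes large errors more heavily.
rmse = sqrt(sum(e ** 2 for e in errors) / len(errors))
```

Both metrics are expressed in rating units, and lower values indicate predictions closer to the ratings in the test set.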
Congratulations! You have trained a simple movie recommender using the prebuilt Recommender modules in the AML visual designer. You can continue to experiment in the environment but are free to close the lab environment tab and return to the Udacity portal to continue with the lesson.