ML Notebook Enhancements #68

Open
luisquintanilla opened this issue Sep 15, 2022 · 0 comments

luisquintanilla commented Sep 15, 2022

  • Add reference documentation links to classes like transforms and trainers (e.g. LightGBM).
  • Include parameter names in method calls (e.g. mlContext.Data.TrainTestSplit(data, testFraction: 0.2)).
  • Use real data for examples. It makes the problem being solved easier to understand than randomly generated data does.
  • Watch for code comments. Instead of embedding them in the code, promote them to text in a Markdown cell.
  • Put related code together. Break up cells containing large chunks of code and add Markdown cells explaining what each of the cells is doing.

Example

Original

var context = new MLContext(seed: 1);
var pipeline = context.Transforms.Concatenate("Features", "X")
  .Append(context.Auto().Regression("y", useLbfgs: false, useSdca: false, useFastForest: false));

var monitor = new NotebookMonitor();
var experiment = context.Auto().CreateExperiment();
experiment.SetPipeline(pipeline)
  .SetEvaluateMetric(RegressionMetric.RootMeanSquaredError, "y")
  .SetTrainingTimeInSeconds(30)
  .SetDataset(trainTestSplit.TrainSet, trainTestSplit.TestSet)
  .SetMonitor(monitor);

// Configure Visualizer
monitor.SetUpdate(monitor.Display());

var res = await experiment.RunAsync();

Update

Initialize MLContext

MLContext is the starting point for all ML.NET applications.

var context = new MLContext(seed: 1);

Define training pipeline

  • Concatenate: Takes the input column X and creates a feature vector in the Features column.
  • Regression: Defines the task for which AutoML searches for the best algorithm and hyperparameters. In this case, the Lbfgs, Sdca, and FastForest algorithms won't be explored because their respective parameters are set to false.

var pipeline = context.Transforms.Concatenate("Features", "X")
  .Append(context.Auto().Regression("y", useLbfgs: false, useSdca: false, useFastForest: false));

Initialize Monitor

The notebook monitor provides visualizations of the training progress as AutoML tries to find the best model for your data.

var monitor = new NotebookMonitor();

Initialize AutoML Experiment

An AutoML experiment is a collection of trials in which algorithms are explored.

var experiment = context.Auto().CreateExperiment();

Configure AutoML Experiment

The AutoML experiment tries to find the best algorithm using an evaluation metric. In this case, the selected evaluation metric is Root Mean Squared Error. The goal is to find the model with the best (lowest) value of that metric within the provided training time, which is set to 30 seconds. The longer you train, the more algorithms and hyperparameters AutoML can explore. AutoML uses the training set to train each model and the test set to calculate the evaluation metric, which shows how well a particular model selected by AutoML performs.

experiment.SetPipeline(pipeline)
        .SetEvaluateMetric(RegressionMetric.RootMeanSquaredError, "y")
        .SetTrainingTimeInSeconds(30)
        .SetDataset(trainTestSplit.TrainSet, trainTestSplit.TestSet)
        .SetMonitor(monitor);

Set monitor to display

monitor.SetUpdate(monitor.Display());

Run AutoML experiment

var res = await experiment.RunAsync();
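
The cells above use a trainTestSplit variable that the example doesn't define. A minimal sketch of how it might be created in a notebook cell, with the parameter name included in the method call as suggested earlier, is shown below. The Row type and the in-memory rows are hypothetical placeholders and not part of the original notebook; an actual notebook would load a real dataset instead.

```csharp
using System.Linq;
using Microsoft.ML;

// Hypothetical row type (assumption); the column names X and y match the pipeline above.
public class Row
{
    public float X { get; set; }
    public float y { get; set; }
}

var context = new MLContext(seed: 1);

// Placeholder in-memory rows (assumption): swap in a real dataset in an actual notebook.
var rows = Enumerable.Range(0, 100).Select(i => new Row { X = i, y = 2f * i + 1f });
IDataView data = context.Data.LoadFromEnumerable(rows);

// The named argument makes it clear that 20% of the rows are held out for testing.
var trainTestSplit = context.Data.TrainTestSplit(data, testFraction: 0.2);
```

The resulting TrainSet and TestSet are what get passed to SetDataset in the experiment configuration.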

  • NotebookMonitor: Display the evaluation metric for the best trial, the active trial, and the y-axis of the graph.
  • When adding feeds, add a link to a document on how to reference them in Visual Studio / the dotnet CLI.
  • When installing NuGet packages that are not part of the BCL, list them in a Markdown cell where the packages are installed, and add a link to NuGet (e.g. Microsoft.ML).
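
As a rough illustration of the last point, a package-install cell might look like the sketch below. The two packages are assumptions about what a notebook like the one in the example needs. The preceding Markdown cell would list them with their NuGet links, for instance https://www.nuget.org/packages/Microsoft.ML and https://www.nuget.org/packages/Microsoft.ML.AutoML.

```csharp
// Install cell (sketch). The package list is an assumption for illustration;
// pin versions as appropriate for the notebook.
#r "nuget: Microsoft.ML"
#r "nuget: Microsoft.ML.AutoML"

using Microsoft.ML;
using Microsoft.ML.AutoML;
```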