diff --git a/README.md b/README.md index 0a9f6a77..d1a1c030 100644 --- a/README.md +++ b/README.md @@ -63,6 +63,7 @@ In order to understand the tutorials you need to be familiar with general concep - [WandB](https://github.com/logicalclocks/hopsworks-tutorials/tree/master/integrations/wandb): Build a machine learning model with Weights & Biases. - [Great Expectations](https://github.com/logicalclocks/hopsworks-tutorials/tree/master/integrations/great_expectations): Introduction to Great Expectations concepts and classes which are relevant for integration with the Hopsworks MLOps platform. - [Neo4j](integrations/neo4j): Perform Anti-money laundering (AML) predictions using Neo4j Graph representation of transactions. + - [Polars](https://github.com/logicalclocks/hopsworks-tutorials/tree/master/advanced_tutorials/polars/quickstart_polars.ipynb): Introductory tutorial on using Polars. - [Monitoring](https://github.com/logicalclocks/hopsworks-tutorials/tree/master/integrations/monitoring): How to implement feature monitoring in your production pipeline. - [Bytewax](https://github.com/logicalclocks/hopsworks-tutorials/tree/master/integrations/bytewax): Real time feature computation using Bytewax. - [Apache Beam](https://github.com/logicalclocks/hopsworks-tutorials/tree/master/integrations/java/beam): Real time feature computation using Apache Beam, Google Cloud Dataflow and Hopsworks Feature Store. 
diff --git a/advanced_tutorials/polars/quickstart_polars.ipynb b/advanced_tutorials/polars/quickstart_polars.ipynb new file mode 100644 index 00000000..b9581ef4 --- /dev/null +++ b/advanced_tutorials/polars/quickstart_polars.ipynb @@ -0,0 +1,1788 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "8WLB6QFXksxw" + }, + "source": [ + "![Screenshot from 2022-06-16 14-24-57.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/polars/quickstart_polars.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is the quick start tutorial for using **Polars** with the **Hopsworks Feature Store**. As part of this tutorial, you will work with data related to credit card transactions. \n", + "The objective of this tutorial is to demonstrate how to use Polars with the Hopsworks Feature Store, with the goal of training and saving a model that can predict fraudulent transactions. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NpwpPe1wxQ5M" + }, + "source": [ + "## 💽 Loading the Data \n", + "\n", + "The data you will use comes from three different CSV files:\n", + "\n", + "* credit_cards.csv: information such as the expiration date and provider.\n", + "* transactions.csv: events containing information about when a credit card was used, such as a timestamp, location, and the amount spent. A boolean fraud_label variable (True/False) tells us whether a transaction was fraudulent or not.\n", + "* profiles.csv: credit card user information such as birthdate and city of residence.\n", + "\n", + "In a production system, these CSV files would originate from separate data sources or tables, and probably separate data pipelines. 
All three files have a common credit card number column cc_num, which you will use later to join features together from the different datasets.\n", + "\n", + "Now, you can go ahead and load the data.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import joblib\n", + "import os\n", + "import time\n", + "\n", + "import polars as pl\n", + "import numpy as np\n", + "from matplotlib import pyplot\n", + "import seaborn as sns\n", + "from math import radians\n", + "\n", + "import xgboost as xgb\n", + "from sklearn.metrics import confusion_matrix\n", + "from sklearn.metrics import f1_score\n", + "\n", + "# Mute warnings\n", + "import warnings\n", + "warnings.filterwarnings(\"ignore\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 143 + }, + "id": "ARrJ_Bp5xMIk", + "outputId": "14e7a020-e04a-40d5-fdea-c4ba71f8a034" + }, + "outputs": [], + "source": [ + "# Specify the window length as \"4h\"\n", + "window_len = \"4h\"\n", + "\n", + "# Specify the URL for the data\n", + "url = \"https://repo.hops.works/master/hopsworks-tutorials/data/card_fraud_data/\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Read the 'credit_cards.csv' file\n", + "credit_cards_df = pl.read_csv(url + \"credit_cards.csv\")\n", + "\n", + "# Read the 'profiles.csv' file\n", + "# Parse the 'birthdate' column as dates\n", + "profiles_df = pl.read_csv(url + \"profiles.csv\", try_parse_dates=True)\n", + "\n", + "# Read the 'transactions.csv' file\n", + "# Parse the 'datetime' column as dates\n", + "trans_df = pl.read_csv(url + \"transactions.csv\", try_parse_dates=True)\n", + "\n", + "# Display the first 3 rows of the 'transactions.csv' DataFrame\n", + "trans_df.head(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HPq2qUtNxjaM" + }, + "source": [ + "## 
🛠️ Feature Engineering \n", + "\n", + "Fraudulent transactions can differ from regular ones in many different ways. Typical red flags would for instance be a large transaction volume/frequency in the span of a few hours. It could also be the case that elderly people in particular are targeted by fraudsters. To facilitate model learning, we will create additional features based on these patterns. In particular, we will create two types of features:\n", + "\n", + "* Features that aggregate data from different data sources. This could for instance be the age of a customer at the time of a transaction, which combines the birthdate feature from profiles.csv with the datetime feature from transactions.csv.\n", + "* Features that aggregate data from multiple time steps. An example of this could be the transaction frequency of a credit card in the span of a few hours, which is computed using a window function.\n", + "\n", + "Now you are ready to start with the first category.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + }, + "id": "ngEPnNzAxqsJ", + "outputId": "c8cf6082-1d1d-4bf7-9d81-ac2294146a27" + }, + "outputs": [], + "source": [ + "# Merge the 'trans_df' DataFrame with the 'profiles_df' DataFrame based on the 'cc_num' column\n", + "age_df = trans_df.join(profiles_df, on=\"cc_num\", how=\"left\")\n", + "\n", + "# Merge the 'trans_df' DataFrame with the 'credit_cards_df' DataFrame based on the 'cc_num' column\n", + "card_expiry_df = trans_df.join(credit_cards_df, on=\"cc_num\", how=\"left\")\n", + "\n", + "# Convert the 'expires' column to datetime format\n", + "card_expiry_df = card_expiry_df.with_columns(pl.col(\"expires\").str.to_datetime(\"%m/%y\"))\n", + "\n", + "# Compute the age at the time of each transaction and store it in the 'age_at_transaction' column\n", + "trans_df = trans_df.with_columns(age_at_transaction = (age_df[\"datetime\"] - 
age_df[\"birthdate\"]).dt.days()/365)\n", + "\n", + "# Compute the days until the card expires and store it in the 'days_until_card_expires' column\n", + "trans_df = trans_df.with_columns(days_until_card_expires = (card_expiry_df[\"expires\"] - card_expiry_df[\"datetime\"]).dt.days())\n", + "\n", + "# Display the 'age_at_transaction' and 'days_until_card_expires' columns for the first few rows\n", + "trans_df[[\"age_at_transaction\", \"days_until_card_expires\"]].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zEC12W4ux2Uk" + }, + "source": [ + "Next, you will create features from aggregations that are computed for each credit card over multiple time steps.\n", + "\n", + "You start by computing a feature that captures the physical distance between consecutive transactions, which we will call `loc_delta`. Here, you will use Haversine distance to quantify the distance between two longitude and latitude coordinates.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rQ-g4ETOx4O5" + }, + "outputs": [], + "source": [ + "# Sort the 'trans_df' DataFrame by card number and then by 'datetime' in ascending order\n", + "trans_df = trans_df.sort([\"cc_num\", \"datetime\"])\n", + "\n", + "# Convert the 'longitude' and 'latitude' columns to radians\n", + "trans_df = trans_df.with_columns(pl.col(\"latitude\").map_elements(radians),\n", + " pl.col(\"longitude\").map_elements(radians))\n", + "\n", + "# Define a function to compute Haversine distance between consecutive coordinates\n", + "def haversine(long, lat):\n", + " \"\"\"Compute Haversine distance between each consecutive coordinate in (long, lat).\"\"\"\n", + "\n", + " # Shift the longitude and latitude columns to get consecutive values\n", + " long_shifted = long.shift()\n", + " lat_shifted = lat.shift()\n", + "\n", + " # Calculate the differences in longitude and latitude\n", + " long_diff = long_shifted - long\n", + " lat_diff = lat_shifted - lat\n", + "\n", + 
" # Haversine formula to compute distance\n", + " a = np.sin(lat_diff/2.0)**2\n", + " b = np.cos(lat) * np.cos(lat_shifted) * np.sin(long_diff/2.0)**2\n", + " c = 2*np.arcsin(np.sqrt(a + b))\n", + "\n", + " return c\n", + "\n", + "# Apply the haversine function per card; rows are sorted by card and time, so the exploded result lines up with 'trans_df'\n", + "trans_df = trans_df.with_columns(trans_df.group_by(\"cc_num\", maintain_order=True)\n", + " .agg(pl.map_groups(exprs=[\"longitude\", \"latitude\"], function = lambda x : haversine(x[0], x[1]))\n", + " .alias(\"loc_delta\")).explode(\"loc_delta\").fill_null(0).select(\"loc_delta\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a_MHfwYsGfbo" + }, + "source": [ + "Next you will compute windowed aggregates. Here you will use 4-hour windows, but feel free to experiment with different window lengths by changing the value of `window_len` set earlier." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + }, + "id": "jmywmIVKGgLR", + "outputId": "32ad2881-9a27-483f-c8e4-9ef94af6dd6e" + }, + "outputs": [], + "source": [ + "window_aggs_df = trans_df[[\"cc_num\", \"amount\", \"datetime\", \"loc_delta\"]].rolling(\n", + " period=window_len, \n", + " index_column=\"datetime\",\n", + " by=[\"cc_num\"]\n", + ").agg(pl.col(\"amount\").mean().alias(\"trans_volume_mavg\"),\n", + " pl.col(\"amount\").std().alias(\"trans_volume_mstd\"),\n", + " pl.col(\"amount\").count().alias(\"trans_freq\"),\n", + " pl.col(\"loc_delta\").mean().alias(\"loc_delta_mavg\"),).fill_null(0)\n", + "window_aggs_df.tail()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yB90r9qszLe2" + }, + "source": [ + "## 🪄 Creating Feature Groups \n", + "\n", + "A feature group can be seen as a collection of conceptually related features that are computed together at the same cadence. 
In your case, you will create a feature group for the transaction data and a feature group for the windowed aggregations on the transaction data. Both will have `cc_num` as primary key, which will allow you to join them together to create training data later in this tutorial.\n", + "\n", + "Feature groups provide a namespace for features, so two features are allowed to have the same name as long as they belong to different feature groups. For instance, in a real-life setting we would likely want to experiment with different window lengths. In that case, we can create feature groups with identical schema for each window length.\n", + "\n", + "Before you can create a feature group, you need to connect to the feature store.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "WFmD_15TzMHX", + "outputId": "6acf8632-6993-485c-fb2a-31f27f7b462f" + }, + "outputs": [], + "source": [ + "import hopsworks\n", + "\n", + "# Log in without hard-coding credentials in the notebook:\n", + "# you are prompted for an API key if none is found.\n", + "project = hopsworks.login()\n", + "\n", + "fs = project.get_feature_store()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get or create the 'transactions' feature group\n", + "trans_fg = fs.get_or_create_feature_group(\n", + " name=\"transactions\",\n", + " version=1,\n", + " description=\"Transaction data\",\n", + " primary_key=[\"cc_num\"],\n", + " event_time=\"datetime\",\n", + " online_enabled=True,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A full list of arguments can be found in the [documentation](https://docs.hopsworks.ai/feature-store-api/latest/generated/api/feature_store_api/#create_feature_group).\n", + "\n", + "At this point, you have only specified some metadata for the feature group. 
It does not store any data or even have a schema defined for the data. To make the feature group persistent you need to populate it with its associated data using the `insert` function." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Insert data into feature group\n", + "trans_fg.insert(trans_df)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Update feature descriptions\n", + "feature_descriptions = [\n", + " {\"name\": \"tid\", \"description\": \"Transaction id\"},\n", + " {\"name\": \"datetime\", \"description\": \"Transaction time\"},\n", + " {\"name\": \"cc_num\", \"description\": \"Number of the credit card performing the transaction\"},\n", + " {\"name\": \"category\", \"description\": \"Expense category\"},\n", + " {\"name\": \"amount\", \"description\": \"Dollar amount of the transaction\"},\n", + " {\"name\": \"latitude\", \"description\": \"Transaction location latitude\"},\n", + " {\"name\": \"longitude\", \"description\": \"Transaction location longitude\"},\n", + " {\"name\": \"city\", \"description\": \"City in which the transaction was made\"},\n", + " {\"name\": \"country\", \"description\": \"Country in which the transaction was made\"},\n", + " {\"name\": \"fraud_label\", \"description\": \"Whether the transaction was fraudulent or not\"},\n", + " {\"name\": \"age_at_transaction\", \"description\": \"Age of the card holder when the transaction was made\"},\n", + " {\"name\": \"days_until_card_expires\", \"description\": \"Card validity days left when the transaction was made\"},\n", + " {\"name\": \"loc_delta\", \"description\": \"Haversine distance between this transaction location and the previous transaction location from the same card\"},\n", + "]\n", + "\n", + "for desc in feature_descriptions: \n", + " trans_fg.update_feature_description(desc[\"name\"], desc[\"description\"])" + ] + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "When the feature group is created, a URL that links directly to it is printed; there you can explore your newly created feature group.\n", + "\n", + "[//]: <> (insert GIF here)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can now do the same for the feature group with the windowed aggregations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get or create the 'transactions' feature group with aggregations using specified window len\n", + "window_aggs_fg = fs.get_or_create_feature_group(\n", + " name=f\"transactions_{window_len}_aggs\",\n", + " version=1,\n", + " description=f\"Aggregate transaction data over {window_len} windows.\",\n", + " primary_key=[\"cc_num\"],\n", + " event_time=\"datetime\",\n", + " online_enabled=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Insert data into feature group\n", + "window_aggs_fg.insert(window_aggs_df)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Update feature descriptions\n", + "feature_descriptions = [\n", + " {\"name\": \"datetime\", \"description\": \"Transaction time\"},\n", + " {\"name\": \"cc_num\", \"description\": \"Number of the credit card performing the transaction\"},\n", + " {\"name\": \"loc_delta_mavg\", \"description\": \"Moving average of location difference between consecutive transactions from the same card\"},\n", + " {\"name\": \"trans_freq\", \"description\": \"Rolling count of transactions from the same card\"},\n", + " {\"name\": \"trans_volume_mavg\", \"description\": \"Moving average of transaction volume from the same card\"},\n", + " {\"name\": \"trans_volume_mstd\", \"description\": \"Moving standard 
deviation of transaction volume from the same card\"},\n", + "]\n", + "\n", + "for desc in feature_descriptions: \n", + " window_aggs_fg.update_feature_description(desc[\"name\"], desc[\"description\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "### 🔪 Feature Selection \n", + "\n", + "You will start by selecting all the features you want to include for model training/inference." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Select features for training data\n", + "selected_features = trans_fg.select([\"fraud_label\", \"category\", \"amount\", \"age_at_transaction\", \"days_until_card_expires\", \"loc_delta\"])\\\n", + " .join(window_aggs_fg.select_except([\"cc_num\"]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Uncomment this if you would like to view your selected features\n", + "# selected_features.show(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Recall that you computed the features in `transactions_4h_aggs` using 4-hour aggregates. If you had created multiple feature groups with identical schema for different window lengths, and wanted to include them in the join, you would need to include a prefix argument in the join to avoid feature name clashes. See the [documentation](https://docs.hopsworks.ai/feature-store-api/latest/generated/api/query_api/#join) for more details." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 🤖 Transformation Functions \n", + "\n", + "\n", + "You will preprocess the data by applying *label encoding* to the categorical feature `category`; numerical features could be scaled in the same way, for example with *min-max scaling*. 
To do this you simply define a mapping between your features and transformation functions. This ensures that transformation functions such as *min-max scaling* are fitted only on the training data (and not the validation/test data), which prevents data leakage." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load transformation functions.\n", + "label_encoder = fs.get_transformation_function(name=\"label_encoder\")\n", + "\n", + "# Map features to transformations.\n", + "transformation_functions = {\n", + " \"category\": label_encoder,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## ⚙️ Feature View Creation \n", + "\n", + "The Feature View is the collection of features (from feature groups) and transformation functions used to train models and serve precomputed features to deployed models.\n", + "\n", + "The Feature View includes all of the features defined in the query object you created earlier. It can additionally include filters, one or more columns identified as the target(s) (or label) and the set of transformation functions and the features they are applied to. \n", + "\n", + "You create a Feature View with `fs.create_feature_view()`. \n", + "You retrieve a reference to an existing feature view with: `fs.get_feature_view('transactions_view',version=1)`.\n", + "Alternatively, `fs.get_or_create_feature_view()` retrieves an existing feature view, or creates it if it does not exist.\n", + "The code below uses this method: it first tries to get a reference to the feature view and creates the feature view only if none is found." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get or create the 'transactions_view' feature view\n", + "feature_view = fs.get_or_create_feature_view(\n", + " name='transactions_view',\n", + " version=1,\n", + " query=selected_features,\n", + " labels=[\"fraud_label\"],\n", + " transformation_functions=transformation_functions,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "## 🏋️ Training Dataset Creation\n", + "\n", + "In Hopsworks, training data is a query where the projection (set of features) is determined by the parent FeatureView, with an optional snapshot on disk of the data returned by the query.\n", + "\n", + "**A training dataset may contain splits such as:** \n", + "* Training set - the subset of training data used to train a model.\n", + "* Validation set - the subset of training data used to evaluate hyperparameters when training a model.\n", + "* Test set - the holdout subset of training data used to evaluate a model.\n", + "\n", + "The training dataset is created using the `feature_view.train_test_split()` method." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "TEST_SIZE = 0.2\n", + "\n", + "X_train, X_test, y_train, y_test = feature_view.train_test_split(\n", + " description='transactions fraud training dataset',\n", + " test_size=TEST_SIZE,\n", + " dataframe_type=\"polars\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Combining training data and labels for sorting\n", + "training_data = pl.concat([X_train, y_train], how=\"horizontal\")\n", + "\n", + "# Sort the training features DataFrame based on the 'datetime' column\n", + "training_data = training_data.sort(\"datetime\")\n", + "\n", + "X_train = training_data.select(pl.exclude(\"fraud_label\"))\n", + "\n", + "y_train = training_data.select(\"fraud_label\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Combining test data and labels for sorting\n", + "test_data = pl.concat([X_test, y_test], how=\"horizontal\")\n", + "\n", + "# Sort the test features DataFrame based on the 'datetime' column\n", + "test_data = test_data.sort(\"datetime\")\n", + "\n", + "X_test = test_data.select(pl.exclude(\"fraud_label\"))\n", + "\n", + "y_test = test_data.select(\"fraud_label\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Drop the 'datetime' column from the training features DataFrame 'X_train'\n", + "X_train = X_train.drop([\"datetime\"])\n", + "\n", + "# Drop the 'datetime' column from the test features DataFrame 'X_test'\n", + "X_test = X_test.drop([\"datetime\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Display the value counts of the target variable in 'y_train'\n", + "y_train[\"fraud_label\"].value_counts()" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "Notice that the distribution is extremely skewed, which is natural considering that fraudulent transactions make up a tiny part of all transactions. You should therefore address the class imbalance. There are many approaches for this, such as weighting the loss function, over- or undersampling, creating synthetic data, or modifying the decision threshold. The simplest method is to supply a class weight parameter to the learning algorithm. The class weight affects how much importance is attached to each class, which in this case means placing higher importance on positive (fraudulent) samples." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 🧬 Modeling\n", + "\n", + "Next you will train a model. The classifier below uses XGBoost's default settings; to give the positive class a larger weight, pass the `scale_pos_weight` parameter to `XGBClassifier`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create an XGBoost classifier\n", + "clf = xgb.XGBClassifier()\n", + "\n", + "# Fit XGBoost classifier to the training data\n", + "clf.fit(X_train, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Predict the training data using the trained classifier\n", + "y_pred_train = clf.predict(X_train)\n", + "\n", + "# Predict the test data using the trained classifier\n", + "y_pred_test = clf.predict(X_test)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Compute f1 score\n", + "metrics = {\n", + " \"f1_score\": f1_score(y_test, y_pred_test, average='macro')\n", + "}\n", + "metrics" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Calculate and print the confusion matrix for 
the test predictions\n", + 
"results = confusion_matrix(y_test, y_pred_test)\n", + "print(results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create a DataFrame for the confusion matrix results\n", + "df_cm = pl.DataFrame(\n", + " results, \n", + ")\n", + "\n", + "# Create a heatmap using seaborn with annotations\n", + "cm = sns.heatmap(df_cm, annot=True)\n", + "\n", + "# Get the figure and display it\n", + "fig = cm.get_figure()\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "### ⚙️ Model Schema\n", + "\n", + "The model needs to be set up with a [Model Schema](https://docs.hopsworks.ai/3.0/user_guides/mlops/registry/model_schema/), which describes the inputs and outputs for a model.\n", + "\n", + "A Model Schema can be automatically generated from training examples, as shown below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from hsml.schema import Schema\n", + "from hsml.model_schema import ModelSchema\n", + "\n", + "# Create a Schema for the input features using the values of X_train\n", + "input_schema = Schema(X_train.to_numpy())\n", + "\n", + "# Create a Schema for the output using y_train\n", + "output_schema = Schema(y_train.to_numpy())\n", + "\n", + "# Create a ModelSchema using the defined input and output schemas\n", + "model_schema = ModelSchema(input_schema=input_schema, output_schema=output_schema)\n", + "\n", + "# Convert the model schema to a dictionary for inspection\n", + "model_schema.to_dict()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 📝 Register model\n", + "\n", + "One of the features in Hopsworks is the model registry. This is where we can store different versions of models and compare their performance. 
Models from the registry can then be served as API endpoints." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Specify the directory name for saving the model and related artifacts\n", + "model_dir = \"quickstart_fraud_model\"\n", + "\n", + "# Check if the directory already exists; if not, create it\n", + "if not os.path.isdir(model_dir):\n", + " os.mkdir(model_dir)\n", + "\n", + "# Save the trained XGBoost classifier to a joblib file in the specified directory\n", + "joblib.dump(clf, model_dir + '/xgboost_model.pkl')\n", + "\n", + "# Save the confusion matrix heatmap figure to an image file in the specified directory\n", + "fig.savefig(model_dir + \"/confusion_matrix.png\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get the model registry\n", + "mr = project.get_model_registry()\n", + "\n", + "# Create a Python model named \"fraud\" in the model registry\n", + "fraud_model = mr.python.create_model(\n", + " name=\"fraud\", \n", + " metrics=metrics, # Specify the metrics used to evaluate the model\n", + " model_schema=model_schema, # Use the previously defined model schema\n", + " input_example=[4700702588013561], # Provide an input example for testing deployments\n", + " description=\"Quickstart Fraud Predictor\", # Add a description for the model\n", + ")\n", + "\n", + "# Save the model to the specified directory\n", + "fraud_model.save(model_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "## 🚀 Model Deployment\n", + "\n", + "\n", + "### About Model Serving\n", + "Models can be served via KFServing or \"default\" serving, which means a Docker container exposing a Flask server. 
For KFServing models, or models written in TensorFlow, you do not need to write a prediction file (see the section below). However, for sklearn models using default serving, you do need to write a prediction file.\n", + "\n", + "In order to use KFServing, you must have Kubernetes installed and enabled on your cluster." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 📎 Predictor script for Python models\n", + "\n", + "\n", + "Scikit-learn and XGBoost models are deployed as Python models, in which case you need to provide a **Predict** class that implements the **predict** method. The **predict()** method invokes the model on the inputs and returns the prediction as a list.\n", + "\n", + "The `__init__()` method is run when the predictor is loaded into memory, loading the model from the local directory it is materialized to, *ARTIFACT_FILES_PATH*.\n", + "\n", + "The directive \"%%writefile\" writes the contents of the cell to the given Python file. We will use the **predict_example.py** file to create a deployment for our model. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile predict_example.py\n", + "import os\n", + "import numpy as np\n", + "import hsfs\n", + "import joblib\n", + "\n", + "\n", + "class Predict(object):\n", + "\n", + "    def __init__(self):\n", + "        \"\"\" Initializes the serving state, reads a trained model\"\"\" \n", + "        # Get feature store handle\n", + "        fs_conn = hsfs.connection()\n", + "        self.fs = fs_conn.get_feature_store()\n", + " \n", + "        # Get feature view\n", + "        self.fv = self.fs.get_feature_view(\"transactions_view\", 1)\n", + " \n", + "        # Initialize serving\n", + "        self.fv.init_serving(1)\n", + "\n", + "        # Load the trained model\n", + "        self.model = joblib.load(os.environ[\"ARTIFACT_FILES_PATH\"] + \"/xgboost_model.pkl\")\n", + "        print(\"Initialization Complete\")\n", + "\n", + "    def predict(self, inputs):\n", + "        \"\"\" Serves a prediction request using a trained model\"\"\"\n", + "        feature_vector = self.fv.get_feature_vector({\"cc_num\": inputs[0][0]})\n", + "        feature_vector = feature_vector[:-1]\n", + " \n", + "        return self.model.predict(np.asarray(feature_vector).reshape(1, -1)).tolist() # Numpy Arrays are not JSON serializable" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Data Sets tab in the Hopsworks UI lets you browse the files in the project. Registered models are found underneath the Models directory. Since you saved your model with the name fraud, that is the directory you should look in; the version subdirectory, such as 1, identifies the version of the model you want to deploy.\n", + "\n", + "The predictor script needs to be put into a known location in the Hopsworks file system. The file is called predict_example.py and is uploaded to the Models directory." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get the dataset API from the project\n", + "dataset_api = project.get_dataset_api()\n", + "\n", + "# Upload the predictor script (\"predict_example.py\") to the \"Models\" directory, allowing overwrites\n", + "uploaded_file_path = dataset_api.upload(\"predict_example.py\", \"Models\", overwrite=True)\n", + "\n", + "# Construct the full path to the uploaded predictor script\n", + "predictor_script_path = os.path.join(\"/Projects\", project.name, uploaded_file_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 👩🏻‍🔬 Create the deployment\n", + "\n", + "Here, you fetch the model you want from the model registry and define a configuration for the deployment. For the configuration, you need to specify the serving type (default or KFServing)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Deploy the fraud model\n", + "deployment = fraud_model.deploy(\n", + "    name=\"fraud\",  # Specify the deployment name\n", + "    script_file=predictor_script_path,  # Provide the path to the predictor script\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Deployment is warming up...\")\n", + "time.sleep(45)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### The deployment has now been registered. 
However, to start it, you need to run the following command:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Start the deployment and wait for it to be running, with a maximum waiting time of 180 seconds\n", + "deployment.start(await_running=180)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get the current state of the deployment and describe its details\n", + "deployment_state = deployment.get_state().describe()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QKCTKfcaimxo" + }, + "source": [ + "## 📡 Test your Model with an Inference Request \n", + "\n", + "Finally, you can start making predictions with your model! \n", + "\n", + "Send inference requests to the deployed model as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 52 + }, + "id": "aL3-2W39tC-u", + "outputId": "fbda67a5-ce89-49ff-f113-a7dc8bbc2b6d" + }, + "outputs": [], + "source": [ + "# Make predictions using the deployed model\n", + "predictions = deployment.predict(\n", + "    inputs=fraud_model.input_example,\n", + ")\n", + "predictions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 👾 Try out your Model Interactively \n", + "\n", + "We will build a user interface with Gradio that lets you enter a credit card number and see whether the transaction will be marked as suspected of fraud or not."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install gradio --quiet\n", + "!pip install typing-extensions==4.3.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "import gradio as gr\n", + "import numpy as np\n", + "\n", + "def greet(credit_card_example):\n", + "    cc_data = credit_card_example.iloc[0].astype(\"float\")\n", + "    # Add missing feature values to the feature vector. Here we hard-code the values,\n", + "    # but if you enable the Online Feature Store, you could retrieve them with the following commented-out code\n", + "    # entry = { \"cc_num\" : credit_card_example[0]}\n", + "    # passed_features = {\"category\": credit_card_example[0], \"amount\" : credit_card_example[1]}\n", + "    # feature_vector = feature_view.get_feature_vector(entry, passed_features)\n", + "    res = deployment.predict(inputs=cc_data.tolist())\n", + "    res = res[\"predictions\"][0]\n", + "    if res == 0:\n", + "        return \"Not Suspected of Fraud\"\n", + "    return \"Suspected of Fraud\"\n", + "\n", + "credit_card_example = gr.Dataframe(\n", + "    headers=[\"Credit card number\"],\n", + "    value=[[fraud_model.input_example[0]]]\n", + ")\n", + "\n", + "demo = gr.Interface(\n", + "    greet,\n", + "    credit_card_example,\n", + "    \"text\",\n", + "    title=\"Live Credit Card Fraud Detector\",\n", + "    description=\"Enter credit card transaction details.\",\n", + "    allow_flagging=\"never\"\n", + ")\n", + "\n", + "\n", + "demo.launch(share=True, debug=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HmwNUVbHjO5I" + }, + "source": [ + "## 🥳 Next Steps\n", + "\n", + "Congratulations, you've now completed the quickstart example for Managed Hopsworks.\n", + "\n", + "\n", + "Check out our other tutorials at ➡ https://github.com/logicalclocks/hopsworks-tutorials\n", + "\n", + "Or read the documentation at ➡ https://docs.hopsworks.ai" + ] + } + ], + 
"metadata": { + "colab": { + "collapsed_sections": [], + "name": "quickstart.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.18" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "192f266700894002b24eeff9b2136db3": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1b7c2a4d212646239113f831eeafc1cd": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "24536f939afa41f1acff4e55fae4423c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3f7e59807b824e86b47319bd47e32650": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + 
"IPY_MODEL_66c0d3f6020845d7a4203f33abf33818", + "IPY_MODEL_f22754bd6225452f8253ecff644c31d5", + "IPY_MODEL_77e516d3f3ce4c5488bf68825d260061" + ], + "layout": "IPY_MODEL_f91eeaa7449541819c9970f42963e2dd" + } + }, + "43c0b243b2c9417abc5072f82ba22457": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_4e326043046247f5ae5bfac3306cb145", + "IPY_MODEL_a16b8eae9f614a44a9ee7f7ea900d1f2", + "IPY_MODEL_a0e314c90c5a413d9701a6efcded884b" + ], + "layout": "IPY_MODEL_24536f939afa41f1acff4e55fae4423c" + } + }, + "4559fa2eaf7348ae9dd5d1cfdd3e0bd5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4e326043046247f5ae5bfac3306cb145": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a26b3a1aa96f4eeaaad89d4d662fa010", + "placeholder": "​", + "style": "IPY_MODEL_ed9e0fc80c9e483cb7aafad007b57ea0", + 
"value": "Deployment is running: 100%" + } + }, + "5fd0bd75549a47ae8a3042f1e0c61ba5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "66c0d3f6020845d7a4203f33abf33818": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d1c010a576d94f8bbcc6f535741f514b", + "placeholder": "​", + "style": "IPY_MODEL_5fd0bd75549a47ae8a3042f1e0c61ba5", + "value": "Model export complete: 100%" + } + }, + "71ad3ff88d62426abda44fdb300b0484": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7529396c14464b7b9e349c6ba003cddc": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "77e516d3f3ce4c5488bf68825d260061": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_192f266700894002b24eeff9b2136db3", + "placeholder": "​", + "style": "IPY_MODEL_1b7c2a4d212646239113f831eeafc1cd", + "value": " 6/6 [00:25<00:00, 5.19s/it]" + } + }, + "7887571c2ea4415080f60ea18be441b1": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": 
null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a0e314c90c5a413d9701a6efcded884b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7529396c14464b7b9e349c6ba003cddc", + "placeholder": "​", + "style": "IPY_MODEL_71ad3ff88d62426abda44fdb300b0484", + "value": " 1/1 [00:20<00:00, 5.12s/it]" + } + }, + "a16b8eae9f614a44a9ee7f7ea900d1f2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + 
"description_tooltip": null, + "layout": "IPY_MODEL_7887571c2ea4415080f60ea18be441b1", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_4559fa2eaf7348ae9dd5d1cfdd3e0bd5", + "value": 1 + } + }, + "a26b3a1aa96f4eeaaad89d4d662fa010": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d1c010a576d94f8bbcc6f535741f514b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + 
"flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d6462acb3ac6479f942e1a1b8da309a8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ed9e0fc80c9e483cb7aafad007b57ea0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f22754bd6225452f8253ecff644c31d5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": 
null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f2ccbe863c224765bbe755fcb0ba4be7", + "max": 6, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d6462acb3ac6479f942e1a1b8da309a8", + "value": 6 + } + }, + "f2ccbe863c224765bbe755fcb0ba4be7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f91eeaa7449541819c9970f42963e2dd": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": 
"1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + } + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}