This document describes implementation specifics of the solution.
The solution's resources are created through deployments of multiple Azure Resource Manager (ARM) templates, which are linked together by pdm-arm.json. (Linked templates are covered in great detail in this article.)
If you wish to customize this solution by cloning this GitHub repository, be sure to update the gitHubBaseUrl variable in the main ARM template so that it points to your cloned repository.
The ARM templates can also be reused outside of GitHub; in that case, they must be deployed in an order that preserves resource and input/output parameter dependencies.
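For illustration, the sketch below shows one way such an ordered deployment could be scripted with the Azure SDK for Python. The template file names, resource group, and parameter-forwarding logic are assumptions and should be adapted to the actual templates linked by pdm-arm.json.

```python
# Sketch: deploying the linked templates individually, in dependency order.
# Template file names, the resource group, and the parameter-forwarding logic
# are illustrative assumptions -- check pdm-arm.json for the real ordering.
import json
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resource.resources.models import Deployment, DeploymentProperties

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "pdm-solution-rg"                 # hypothetical resource group

client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Hypothetical ordering: each template may consume outputs of the previous one.
ordered_templates = ["storage.json", "iothub.json", "databricks.json", "webapp.json"]

outputs = {}  # outputs of earlier deployments, fed forward as parameters
for template_file in ordered_templates:
    with open(template_file) as f:
        template = json.load(f)
    # Pass forward only the outputs that the next template declares as parameters.
    params = {k: {"value": v} for k, v in outputs.items()
              if k in template.get("parameters", {})}
    deployment = client.deployments.begin_create_or_update(
        RESOURCE_GROUP,
        f"deploy-{template_file.replace('.json', '')}",
        Deployment(properties=DeploymentProperties(
            mode="Incremental",
            template=template,
            parameters=params)),
    ).result()
    # Carry this template's outputs forward for the next deployment.
    outputs.update({k: v["value"]
                    for k, v in (deployment.properties.outputs or {}).items()})
```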
In addition to the ARM deployments, the solution depends on the following custom configuration activities implemented as WebJobs:
- Python and storage setup, which configures the Python 3.6 runtime on the Azure App Service and creates several storage tables used by the solution.
- Databricks and simulated devices setup, which creates a Databricks cluster used for real-time feature engineering, as well as several "test" IoT devices used by the data generator (a rough sketch of both setup steps follows this list).
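As a minimal sketch, these two setup activities might look roughly like the following, assuming the azure-data-tables and azure-iot-hub SDKs. The table names, device-naming scheme, and device count are assumptions; the actual WebJobs may use different SDK versions and names, and the Databricks cluster itself is typically created separately through the Databricks REST API (omitted here).

```python
# Sketch of the two setup activities. Table names and device ids are
# illustrative assumptions, not the names used by the actual WebJobs.
import base64
import os

from azure.data.tables import TableServiceClient
from azure.iot.hub import IoTHubRegistryManager

STORAGE_CONNECTION_STRING = "<storage-connection-string>"
IOTHUB_CONNECTION_STRING = "<iothub-owner-connection-string>"

# 1. Create the storage tables the solution reads from and writes to.
tables = TableServiceClient.from_connection_string(STORAGE_CONNECTION_STRING)
for table_name in ("features", "predictions", "equipment"):   # hypothetical names
    tables.create_table_if_not_exists(table_name)

# 2. Register a handful of "test" IoT devices for the data generator.
registry = IoTHubRegistryManager(IOTHUB_CONNECTION_STRING)
for i in range(3):
    device_id = f"test-device-{i:03d}"                         # hypothetical naming scheme
    primary = base64.b64encode(os.urandom(32)).decode()
    secondary = base64.b64encode(os.urandom(32)).decode()
    registry.create_device_with_sas(device_id, primary, secondary, "enabled")
```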
These WebJobs are implemented as "continuous," although they technically run only once. Please refer to the source code for more details.
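A minimal sketch of this "continuous but run once" pattern: App Service restarts a continuous WebJob whenever its process exits, so a one-time setup task does its work and then idles instead of exiting. The run_setup function below is a placeholder, not the actual WebJob code.

```python
# Sketch of the "continuous but run once" WebJob pattern.
import time

def run_setup():
    """Placeholder for the one-time configuration work."""
    pass

if __name__ == "__main__":
    run_setup()
    # Keep the process alive so App Service does not restart (and re-run) the job.
    while True:
        time.sleep(3600)
```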
The WebJobs mentioned above, the Dashboard (a Flask application), and several additional WebJobs are all part of the same Web Application, which is deployed to Azure App Service via this ARM template. Note that the template expects the Web Application to be packaged into a ZIP file (which can be found in the binaries directory).
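If you modify the Web Application, the ZIP package needs to be rebuilt. A minimal sketch using Python's zipfile module is shown below; the source folder and output path are assumptions and should be matched to the repository's actual layout and to the package path referenced by the ARM template.

```python
# Sketch: rebuilding the Web Application ZIP package after local changes.
# SOURCE_DIR and OUTPUT_ZIP are illustrative assumptions.
import os
import zipfile

SOURCE_DIR = "WebApp"                  # hypothetical folder with the Flask app and WebJobs
OUTPUT_ZIP = "binaries/WebApp.zip"     # hypothetical package path expected by the template

with zipfile.ZipFile(OUTPUT_ZIP, "w", zipfile.ZIP_DEFLATED) as zf:
    for root, _, files in os.walk(SOURCE_DIR):
        for name in files:
            path = os.path.join(root, name)
            # Store paths relative to the app root so App Service unpacks them in place.
            zf.write(path, os.path.relpath(path, SOURCE_DIR))
```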
The data generator is implemented as yet another "continuous" WebJob, named Simulator. When this WebJob runs, it automatically discovers the IoT devices created during provisioning and starts sending messages to IoT Hub. Additional simulated devices can be created manually through the solution's Dashboard.
More information on device simulation is available in the DataGeneration.ipynb Jupyter notebook.
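As a rough illustration, a simulated device might send telemetry along the following lines, assuming the azure-iot-device SDK. The payload fields and device id are assumptions; the actual message schema used by the Simulator is documented in DataGeneration.ipynb.

```python
# Sketch of a simulated device sending telemetry to IoT Hub.
# Payload fields and the device id are illustrative assumptions.
import json
import random
import time

from azure.iot.device import IoTHubDeviceClient, Message

DEVICE_CONNECTION_STRING = "<device-connection-string>"

client = IoTHubDeviceClient.create_from_connection_string(DEVICE_CONNECTION_STRING)
client.connect()

try:
    while True:
        payload = {
            "deviceId": "test-device-000",           # hypothetical device id
            "timestamp": time.time(),
            "temperature": random.gauss(70, 5),      # hypothetical sensor readings
            "pressure": random.gauss(30, 2),
        }
        client.send_message(Message(json.dumps(payload)))
        time.sleep(1)
finally:
    client.shutdown()
```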
Data sent to IoT Hub by the data generator is read (using the Azure Event Hubs Connector) and processed by the solution's Spark Structured Streaming job running on the Databricks cluster created during solution provisioning. This job is implemented as a Scala sbt project and demonstrates how real-time feature engineering can be done with Spark.
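The actual job is written in Scala, but a rough PySpark equivalent of the windowed feature aggregation might look as follows. The message schema, window size, and connection-string handling are assumptions, and the spark session is provided by the Databricks runtime.

```python
# Rough PySpark equivalent of the feature-engineering job (the real job is a
# Scala sbt project). Schema, window size, and sink are illustrative assumptions.
from pyspark.sql.functions import avg, col, from_json, stddev, window
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

schema = StructType([
    StructField("deviceId", StringType()),
    StructField("timestamp", TimestampType()),
    StructField("temperature", DoubleType()),
    StructField("pressure", DoubleType()),
])

# The azure-event-hubs-spark connector expects an encrypted connection string.
conn = "<iothub-builtin-eventhub-connection-string>"
ehConf = {"eventhubs.connectionString":
          spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn)}

raw = spark.readStream.format("eventhubs").options(**ehConf).load()

features = (raw
    .select(from_json(col("body").cast("string"), schema).alias("msg"))
    .select("msg.*")
    .withWatermark("timestamp", "10 minutes")
    .groupBy(window(col("timestamp"), "5 minutes"), col("deviceId"))
    .agg(avg("temperature").alias("avg_temperature"),
         stddev("temperature").alias("std_temperature"),
         avg("pressure").alias("avg_pressure")))

# The real job writes each micro-batch to an Azure storage table (e.g. via
# foreachBatch); a memory sink is used here just to keep the sketch self-contained.
query = (features.writeStream
         .outputMode("append")
         .format("memory")
         .queryName("features_preview")
         .start())
```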
Aggregated feature data is written by the Spark job into an Azure storage table, from which it is read by the Scorer WebJob, which delegates scoring to the ML model operationalized as a web service.
The scores (or predictions) generated by the ML model are also written into a storage table. Both feature aggregates and predictions are rendered in the solution's Dashboard as charts and tables.
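A minimal sketch of the Scorer loop, assuming the azure-data-tables SDK and a JSON scoring endpoint; the table names, endpoint URL, and payload/response shapes are assumptions.

```python
# Sketch of the Scorer loop: read aggregated features, score them against the
# operationalized model, and persist the predictions. Names are assumptions.
import json

import requests
from azure.data.tables import TableServiceClient

STORAGE_CONNECTION_STRING = "<storage-connection-string>"
SCORING_URL = "http://<aci-endpoint>/score"    # hypothetical ACI web service URL

tables = TableServiceClient.from_connection_string(STORAGE_CONNECTION_STRING)
features_table = tables.get_table_client("features")        # hypothetical table name
predictions_table = tables.get_table_client("predictions")  # hypothetical table name

for entity in features_table.list_entities():
    # Send the aggregated features to the model for scoring.
    payload = {k: v for k, v in entity.items() if k not in ("PartitionKey", "RowKey")}
    response = requests.post(SCORING_URL, json={"data": [payload]})
    response.raise_for_status()
    score = response.json()

    # Persist the prediction alongside the feature row's keys.
    predictions_table.upsert_entity({
        "PartitionKey": entity["PartitionKey"],
        "RowKey": entity["RowKey"],
        "prediction": json.dumps(score),
    })
```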
The solution is deployed with a pre-trained AI model running as a container on Azure Container Instances (ACI).
To train and operationalize a custom AI model, please refer to the Jupyter notebooks included with the solution. (These notebooks are also uploaded to an instance of the Linux Data Science VM (DSVM) deployed as part of the solution. Instructions on how to access the DSVM are available in the Dashboard.)
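For orientation only, the deploy-to-ACI step roughly corresponds to the following sketch using the Azure Machine Learning SDK (v1). The workspace configuration, model path, scoring script, and service name are assumptions; the notebooks remain the authoritative reference for how the solution actually trains and operationalizes its model.

```python
# Minimal sketch of operationalizing a trained model on ACI with the Azure ML
# SDK (v1). File names and the service name are illustrative assumptions.
from azureml.core import Environment, Model, Workspace
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()                                   # assumes a local config.json

model = Model.register(workspace=ws,
                       model_path="outputs/pdm_model.pkl",     # hypothetical model file
                       model_name="pdm-model")

env = Environment.from_conda_specification("pdm-env", "conda_env.yml")  # hypothetical env spec
inference_config = InferenceConfig(entry_script="score.py", environment=env)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, "pdm-scoring", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```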