In this exercise, you will explore the capabilities of the newly integrated Data Explorer runtime in Azure Synapse Analytics.
Azure Synapse data explorer provides you with a dedicated query engine optimized and built for log and time series data workloads. With this new capability now part of Azure Synapse's unified analytics platform, you can easily access your machine and user data to surface insights that can directly improve business decisions. To complement the existing SQL and Apache Spark analytical runtimes, Azure Synapse data explorer is optimized for efficient log analytics, using powerful indexing technology to automatically index structured, semi-structured, and free-text data commonly found in telemetry data.
The tasks you will perform in this exercise are:
- Exercise 5 - Log and telemetry analytics
  - Task 1 - Create a Data Explorer database and a table for data ingestion
  - Task 2 - Perform manual data ingestion using a Synapse Pipeline
  - Task 3 - Run KQL queries from Synapse Studio
  - Task 4 - Load Data Explorer data into a Spark dataframe
### Task 1 - Create a Data Explorer database and a table for data ingestion

In this task, you will create a Data Explorer database and a table to ingest sales telemetry data.
- In Synapse Studio, navigate to the **Data** hub and then select the **Data Explorer database** option from the **+** dropdown.
- Provide the following inputs to configure the database to be created:
  - Pool name: select the Data Explorer pool (should be named `dataexplorerpoolNNNNNN`)
  - Name: `Sales`

  Leave the other options with default values and select **Create**.
- In Synapse Studio, navigate to the **Develop** hub and then select the **KQL script** option from the **+** dropdown.
- In the **Connect to** dropdown, select the Data Explorer pool (should be `dataexplorerpoolNNNNNN`), and then select the `Sales` database. Use the following KQL query and select **Run**:

  ```kql
  .create table SalesTelemetry (
      CustomerId: int32,
      ProductId: int32,
      Timestamp: datetime,
      Url: string
  )
  ```
- In Synapse Studio, navigate to the **Data** hub and check that the `SalesTelemetry` table shows up under the `Sales` database in the Data Explorer pool (`dataexplorerpoolNNNNNN`).
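If you prefer to verify the table from the KQL script instead of the UI, a quick check is the standard `.show table` management command (a minimal sketch; it returns the schema you just created):

```kql
.show table SalesTelemetry cslschema
```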
### Task 2 - Perform manual data ingestion using a Synapse Pipeline

In this task, you will trigger a Synapse Pipeline to perform manual data ingestion into the table created in Task 1.
- In Synapse Studio, navigate to the **Integrate** hub, select the **Exercise 5 - Import sales telemetry data** pipeline, and then select **Debug** to trigger the pipeline.
- With the pipeline debug window open, wait until the pipeline completes (should take around 3 to 4 minutes).
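Once the pipeline completes, a quick row count run as a KQL script confirms that the data landed in the table (the exact count depends on the sample dataset the pipeline ingests):

```kql
SalesTelemetry
| count
```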
### Task 3 - Run KQL queries from Synapse Studio

In this task, you will query the sales telemetry data using a KQL script run from Synapse Studio.
- In Synapse Studio, navigate to the **Develop** hub and create a new KQL query with the following content:

  ```kql
  SalesTelemetry
  | where Url startswith "/product/search"
  | take 1000
  ```

- Run the query and observe the results: 1000 product search entries (note that `take` returns an arbitrary subset of rows, not necessarily the earliest ones).
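Beyond returning raw rows, you could also aggregate the searches, for example per product (a sketch, assuming the `ProductId` column from the table schema created in Task 1):

```kql
SalesTelemetry
| where Url startswith "/product/search"
| summarize SearchCount = count() by ProductId
| top 10 by SearchCount desc
```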
**BONUS EXERCISE**

Change the KQL query to return the most recent 1000 product searches.
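One possible solution, assuming the `Timestamp` column records when each search occurred, is to replace `take` with an ordered `top`:

```kql
SalesTelemetry
| where Url startswith "/product/search"
| top 1000 by Timestamp desc
```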
### Task 4 - Load Data Explorer data into a Spark dataframe

In this task, you will load data from a Data Explorer table into a Spark dataframe.
- In Synapse Studio, navigate to the **Data** hub, locate the `SalesTelemetry` table, select the `...` context menu activator, and then select **New notebook > Load to DataFrame**. This will open a Spark notebook with some PySpark code that loads data from the `SalesTelemetry` table into a Spark dataframe.
- In the **Attach to** dropdown, select the `SparkPool01` Spark pool and then select the run button to run the notebook.

  > **NOTE**: It may take up to a few minutes for the Spark pool and the associated Spark session to be created.
- Add a new cell to the notebook and replace the value of the `kustoQuery` option with the following snippet:

  ```kql
  SalesTelemetry
  | where Url startswith '/product/search'
  ```

- Run the new cell and observe the result: the total number of product searches logged in the telemetry data.
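If you want the count computed on the Data Explorer side rather than by the dataframe, a variant (a sketch; it assumes the cell reads the query result into a dataframe the same way the generated code does) pushes the aggregation into the KQL query with the `count` operator:

```kql
SalesTelemetry
| where Url startswith '/product/search'
| count
```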