A set of two Spark notebooks and a T-SQL script that demonstrate Spark.NET and .NET Interactive, with a focus on:
- How to use a .NET function in a notebook as a Spark UDF (see the sketch after this list)
- How to use Spark functions from .NET, including the different ways to reference columns
- How to call into Spark SQL
- How to create Parquet-backed Spark tables that can be used by other Spark applications and notebooks, and even by SQL engines
- How to create a visualization with the Plotly library
- How to use SQL on-demand to query the tables created with Spark
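As a minimal sketch of the first three points, assuming the standard `Microsoft.Spark` package (the one-row DataFrame is just a stand-in, not the sample's data):

```csharp
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

// In a Synapse notebook a SparkSession named "spark" already exists;
// this line is only needed when running outside a notebook.
SparkSession spark = SparkSession.Builder().GetOrCreate();

// A plain .NET lambda wrapped as a Spark UDF.
Func<Column, Column> toUpper = Udf<string, string>(s => s?.ToUpper());

// One-row stand-in DataFrame (via Spark SQL), plus three equivalent column references.
DataFrame df = spark.Sql("SELECT 'MikeDoesBigData' AS Author");
df.Select(toUpper(df["Author"])).Show();     // DataFrame indexer
df.Select(toUpper(Col("Author"))).Show();    // Functions.Col
df.Select(toUpper(Column("Author"))).Show(); // Functions.Column
```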
A set of CSV files is provided in the /Data/Tweets directory.
If you would like to use your own tweet data instead, you can use https://tweetdownload.net and save the result as CSV.
- Get the demo data from the above location and upload it to your Azure Synapse ADLS Gen2 account.
- Upload the two Spark notebooks and one T-SQL script to your Azure Synapse account.
- Replace the dummy `inputfile` path in the notebook `01_TweetAnalysis_DataPrep.ipynb` with the path to the Tweet files in your ADLS Gen2 account.
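The `inputfile` value follows the ADLS Gen2 `abfss` URI scheme; everything in angle brackets below is a placeholder for your own values, not a value from the sample:

```csharp
// Replace <container> and <account> with your own storage container and account names.
string inputfile = "abfss://<container>@<account>.dfs.core.windows.net/Data/Tweets/*.csv";
```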
The data consists of a set of header-less CSV files containing tweets, each with four columns: `Date`, `Time`, `Author`, and `Tweet`.
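Since the files have no header row, a read along these lines supplies the schema explicitly (a sketch; the sample's actual read options may differ):

```csharp
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;

// The CSVs carry no header, so the four-column schema is declared up front.
var schema = new StructType(new[]
{
    new StructField("Date", new StringType()),
    new StructField("Time", new StringType()),
    new StructField("Author", new StringType()),
    new StructField("Tweet", new StringType())
});

DataFrame tweets = spark.Read().Schema(schema).Csv(inputfile);
```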
`01_TweetAnalysis_DataPrep.ipynb` is the notebook to run first.
It creates a dataframe from the CSV files, uses a user-defined function written in .NET to extract mentions and topics, and finally creates a Spark database in Synapse containing two Parquet-backed tables, `Topics` and `Mentions`.
To run the notebook quickly without intermediate results, set the `display` variable to `false`; this skips the intermediate evaluation of the Spark expressions.
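A minimal sketch of the shape of that work: the regex, the `TweetAnalysis` database name, and the column names are illustrative assumptions, not the sample's actual code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

// A .NET regex wrapped as a Spark UDF to pull @mentions out of the tweet text.
Func<Column, Column> extractMentions = Udf<string, string[]>(
    tweet => Regex.Matches(tweet ?? "", @"@\w+").Cast<Match>()
                  .Select(m => m.Value).ToArray());

// One row per (Author, Mention) pair; Topics would be built the same way.
DataFrame mentions = tweets.Select(
    Col("Author"),
    Explode(extractMentions(Col("Tweet"))).Alias("Mention"));

// Persist as a Parquet-backed table in a Spark database.
spark.Sql("CREATE DATABASE IF NOT EXISTS TweetAnalysis");
mentions.Write().Mode(SaveMode.Overwrite).Format("parquet")
        .SaveAsTable("TweetAnalysis.Mentions");

// A flag like the sample's display variable gates intermediate actions,
// so the notebook can run end to end without evaluating every step.
bool display = false;
if (display) mentions.Show(10);
```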
The second notebook analyzes the `Topics` table. The main example takes the top 5 quarterly tweet topics of `MikeDoesBigData` and visualizes them as a bar chart using the built-in Plotly library.
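A sketch of what the aggregation and chart could look like, assuming the `XPlot.Plotly` package commonly used from .NET Interactive; the table and column names are assumptions, and the quarterly breakdown is omitted for brevity:

```csharp
using System.Linq;
using Microsoft.Spark.Sql;
using XPlot.Plotly;

// Top 5 topics for one author, pulled back to the driver for charting.
Row[] rows = spark.Sql(
    @"SELECT Topic, COUNT(*) AS Cnt
      FROM TweetAnalysis.Topics
      WHERE Author = 'MikeDoesBigData'
      GROUP BY Topic
      ORDER BY Cnt DESC
      LIMIT 5").Collect().ToArray();

var chart = Chart.Plot(new Graph.Bar
{
    x = rows.Select(r => r.GetAs<string>("Topic")),
    y = rows.Select(r => r.GetAs<long>("Cnt"))
});
chart.WithTitle("Top 5 tweet topics");
display(chart); // .NET Interactive's display helper renders the chart in the notebook
```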
If you changed the tables from another notebook while the Spark pool used by this notebook was active, uncomment and run the last cell to make sure you get the latest version of the data.
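Whether the commented-out cell does exactly this is an assumption, but a Spark SQL `REFRESH TABLE` statement is one way to pick up such external changes:

```csharp
// Discard Spark's cached metadata/data for the table so the next read sees external changes.
spark.Sql("REFRESH TABLE TweetAnalysis.Topics");
```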
Run the T-SQL script against the SQL on-demand pool, in the same database that was created by the `01_TweetAnalysis_DataPrep.ipynb` notebook.
It queries the `Mentions` table created earlier in Spark to find the number of times `MikeDoesBigData` was mentioned by each author.
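A sketch of such a query; the `dbo` schema and the column names are assumptions based on the description above, not the sample's actual script:

```sql
-- Run against the SQL on-demand pool, in the Spark-created database.
SELECT Author, COUNT(*) AS MentionCount
FROM dbo.Mentions
WHERE Mention = '@MikeDoesBigData'
GROUP BY Author
ORDER BY MentionCount DESC;
```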