
Big_Data_NTUA

Project for the 2020-2021 NTUA ECE course "Advanced Databases". Team members: Skourtsidis Giorgos, Fivos Kalogiannis

  • Students had to write queries using both of PySpark's interfaces, the RDD API and Spark SQL. The SQL queries had to be run on both CSV and Parquet input files, and the results of all approaches had to be compared.

  • In part B, we had to implement two distributed join algorithms (repartition join and broadcast join) and compare their results. We also had to experiment with how Spark's query optimizer chooses a join strategy.
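The two join algorithms can be sketched without a cluster by simulating partitions as Python lists. This is a single-process illustration under that assumption, not Spark code and not the project's implementation: the repartition join shuffles both inputs by a hash of the join key and joins each partition locally, while the broadcast join copies the small table to every partition of the large one, so the large input never moves. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def hash_partition(records, key_index, num_partitions):
    """Shuffle step: route each record to the partition chosen by hashing its join key."""
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        partitions[hash(rec[key_index]) % num_partitions].append(rec)
    return partitions


def repartition_join(left, right, num_partitions=4):
    """Repartition (reduce-side) join: shuffle both inputs by key, then join per partition."""
    # Tag each record with its source so the "reducer" can tell the two sides apart.
    tagged = [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]
    output = []
    for partition in hash_partition(tagged, 0, num_partitions):
        # Inside one partition, group values per key by side, then emit the cross product.
        buckets = defaultdict(lambda: ([], []))
        for key, (tag, value) in partition:
            buckets[key][0 if tag == "L" else 1].append(value)
        for key, (lvals, rvals) in buckets.items():
            output.extend((key, (lv, rv)) for lv in lvals for rv in rvals)
    return output


def broadcast_join(small, large, num_partitions=4):
    """Broadcast (map-side) join: replicate the small table to every partition of the large one."""
    # Hash table built once from the small side, then shared ("broadcast") with every task.
    lookup = defaultdict(list)
    for k, v in small:
        lookup[k].append(v)
    output = []
    # The large input is never shuffled; each existing partition is probed against the table.
    for i in range(num_partitions):
        for k, v in large[i::num_partitions]:
            output.extend((k, (sv, v)) for sv in lookup.get(k, []))
    return output


# Usage with hypothetical data: an inner join of users (small) with orders (large).
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp"), (2, "cup")]
rep = sorted(repartition_join(users, orders))
bc = sorted(broadcast_join(users, orders))
print(rep)
print(bc)
```

Both functions return the same inner-join rows; the trade-off is that the repartition join pays for shuffling both inputs but works for tables of any size, while the broadcast join avoids the shuffle only as long as the small table fits in each worker's memory.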