The purpose of the Spark program (PerKeyAverage.java) is to find the "per key average" for all keys. Here we provide two classes:

org.dataalgorithms.chapB03.perkeyaverage.spark.PerKeyAverage (without using Lambda Expressions)
org.dataalgorithms.chapB03.perkeyaverage.sparkwithlambda.PerKeyAverage (with Lambda Expressions)
Each input record has the following format:

<key-as-string><:><value-as-double>

The output records have the following format:

<key><:><average-per-key>
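Conceptually, the per-key average is computed with the classic (sum, count) pattern: map each "<key>:<value>" record to (key, (value, 1)), reduce by key to (key, (sum, count)), and then divide sum by count. Below is a minimal sketch of that pattern in Spark's Java API using lambda expressions (which require Java 8); the class name PerKeyAverageSketch and the command-line input path are illustrative assumptions, not the book's exact code:

import scala.Tuple2;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// A minimal sketch of the per-key-average pattern (illustrative, not the book's code).
public class PerKeyAverageSketch {
    public static void main(String[] args) {
        String inputPath = args[0]; // hypothetical: a text file of <key>:<value> records
        JavaSparkContext ctx =
            new JavaSparkContext(new SparkConf().setAppName("PerKeyAverage"));
        JavaRDD<String> records = ctx.textFile(inputPath);

        // map each "<key>:<value>" record to (key, (value, 1))
        JavaPairRDD<String, Tuple2<Double, Integer>> pairs = records.mapToPair(rec -> {
            String[] tokens = rec.split(":");
            return new Tuple2<String, Tuple2<Double, Integer>>(
                tokens[0],
                new Tuple2<Double, Integer>(Double.parseDouble(tokens[1]), 1));
        });

        // reduce to (key, (sum-of-values, count))
        JavaPairRDD<String, Tuple2<Double, Integer>> sumCount =
            pairs.reduceByKey((a, b) ->
                new Tuple2<Double, Integer>(a._1() + b._1(), a._2() + b._2()));

        // average = sum / count
        JavaPairRDD<String, Double> averages =
            sumCount.mapValues(sc -> sc._1() / sc._2());

        for (Tuple2<String, Double> t : averages.collect()) {
            System.out.println(t._1() + ":" + t._2());
        }
        ctx.stop();
    }
}

The difference between the two provided classes is the function syntax: the sparkwithlambda version uses Java 8 lambda expressions like those above, while the spark version expresses the same logic with anonymous inner classes (PairFunction, Function2, Function), which also compile under Java 7.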
cat ./run_perkeyaverage_spark.sh
#!/bin/bash
# set up Java, the book's home directory, and Spark
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home
export BOOK_HOME=/Users/mparsian/zmp/github/data-algorithms-book
export SPARK_HOME=/Users/mparsian/spark-1.5.2
export SPARK_MASTER=spark://localhost:7077
export APP_JAR=$BOOK_HOME/dist/data_algorithms_book.jar
#
# the driver class to run
prog=org.dataalgorithms.chapB03.perkeyaverage.spark.PerKeyAverage
# submit the application jar to the Spark master
$SPARK_HOME/bin/spark-submit --class $prog --master $SPARK_MASTER $APP_JAR
./run_perkeyaverage_spark.sh
pandas:10.0
zebra:4.0
duck:5.0
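For example, a hypothetical input such as

pandas:12.0
pandas:8.0
zebra:4.0
duck:3.0
duck:7.0

would produce exactly this output: the pandas average is (12.0 + 8.0) / 2 = 10.0, the zebra average is 4.0 / 1 = 4.0, and the duck average is (3.0 + 7.0) / 2 = 5.0.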
If you have any questions/comments/suggestions, please let me know: [email protected]
Thanks,
Best regards,
Mahmoud Parsian