v0.0.23 - Basic RDD Support + Spark ML Cookbook
Pre-release
anthony-khong released this 19 Aug 03:22
Preliminary RDD support covering a limited set of transformations, plus two completed parts of the Spark ML cookbook.
- Basic RDD support: mainly basic transformations such as `map`, `reduce`, `map-to-pair` and `reduce-by-key`. The main challenge has been serialising functions; the serialisation code is mainly taken from Sparkling and sparkplug.
- Spark ML cookbook: added two chapters, one on Spark ML pipelines and one porting the customer segmentation blog post with non-negative matrix factorisation.
- Better Geni CLI: new `--submit` command-line argument to emulate `spark-submit`.
- Better CI steps: automated Geni CLI tests to avoid manual testing of the Geni REPL.
- Completed benchmark results: added results from dplyr, data.table, tablecloth and tech.ml.dataset.
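
For readers unfamiliar with the four RDD transformations named in the first item, the following is a minimal sketch of what they compute. It is a plain-Python model on in-memory lists, not Geni's API and not Spark: the helpers `map_to_pair` and `reduce_by_key` and the sample data are hypothetical names chosen for illustration, and nothing here is distributed or serialised.

```python
from functools import reduce
from operator import add


def map_to_pair(f, items):
    """Apply f to each item; f is assumed to return a (key, value) tuple."""
    return [f(x) for x in items]


def reduce_by_key(f, pairs):
    """Merge the values that share a key using the binary function f."""
    merged = {}
    for key, value in pairs:
        merged[key] = f(merged[key], value) if key in merged else value
    return merged


# Hypothetical sample data standing in for the contents of an RDD.
words = ["spark", "geni", "spark", "rdd", "geni", "spark"]

# map: transform every element independently.
lengths = list(map(len, words))

# reduce: fold all elements into a single value.
total_length = reduce(add, lengths)

# map-to-pair followed by reduce-by-key: the classic word count.
pairs = map_to_pair(lambda w: (w, 1), words)
counts = reduce_by_key(add, pairs)

print(total_length)
print(counts)
```

In a real Spark job the functions passed to these transformations are shipped to executors, which is why function serialisation is the hard part mentioned above; this local sketch sidesteps that entirely.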