
spark-session cannot be changed (any more ?) #301

Open
behrica opened this issue Dec 6, 2020 · 5 comments

Comments

@behrica
Collaborator

behrica commented Dec 6, 2020

Following the minikube guide:
https://github.com/zero-one-group/geni/blame/develop/docs/kubernetes_basic.md

the verification step at line 118 fails.

It seems that I cannot change the spark session by calling
g/create-spark-session.

I am pretty sure that it worked at some point.

@behrica
Collaborator Author

behrica commented Dec 6, 2020

I just saw that this is not a Delay any more.

So the SparkSession gets initialized as soon as we require the namespace,
and I suppose it cannot be changed afterwards.

So it cannot be re-configured.
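For illustration, a minimal sketch of the Delay-based approach being discussed. The `create-spark-session` function here is just a stand-in for the real constructor, and the config map is made up; the point is only that with a `delay`, nothing is created at namespace load time, so there is still a window to re-configure before first use:

```clojure
(defn create-spark-session
  "Stand-in for the real session constructor; here it just returns its config."
  [conf]
  (println "session created")
  conf)

;; With a delay, requiring this namespace does NOT create the session.
(defonce default-session
  (delay (create-spark-session {:app-name "geni-app" :master "local[*]"})))

;; The session only comes into existence on the first deref:
@default-session ;; prints "session created", returns the config map
```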

@behrica behrica changed the title spark-session cannot be changed (any more) spark-session cannot be changed (any more ?) Dec 6, 2020
@anthony-khong
Member

I see... I think we can change it back to a delay. Would you like to make a PR for that?

@behrica
Collaborator Author

behrica commented Dec 6, 2020

Maybe there is a better way.

Maybe the default "configuration map" for the session https://github.com/behrica/geni/blob/482c4b934f037d32b849916211b509c94d89800e/src/clojure/zero_one/geni/defaults.clj#L5

could become an atom, which can be changed if needed before
requiring the default namespace 'zero-one.geni.core'.

I think that the current feature of potentially changing the session itself is not super useful,
because Spark does not really support this cleanly, correct?
If I have read it right, the Spark session is meant to be instantiated once in the lifetime of a JVM.
I can try this out to see if it works.
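A rough sketch of the atom idea, with a stand-in constructor (the real one lives in Geni): the default config map sits in an atom, and the session is built from the atom's current value only on first deref, so users can `swap!` the config before anything forces the delay:

```clojure
;; Hypothetical sketch: mutable default config + lazily created session.
(defonce session-config
  (atom {:app-name "geni-app" :master "local[*]"}))

(defn create-spark-session
  "Stand-in for the real session constructor."
  [conf]
  conf)

(defonce default-session
  (delay (create-spark-session @session-config)))

;; Before anything derefs default-session, users can still re-configure:
(swap! session-config assoc :master "local[2]")

@default-session ;; => {:app-name "geni-app", :master "local[2]"}
```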

@behrica
Collaborator Author

behrica commented Dec 6, 2020

I think it could work this way.

The issue would be keeping the fully automatic session configuration of the geni CLI.
My opinion is that the current geni CLI session initialization, which:

  • initialises a Spark session object with defaults
  • and can somehow be overridden with some "tricks" (delays, futures)

is brittle, as it will not always work and depends on the order of requiring namespaces / calling functions.

I think we have three options for this:

  1. Do not make it fully automatic, but provide a method which needs to be called (init-default-spark or similar)
    -> this could then allow changing config settings

  2. Allow changing the Spark session configuration from outside the REPL by either:

  • reading a config file
  • taking config options on geni.sh

  3. Do not allow any custom session config for the geni CLI at all, and see it as a "demo".

I still like the overall idea of the geni CLI as a quick, user-friendly entry point, but it needs to allow arbitrary session configs.
The other Spark shells can be fully configured from the command line (and do not allow changing the session from inside either).
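A sketch of what option 1 could look like. Both the function name `init-default-spark!` and the config keys are hypothetical, and the real session constructor is stubbed out; the idea is simply an explicit, once-only initialisation that accepts overrides:

```clojure
;; Hypothetical sketch of option 1: no automatic initialisation; the user
;; calls an init function (optionally with config overrides) exactly once.
(defonce ^:private session-state (atom nil))

(defn init-default-spark!
  "Creates the default session from `overrides` merged onto the defaults.
  Throws if a session has already been initialised."
  [overrides]
  (let [defaults {:app-name "geni-app" :master "local[*]"}]
    (when @session-state
      (throw (ex-info "Spark session already initialised" {})))
    ;; the real implementation would call the session constructor here
    (reset! session-state (merge defaults overrides))))

(init-default-spark! {:master "local[2]"})
@session-state ;; => {:app-name "geni-app", :master "local[2]"}
```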

@erp12
Collaborator

erp12 commented Dec 7, 2020

Here is a link to our previous discussions for reference.

> I think that the current feature of potentially changing the session itself is not super useful,
> because Spark does not really support this cleanly, correct?
> If I have read it right, the Spark session is meant to be instantiated once in the lifetime of a JVM.

You are correct. Typically, a user's Spark session settings would be set during the call to spark-submit. The default session settings in Geni are only applied if no call to spark-submit is made (i.e. when running locally).

Most Spark usage (across all languages) happens by launching a Spark "application" (for example, a Geni REPL) on an existing Spark cluster. The Spark application is not expected to create its own cluster, and thus the session config is supplied when the .jar and main class are specified.

I'm not too familiar with Kubernetes, so I am having trouble following the guide. It looks like the Geni CLI is being started outside of spark-submit. I think the more traditional pattern would be to call spark-submit in the container for the cluster's driver and pass an uberjar of Geni and --class zero-one.geni.main along with any other spark session config you want.

I have had success with starting Geni REPLs on flintrock clusters using spark-submit.
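To make the pattern concrete, a hedged example of what such a spark-submit invocation might look like. The master URL, image name, and jar filename are placeholders (and the Kubernetes-specific `--conf` keys assume a k8s deployment); `--class` and `--conf` are standard spark-submit flags:

```shell
# Hypothetical invocation: all session configuration comes from spark-submit,
# not from inside the REPL. Placeholder values throughout.
spark-submit \
  --master k8s://https://kubernetes.default.svc:443 \
  --class zero-one.geni.main \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=my-geni-image:latest \
  geni-uberjar.jar
```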
