Create NotebookSparkSession error on Windows #1082

Open
AnhQuanTran opened this issue Apr 18, 2023 · 4 comments

@AnhQuanTran commented Apr 18, 2023

Hi,

I set up the almond kernel for JupyterLab on Windows. It works well when running Scala code, but when I create a NotebookSparkSession, I hit the following issue:

val spark = {
    NotebookSparkSession.builder().master("yarn").getOrCreate()
}

Error:
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jar:file:/D:/Users/<user-name>/AppData/Roaming/jupyter/kernels/scala/launcher.jar!/coursier/bootstrap/launcher/jars/ammonite-compiler-interface_2.12.10-2.5.6-4-4a07420b-sources.jar!

The first path is the launcher jar: D:/Users/<user-name>/AppData/Roaming/jupyter/kernels/scala/launcher.jar
The second is a jar nested inside launcher.jar: /coursier/bootstrap/launcher/jars/ammonite-compiler-interface_2.12.10-2.5.6-4-4a07420b-sources.jar

Versions:

  • almond 0.13.4
  • scala 2.12.10
  • spark 3.2.2

Steps to set up the kernel:

  • install Java JDK 1.8.0
  • install JupyterLab
  • set up the almond kernel:
bitsadmin /transfer downloadCoursierCli https://git.io/coursier-cli "%cd%\coursier"
bitsadmin /transfer downloadCoursierBat https://git.io/coursier-bat "%cd%\coursier.bat"
echo -n | openssl s_client -showcerts -connect repo1.maven.org:443 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > maven.crt
set JAVA_HOME=C:\jdk1.8.0_241
"%JAVA_HOME%\bin\keytool.exe" -import -trustcacerts -keystore maven.crt -storepass changeit -noprompt -alias maven -file maven.crt

coursier bootstrap --standalone almond:0.13.4 --scala 2.12.10 -o almond 
almond --install --force
  • configure the kernel (kernel.json):
{
  "argv": [
	"java",
	"-jar",
	"D:\\Users\\<user-name>\\AppData\\Roaming\\jupyter\\kernels\\scala\\launcher.jar",
	"--connection-file",
	"{connection_file}"
  ],
  "display_name": "Scala",
  "language": "scala",
  "env": {
	"JAVA_HOME": "C:\\jdk1.8.0_241",
	"HADOOP_HOME": "D:\\DATA\\Environment\\hadoop-2.7.2",
	"HADOOP_CONF_DIR": "D:\\DATA\\Environment\\hadoop-2.7.2\\etc\\hadoop",
	"SPARK_HOME": "D:\\DATA\\Environment\\spark-3.2.2-bin-hadoop2.7",
	"COURSIER_CACHE": "D:\\Users\\<user-name>\\AppData\\Local\\Coursier\\cache\\v1",
	"JVM_OPT": "-Dhttp.proxyHost=http://proxy.xxx.com.vn -Dhttp.proxyPort=8080 -Dhttps.proxyHost=http://proxy.xxx.com.vn -Dhttps.proxyPort=8080"
  }
}
  • run this in a notebook cell in JupyterLab:
import $ivy.`org.apache.spark::spark-sql:3.2.2`
import $ivy.`sh.almond::almond-spark:0.13.4`
import org.apache.spark.sql._
val spark = {
	NotebookSparkSession.builder().master("yarn").getOrCreate()
}

How can I fix this? Thank you.

@alexarchambault (Member) commented:

@AnhQuanTran I think that originates from the way Almond is installed. The current documentation is misleading… I'm planning to update it soon.

In the meantime, if you use coursier launch commands as in the current documentation, make sure you use coursier >= 2.1.1 (released a few days ago) and pass --hybrid to it. That's what I've been using in an Almond installation I'm working on right now, and Spark works fine with it. --hybrid generates launchers that don't rely on the jar: protocol.
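As a concrete illustration (a sketch only; the coordinates mirror the ones used elsewhere in this issue, so adjust them to your setup):

coursier bootstrap --hybrid almond:0.13.4 --scala 2.12.10 -o almond
almond --install --force

Unlike --standalone, which nests the dependency jars inside the launcher (the jar:…!/…! URLs in the error above), a hybrid launcher avoids the jar: protocol entirely.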

@AnhQuanTran (Author) commented Apr 19, 2023

@alexarchambault
I followed your guidance and that error is fixed, but I still cannot create the notebook Spark session.
Steps I tried:

bitsadmin /transfer downloadCoursierCli https://github.com/coursier/launchers/raw/master/coursier "%cd%\coursier"
bitsadmin /transfer downloadCoursierBat https://github.com/coursier/launchers/raw/master/coursier.bat "%cd%\coursier.bat"
(coursier version: 2.1.2)
echo -n | openssl s_client -showcerts -connect repo1.maven.org:443 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > maven.crt
set JAVA_HOME=C:\jdk1.8.0_241
"%JAVA_HOME%\bin\keytool.exe" -import -trustcacerts -keystore maven.crt -storepass changeit -noprompt -alias maven -file maven.crt
coursier bootstrap --hybrid almond:0.13.4 --scala 2.12.10 -o almond
almond --install --force
The kernel.json is unchanged from the one shown above.

It seems almond can't detect the Spark config in the spark-defaults file or the Hadoop config in the etc/hadoop folder. Nothing happens after I run the session creation.
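A hedged sketch of one thing to try (an assumption, not a confirmed fix): the builder exposes the standard SparkSession.Builder .config(key, value) calls, so values normally read from spark-defaults.conf can be passed explicitly, bypassing file detection. The keys and values below are placeholders:

import org.apache.spark.sql._

val spark = NotebookSparkSession.builder()
  .master("yarn")
  // Placeholder entries: copy whatever you rely on from spark-defaults.conf here.
  .config("spark.executor.memory", "2g")
  .config("spark.executor.instances", "2")
  .getOrCreate()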

@AnhQuanTran (Author) commented:

Finally, it raised an exception like the one below. The connection to the YARN timeline server seems to be the issue, but when I run spark-shell from the command prompt, it still works; a workaround sketch follows the trace below.

java.io.IOException: java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing
  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:433)
  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:395)
  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:314)
  org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:207)
  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
  org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
  org.apache.spark.SparkContext.<init>(SparkContext.scala:581)
  org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
  org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
  scala.Option.getOrElse(Option.scala:189)
  org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
  org.apache.spark.sql.ammonitesparkinternals.AmmoniteSparkSessionBuilder.getOrCreate(AmmoniteSparkSessionBuilder.scala:353)
  ammonite.$sess.cmd3$Helper.<init>(cmd3.sc:2)
  ammonite.$sess.cmd3$.<init>(cmd3.sc:7)
  ammonite.$sess.cmd3$.<clinit>(cmd3.sc:-1)
java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing
  org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientConnectionRetry.retryOn(TimelineConnector.java:360)
  org.apache.hadoop.yarn.client.api.impl.TimelineConnector.operateDelegationToken(TimelineConnector.java:220)
  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:213)
  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:426)
  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:395)
  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:314)
  org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:207)
  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
  org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
  org.apache.spark.SparkContext.<init>(SparkContext.scala:581)
  org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
  org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
  scala.Option.getOrElse(Option.scala:189)
  org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
  org.apache.spark.sql.ammonitesparkinternals.AmmoniteSparkSessionBuilder.getOrCreate(AmmoniteSparkSessionBuilder.scala:353)
  ammonite.$sess.cmd3$Helper.<init>(cmd3.sc:2)
  ammonite.$sess.cmd3$.<init>(cmd3.sc:7)
  ammonite.$sess.cmd3$.<clinit>(cmd3.sc:-1)
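A commonly suggested workaround for this failure mode (an assumption here, untested against this cluster) is to disable the YARN timeline client for the session, so application submission skips the delegation-token step that fails in the trace above:

import org.apache.spark.sql._

val spark = NotebookSparkSession.builder()
  .master("yarn")
  // Hedged workaround: the spark.hadoop. prefix forwards the key into the Hadoop
  // configuration; yarn.timeline-service.enabled=false stops the YARN client from
  // contacting the timeline server at submission time.
  .config("spark.hadoop.yarn.timeline-service.enabled", "false")
  .getOrCreate()

Since spark-shell works from the same prompt, comparing the yarn-site.xml it picks up via HADOOP_CONF_DIR with what the kernel's JVM sees may also narrow this down.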

@AnhQuanTran (Author) commented:

@alexarchambault Hi, I use Spark 3.2.2 and Scala 2.12.15. Which almond version is compatible with these? I tried almond 0.13.9, but it errors with: java.lang.NoClassDefFoundError: org/apache/logging/log4j/LogManager
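Editor's note, hedged: org.apache.logging.log4j.LogManager is a Log4j 2 class, and Spark only moved to Log4j 2 in the 3.3.x line (3.2.x still bundles log4j 1.x), so a newer almond/almond-spark build may expect Log4j 2 on the classpath. One untested sketch is to add the Log4j 2 artifacts explicitly before building the session; 2.17.2 below is an arbitrary recent release, not a value from this thread:

// Hypothetical workaround: supply the missing Log4j 2 classes via ivy imports.
import $ivy.`org.apache.logging.log4j:log4j-api:2.17.2`
import $ivy.`org.apache.logging.log4j:log4j-core:2.17.2`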
