flintrock constant failure : Exception: Failed to install Spark #290
What kind of AMI are you using in your cluster config? Flintrock only supports Amazon Linux and similar OSes (like CentOS). If you want to use an …
It also looks like you may be using an AMI that has Spark already installed on it. Is that the case?
I am using the default AMI that `flintrock configure` specified. I just changed the bare minimum fields in the YAML file, namely the Spark download link, HDFS link, PEM file, EC2 user, and number of slaves.
Here is the config.yaml for Flintrock:
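(The reporter's actual config is not shown here. For context, Flintrock's config.yaml follows roughly this shape; every value below is an illustrative placeholder, and exact keys may differ slightly between Flintrock versions.)

```yaml
services:
  spark:
    version: 2.4.3        # or download-source: <mirror URL>
  hdfs:
    version: 2.8.5

provider: ec2

providers:
  ec2:
    key-name: my-key-pair            # placeholder
    identity-file: /path/to/key.pem  # placeholder
    instance-type: m5.large
    region: us-east-1
    ami: ami-xxxxxxxx                # an Amazon Linux AMI
    user: ec2-user

launch:
  num-slaves: 2
  install-hdfs: True
  install-spark: True
```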
How did you install Flintrock? Can you try installing Flintrock using …?
I have an Ubuntu virtual machine. I installed Flintrock on it using pip3.
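The clean-install suggestion from this thread can be sketched like this (the environment name is arbitrary, and this assumes `python3` with the `venv` module is available on the Ubuntu machine):

```shell
# Create and activate a fresh virtual environment so no stale
# Flintrock files from a previous install can interfere.
python3 -m venv flintrock-env
source flintrock-env/bin/activate

# Install Flintrock into the clean environment and confirm it runs.
pip install flintrock
flintrock --help
```

Reinstalling into a fresh environment rules out a corrupted or partially upgraded installation as the cause of launch failures.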
Can someone guide me as to what the issue could be? I have followed the instructions exactly, but it still keeps failing. The machines get created and then all get deleted. I sent the log files earlier. It seems like Spark is not getting installed properly for some reason (a link is being created that already exists). Please help me; I have been stuck for quite some time now.
The errors about … Did you try that? Please show the console output from when you installed Flintrock.
Nick, I did try from a new virtual environment. I think I may have figured out the issue. The Spark tarball already includes Hadoop. In my config I was installing both HDFS and Spark (which already includes Hadoop), and I believe this was causing the issue. I opted not to install HDFS, and now the launch runs fine. I am new to Spark installation, so I don't know whether this will cause problems, but the cluster is running now. Please let me know: if I do not install HDFS, will that cause any issues?
There should be no conflict between Spark and HDFS. Are the tarballs hosted by …? Do you have the same issue if you use the official Apache mirrors instead of the host you currently have configured?
I took that mirror from https://www.apache.org/dyn/closer.lua/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz. I did not do a comparison though. I will try and let you know.
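One way to do that comparison is to check the downloaded tarball against the SHA-512 digest that Apache publishes alongside each release artifact (the filename below assumes the Spark 2.4.3 / Hadoop 2.7 build discussed in this thread):

```shell
# Name of the tarball you already downloaded from the mirror.
TARBALL=spark-2.4.3-bin-hadoop2.7.tgz

# Fetch the official published digest from the Apache archive.
curl -sLO "https://archive.apache.org/dist/spark/spark-2.4.3/${TARBALL}.sha512"

# Print the local file's digest and the published one; they should match.
sha512sum "$TARBALL"
cat "${TARBALL}.sha512"
```

If the digests differ, the mirror served a corrupted or altered file, which would explain an installation failure.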
For the record, I just successfully launched a few clusters using Flintrock …
I just noticed that you are using an older version of HDFS in your config. The configuration template specifies HDFS 2.8.5, and that's what I recommend. Have you tried that? It might resolve the … error.
If updating your configured version of HDFS has the effect I expect, then please reinstall Flintrock into a new virtual environment and show me the entire log of how you did it. Maybe there's a clue in there about the second error.
Still having the same issue, @shail-burman? If so, please try the things I suggested in my previous message and let me know how they turned out. |
Sorry for the tardy response, Nick; I have been stuck on a problem involving loading millions of small files, and I saw your note on that forum that there is no good way. Anyway, thanks so much: Hadoop 2.8.5 solved the beeline issue. You may close this issue. I had a few questions around Flintrock: A) Can we create regular instances instead of spot instances, so we could use them in a semi-prod environment? B) Can we use different configurations for the driver and workers? Thanks,
Glad to hear the HDFS version fixed your issues. A) Yes, Flintrock by default creates on-demand instances if you don't specify a spot price in your config. B) Unfortunately, no. There's some discussion on this in #199 and several issues linked to or from there. |
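To illustrate point (A): in Flintrock's EC2 provider config, the presence of a spot-price key is what requests spot instances; leaving it unset yields on-demand instances. (This is a sketch of the config template; the key name may vary between Flintrock versions.)

```yaml
providers:
  ec2:
    # With spot-price omitted or commented out, Flintrock launches
    # on-demand instances, suitable for a semi-prod environment:
    # spot-price: 0.05   # max bid in USD/hour; uncomment to request spot instances
```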
Here is the log file: