Two users getting the same H2O Notebook #5728

Open
arunaryasomayajula opened this issue May 24, 2024 · 0 comments
arunaryasomayajula commented May 24, 2024

  1. SW cluster 1 (multiple nodes) was started by user 1, with the Flow UI service on a certain port (for example, 000.000.000.001:54321).
  2. For some reason (for example, a timeout or an out-of-memory error), SW cluster 1 died and 000.000.000.001:54321 was released.
  3. In Spectrum Conductor, cluster 1 is still shown as "started", with the Flow UI link 000.000.000.001:54321.
  4. SW cluster 2 was started by user 2; it took 000.000.000.001:54321 and bound its Flow UI service to that port.
  5. Now user 1 and user 2 both see the same cluster in Spectrum Conductor, with the Flow UI service on 000.000.000.001:54321.
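A rough sketch of how a stale link like the one in step 5 could be detected: query whatever answers at the recorded Flow address and ask which H2O cloud it belongs to, then compare that with the cloud name the cluster was started with. This assumes H2O's REST endpoint `GET /3/Cloud` returns JSON containing a `cloud_name` field. The function names are hypothetical, and hooking this into Spectrum Conductor's status check is an assumption, not an existing Sparkling Water or Conductor feature.

```python
import json
import urllib.request


def is_expected_cloud(cloud_json: dict, expected_name: str) -> bool:
    """Return True if a /3/Cloud response describes the expected H2O cloud."""
    return cloud_json.get("cloud_name") == expected_name


def fetch_cloud_info(base_url: str, timeout: float = 5.0) -> dict:
    # Assumption: GET <base_url>/3/Cloud returns JSON with a "cloud_name" field.
    url = base_url.rstrip("/") + "/3/Cloud"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def flow_link_is_stale(base_url: str, expected_name: str) -> bool:
    """Hypothetical check: is the recorded Flow link no longer our cluster?"""
    try:
        info = fetch_cloud_info(base_url)
    except OSError:
        # Connection refused, timeout, TLS or HTTP error: the original
        # cluster is not reachable at this address anymore.
        return True
    except ValueError:
        # Something answered, but not with JSON, so it is not an H2O node.
        return True
    # Something H2O-like answered; stale if it is a different cloud
    # (for example, user 2's cluster that took over the freed port).
    return not is_expected_cloud(info, expected_name)
```

Comparing the cloud name rather than only checking that the port is open is the important part here: in the scenario above the port is open, it just belongs to someone else's cluster.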

Sparkling Water Context:

  • Sparkling Water Version: 3.40.0.1-1-2.4
  • H2O name: k023042
  • cluster size: 6
  • list of used nodes:
    (executorId, host, port)

(0,10.119.198.87,54323)
(1,10.119.198.87,54325)
(2,10.119.198.88,54323)
(3,10.119.198.88,54325)
(4,10.119.198.173,54325)
(5,10.119.198.173,54335)

Open H2O Flow in browser: https://ppvra00a0011.osds..net:54325 (CMD + click in Mac OSX)

I suspect the Flow UI crashed for some reason and port 54323 was released at Feb/20 05:02:30.

H2OContext has been closed! Please create a new H2OContext to a healthy and reachable (web enabled)
H2O cluster.
at ai.h2o.sparkling.H2OContext$$anon$1.run(H2OContext.scala:359)
Caused by: ai.h2o.sparkling.backend.exceptions.RestApiNotReachableException: H2O node https://10.119.198.87:54323 is not reachable.

The AIMD H2O notebook started at Feb/21 08:11:31, and its Flow UI bound to the freed port 54323.

Providing us with the observed and expected behavior definitely helps. Giving us the following information also helps:

  • Sparkling Water/PySparkling/RSparkling version
  • Hadoop version & distribution
  • Execution mode (YARN-client, YARN-cluster, standalone, local, ...)
  • YARN logs, if running on YARN. To collect these logs, run yarn logs -applicationId <application ID>, where the application ID is displayed when Sparkling Water starts
  • H2O & Spark logs, if not running on YARN. You can find these logs in the Spark work directory
  • Operating system (Windows, Linux, or macOS)
  • Spark & Sparkling Water configuration, including the memory configuration

Please also provide full, minimal reproducible code.
