Distributed mode fails. #160

Open
Frank-Wu opened this issue Jul 17, 2014 · 1 comment

Comments

@Frank-Wu

Hi, I tried to run H-Store on a cluster, but the system reported several errors.
In the experiment I used three machines: one for the client and two for the database sites.
My cluster.txt is:

node1:0:0-2
node2:1:3-4
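To make this layout explicit, here is a minimal Java sketch that parses the two lines above. It assumes each line has the form `<host>:<siteId>:<firstPartition>-<lastPartition>`, which is inferred from the logs (site H00 on node1 reports partitions {0, 1, 2}) rather than taken from H-Store's own parser; the `ClusterLayout` class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of H-Store. Assumed line format (inferred from the logs):
//   <host>:<siteId>:<firstPartition>-<lastPartition>
public class ClusterLayout {

    /** One HStoreSite entry: host name, site id, and an inclusive partition range. */
    public static final class SiteEntry {
        public final String host;
        public final int siteId;
        public final int firstPartition;
        public final int lastPartition;

        SiteEntry(String host, int siteId, int firstPartition, int lastPartition) {
            this.host = host;
            this.siteId = siteId;
            this.firstPartition = firstPartition;
            this.lastPartition = lastPartition;
        }

        public int partitionCount() {
            return lastPartition - firstPartition + 1;
        }
    }

    /** Parses one "host:site:first-last" line; a lone partition id is also accepted (assumption). */
    public static SiteEntry parseLine(String line) {
        String[] fields = line.trim().split(":");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected host:site:partitions, got: " + line);
        }
        String[] range = fields[2].split("-");
        int first = Integer.parseInt(range[0]);
        int last = range.length > 1 ? Integer.parseInt(range[1]) : first;
        if (last < first) {
            throw new IllegalArgumentException("Empty partition range in: " + line);
        }
        return new SiteEntry(fields[0], Integer.parseInt(fields[1]), first, last);
    }

    /** Parses every non-blank line into a SiteEntry. */
    public static List<SiteEntry> parse(List<String> lines) {
        List<SiteEntry> sites = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                sites.add(parseLine(line));
            }
        }
        return sites;
    }

    /** Sums the partitions hosted by all sites. */
    public static int totalPartitions(List<SiteEntry> sites) {
        int total = 0;
        for (SiteEntry site : sites) {
            total += site.partitionCount();
        }
        return total;
    }

    public static void main(String[] args) {
        List<SiteEntry> sites = parse(List.of("node1:0:0-2", "node2:1:3-4"));
        System.out.println(sites.size() + " sites, " + totalPartitions(sites) + " partitions");
    }
}
```

Running `main` prints `2 sites, 5 partitions`, which agrees with the controller's "Waiting for 2 HStoreSites with 5 partitions" message.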

Then I ran the TPC-C benchmark on this small cluster, and the program failed with:

[java] 17:47:09,193 INFO  - Waiting for 2 HStoreSites with 5 partitions to finish initialization
[java] 17:47:26,706 ERROR - Failed to poll 'site-00-node1' [exitValue=0]
[java] 17:47:26,706 FATAL - Process 'site-00-node2' failed. Halting benchmark!
[java] 17:47:28,211 FATAL - Failed to complete benchmark
[java] java.lang.RuntimeException: Failed to start all HStoreSites. Halting benchmark
[java]     at edu.brown.api.BenchmarkController.startSites(BenchmarkController.java:640)
[java]     at edu.brown.api.BenchmarkController.setupBenchmark(BenchmarkController.java:507)
[java]     at edu.brown.api.BenchmarkController.main(BenchmarkController.java:2245)

I then checked the site logs. The log for node1 shows:

#2014-07-17T17:47:09.188.0
Buildfile: /temp/frank/h-store/build.xml
hstore-site:
17:47:11,076 [main] (HStore.java:227) WARN  - 
?????????????????????????????????????????????????????????????????????????????????????????
? !!! WARNING !!!                                                                       ?
? H-Store is executing with JVM asserts enabled. This will degrade runtime performance. ?
? You can disable them by setting the config option 'site.jvm_asserts' to FALSE         ?
? See the online documentation for more information:                                    ?
?    http://hstore.cs.brown.edu/documentation/deployment/client-configuration           ?
?????????????????????????????????????????????????????????????????????????????????????????
Site is ready for action : Site=H00 / Address=node1:21212 / Partitions={0, 1, 2}
17:47:25,205 [H00-coord] (HStoreCoordinator.java:207) ERROR - MessengerListener has stopped!
java.lang.AssertionError
    at edu.brown.hstore.HStoreCoordinator$2.run(HStoreCoordinator.java:248)
    at edu.brown.hstore.HStoreCoordinator$2.run(HStoreCoordinator.java:237)
    at com.google.protobuf.RpcUtil$1.run(Unknown Source)
    at com.google.protobuf.RpcUtil$1.run(Unknown Source)
    at edu.brown.protorpc.ProtoRpcController.finishRpc(ProtoRpcController.java:161)
    at edu.brown.protorpc.ProtoRpcController.finishRpcSuccess(ProtoRpcController.java:118)
    at edu.brown.protorpc.ProtoRpcChannel.readCallback(ProtoRpcChannel.java:178)
    at edu.brown.protorpc.NIOEventLoop.handleSelectedKeys(NIOEventLoop.java:223)
    at edu.brown.protorpc.NIOEventLoop.runOnce(NIOEventLoop.java:169)
    at edu.brown.protorpc.NIOEventLoop.run(NIOEventLoop.java:154)
    at edu.brown.hstore.HStoreCoordinator$MessengerListener.run(HStoreCoordinator.java:200)
    at java.lang.Thread.run(Thread.java:744)
17:47:25,205 [H00-coord] (HStoreCoordinator.java:224) FATAL - Unexpected error in messenger listener thread
java.lang.AssertionError
    at edu.brown.hstore.HStoreCoordinator$2.run(HStoreCoordinator.java:248)
    at edu.brown.hstore.HStoreCoordinator$2.run(HStoreCoordinator.java:237)
    at com.google.protobuf.RpcUtil$1.run(Unknown Source)
    at com.google.protobuf.RpcUtil$1.run(Unknown Source)
    at edu.brown.protorpc.ProtoRpcController.finishRpc(ProtoRpcController.java:161)
    at edu.brown.protorpc.ProtoRpcController.finishRpcSuccess(ProtoRpcController.java:118)
    at edu.brown.protorpc.ProtoRpcChannel.readCallback(ProtoRpcChannel.java:178)
    at edu.brown.protorpc.NIOEventLoop.handleSelectedKeys(NIOEventLoop.java:223)
    at edu.brown.protorpc.NIOEventLoop.runOnce(NIOEventLoop.java:169)
    at edu.brown.protorpc.NIOEventLoop.run(NIOEventLoop.java:154)
    at edu.brown.hstore.HStoreCoordinator$MessengerListener.run(HStoreCoordinator.java:200)
    at java.lang.Thread.run(Thread.java:744)
BUILD SUCCESSFUL
Total time: 15 seconds

The log for node2 shows:

#2014-07-17T17:47:09.192.0
Buildfile: /temp/frank/h-store/build.xml
hstore-site:
17:47:12,457 [main] (HStore.java:227) WARN  - 
?????????????????????????????????????????????????????????????????????????????????????????
? !!! WARNING !!!                                                                       ?
? H-Store is executing with JVM asserts enabled. This will degrade runtime performance. ?
? You can disable them by setting the config option 'site.jvm_asserts' to FALSE         ?
? See the online documentation for more information:                                    ?
?    http://hstore.cs.brown.edu/documentation/deployment/client-configuration           ?
?????????????????????????????????????????????????????????????????????????????????????????
java.lang.RuntimeException
    at edu.brown.hstore.HStoreCoordinator.initConnections(HStoreCoordinator.java:561)
    at edu.brown.hstore.HStoreCoordinator.start(HStoreCoordinator.java:390)
    at edu.brown.hstore.HStoreSite.init(HStoreSite.java:691)
    at edu.brown.hstore.HStoreSite.run(HStoreSite.java:1486)
    at edu.brown.hstore.HStore.main(HStore.java:266)
17:47:16,530 [H01-main] (HStoreCoordinator.java:553) WARN  - Failed to connect to remote sites. Going to try again...
17:47:16,533 [H01-main] (HStoreCoordinator.java:559) FATAL - Site #1 failed to connect to remote sites
17:47:16,536 [H01-main] (HStoreSite.java:558) FATAL - Thread H01-main had a fatal error: null
17:47:16,565 [H01-main] (HStoreCoordinator.java:1557) WARN  - Shutting down cluster with RuntimeException
java.lang.RuntimeException
    at edu.brown.hstore.HStoreCoordinator.initConnections(HStoreCoordinator.java:561)
    at edu.brown.hstore.HStoreCoordinator.start(HStoreCoordinator.java:390)
    at edu.brown.hstore.HStoreSite.init(HStoreSite.java:691)
    at edu.brown.hstore.HStoreSite.run(HStoreSite.java:1486)
    at edu.brown.hstore.HStore.main(HStore.java:266)
17:47:16,584 [H01-main] (HStoreCoordinator.java:1520) ERROR - Trying to send ShutdownPrepareRequest to H00 before the connection was established
17:47:16,587 [H01-main] (HStoreCoordinator.java:1538) INFO  - Waiting for 1 sites to finish shutting down
17:47:26,592 [H01-main] (HStoreCoordinator.java:1541) WARN  - Failed to recieve all shutdown responses

There is no log for the client.

I am fairly sure that each node can be accessed without a password and that the cluster itself is fine, since we have run these experiments on three different clusters. Instead, I suspect the bug is related to the benchmarks: the system fails when running the tpcc, auctionmark, and wikipedia benchmarks, but it successfully runs the voter and ycsb benchmarks. Is something wrong with these benchmarks? Thanks!

PS: The bug is non-deterministic. Sometimes the tpcc benchmark runs correctly, but I have never successfully run the wikipedia benchmark.

@Frank-Wu

This problem was solved by downloading and recompiling the latest code release. This issue can be closed now. Thanks.
