failure on {ok, not_found} in rafter_consensus_fsm:600 #21
@d0rc I'm having trouble reproducing this issue. I've killed nodes, given bad commands, restarted all nodes, etc. I saw something similar to this with bugs in the past, but not recently. If you have the logs and the nodes are still crashing like this, can you stop the nodes, tar up the data directory, and email it to me? astone AT basho Dot Com

Additionally, did you have existing data, stop the nodes, pull a new version from GitHub, and start again? I'm just trying to nail down potential causes. You can likely clear this up, and hopefully not see it again, by using the latest code, wiping your data (which appears to be test data anyway), and starting again.

Thanks for reporting this. Hopefully I'll get it sorted out.
Without knowing about this issue, I ran into this problem. I wrote an expect script to reproduce it. Basically, I tried to append an entry to the log when the leader had lost contact with its two followers. Unfortunately, it's not possible to attach files to comments, so I have to paste it inline:

spawn ./bin/start-node peer1
spawn ./bin/start-node peer2
spawn ./bin/start-node peer3
expect -i $p1 "1>"
sleep 3
send -i $p1 "Peers = [{peer1, '[email protected]'}, {peer2, '[email protected]'}, {peer3, '[email protected]'}].\n"
sleep 2
send -i $p1 "rafter:get_leader(peer1).\n"
send -i $p1 "rafter:op(rafter:get_leader(peer1), {new, ourtable}).\n"
send -i $p1 "rafter:op(rafter:get_leader(peer1), {put, ourtable, foo, 1}).\n"
sleep 2
if { $leader eq "{peer1,'[email protected]'}" } {
    foreach { id } [list $p1 $p2 $p3] {
send_user "\nkilled everyone except leader\n"
sleep 3
send -i $leader_p "rafter:op(rafter:get_leader(peer1), {put, ourtable, foo, 2}).\n"
sleep 2
send_user "\nall commands executed\n"
foreach { name } [list peer1 peer2 peer3] {
sleep 3
foreach { name } [list peer1 peer2 peer3] {
spawn ./bin/start-node $leader_name
sleep 3
foreach { name } [list peer1 peer2 peer3] {
send -i $leader "{}.\n"
foreach { name } [list peer1 peer2 peer3] {
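For anyone trying this by hand, the script above boils down to roughly the following Erlang shell session. This is a sketch, assuming a three-node cluster of peer1, peer2, and peer3 has already been started and configured; the node names are illustrative, and it uses only the rafter:get_leader/1 and rafter:op/2 calls that appear in the script.

%% Run in peer1's Erlang shell once all three nodes are up and configured.
Leader = rafter:get_leader(peer1).
rafter:op(Leader, {new, ourtable}).
rafter:op(Leader, {put, ourtable, foo, 1}).
%% Now stop peer2 and peer3 so the leader can no longer reach a quorum,
%% then try to append another entry:
rafter:op(rafter:get_leader(peer1), {put, ourtable, foo, 2}).
%% This last call is the point at which the {ok, not_found} crash in
%% rafter_consensus_fsm was observed in the script's run.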
I can confirm this, using Erlang/OTP 17 and rafter 4dbbb75.
Honestly, at this point I've pretty much ceased development on rafter, and I'm not sure when I'll really have time to dig into this issue. Rafter is definitely not a production-ready project. There are a lot of rough edges, and I don't really have a use case at the moment enticing me to work on it more.

Since I stopped working on rafter I've poured a lot of my energy into Riak Ensemble. It is a production-ready consensus protocol that is used in Riak 2.0 to provide atomic single-key operations. While it differs from rafter in that it doesn't provide a globally ordered log, there is no reason a log cannot be built on top of Riak Ensemble. Additionally, Riak Ensemble provides leader leases, allowing zero-round-trip reads, and built-in integrity trees that protect against some Byzantine failure scenarios. It also manages multiple ensemble groups instead of the single group managed by rafter. On the downside, it requires rewriting active keys on epoch changes, and it maybe isn't quite as user-friendly to get started with. The big advantage, however, is that it is production-ready now and in use in the soon-to-be-released Riak 2.0.
Here is the full failure log and the log of peer3 after a restart attempt: