Skip to content

Latest commit

 

History

History
121 lines (88 loc) · 4.82 KB

README.md

File metadata and controls

121 lines (88 loc) · 4.82 KB

RaftFleet

Elixir library to run multiple Raft consensus groups in a cluster of ErlangVMs

Hex.pm Coverage Status

Feature & Design

  • Easy hosting of multiple "cluster-wide state"s
  • Decentralized architecture and fault tolerance
  • Reasonably scalable placement of processes for multiple Raft consensus groups
    • Consensus member processes are distributed to ErlangVMs in a data center-aware manner using rendezvous hashing
    • Automatic rebalancing on adding/removing nodes
  • Location transparency
    • Each consensus group leader is accessible using name (an atom) of the consensus group
    • Actual pids of consensus leader processes are cached in a local ETS table for fast access

Notes on backward compatibility

  • Users of <= 0.6.0 should upgrade to 0.6.1 before upgrading to 0.7.x due to a change in internal data structure. While <= 0.6.0 and 0.7.x are not compatible, 0.6.1 should be able to interact with both <= 0.6.0 and 0.7.x.

Usage example

Suppose we have a cluster of 4 erlang nodes:

$ iex --sname 1 -S mix
iex(1@skirino-Manjaro)>

$ iex --sname 2 -S mix
iex(2@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")

$ iex --sname 3 -S mix
iex(3@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")

$ iex --sname 4 -S mix
iex(4@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")

Load the following module that implements RaftedValue.Data behaviour on all nodes in the cluster.

defmodule JustAnInt do
  @behaviour RaftedValue.Data
  def new(), do: 0
  def command(i, {:set, j}), do: {i, j    }
  def command(i, :inc     ), do: {i, i + 1}
  def query(i, :get), do: i
end

Call RaftFleet.activate/1 on all nodes.

iex(1@skirino-Manjaro)> RaftFleet.activate("zone1")

iex(2@skirino-Manjaro)> RaftFleet.activate("zone2")

iex(3@skirino-Manjaro)> RaftFleet.activate("zone1")

iex(4@skirino-Manjaro)> RaftFleet.activate("zone2")

Create 5 consensus groups each of which replicates an integer and has 3 consensus members.

iex(1@skirino-Manjaro)> rv_config = RaftedValue.make_config(JustAnInt)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus1, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus2, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus3, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus4, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus5, 3, rv_config)

Now we can run query/command from any node in the cluster:

iex(1@skirino-Manjaro)> RaftFleet.query(:consensus1, :get)
{:ok, 0}

iex(2@skirino-Manjaro)> RaftFleet.command(:consensus1, :inc)
{:ok, 0}

iex(3@skirino-Manjaro)> RaftFleet.query(:consensus1, :get)
{:ok, 1}

Activating/deactivating a node in the cluster triggers rebalancing of consensus member processes.

Deployment notes

To run raft_fleet within an ErlangVM cluster, the followings are our general recommendations.

  • Cluster should consist of at least 3 nodes to tolerate 1 node failure. Similarly cluster nodes should span 3 (or more) data centers, so that the system keeps on functioning in the face of 1 data center failure.

  • When you add new ErlangVM nodes, each node should run the following initialization steps:

    1. establish connections to other running nodes,
    2. call RaftFleet.activate/1.

    These steps are typically done within start/2 of the main OTP application. Information of other running nodes should be available from e.g. IaaS API.

  • When terminating a node you should proceed as follows (although raft_fleet tolerates failures as long as quorums are maintained, it's much better to tell raft_fleet to make preparations beforehand):

    1. call RaftFleet.deactivate/0 within the node-to-be-terminated,
    2. wait for a while (say, 10 min) so that existing consensus group members are migrated to the other nodes, then
    3. finally shutdown the node.

Links