diff --git a/examples/C/src/leader-election/HeartbeatBully.lf b/examples/C/src/leader-election/HeartbeatBully.lf
index 31362d82..4e692d29 100644
--- a/examples/C/src/leader-election/HeartbeatBully.lf
+++ b/examples/C/src/leader-election/HeartbeatBully.lf
@@ -14,19 +14,7 @@
  * is set so that each primary fails after sending three heartbeat messages. When all nodes have
  * failed, then the program exits.
  *
- * This example is designed to be run as a federated program with decentralized coordination.
- * However, as of this writing, bugs in the federated code generator cause the program to fail
- * because all federates get the same bank_index == 0. This may be related to these bugs:
- *
- * - https://github.com/lf-lang/lingua-franca/issues/1961
- * - https://github.com/lf-lang/lingua-franca/issues/1962
- *
- * When these bugs are fixed, then the federated version should operate exactly the same as the
- * unfederated version except that it will become possible to kill the federates instead of having
- * them fail on their own. The program should also be extended to include STP violation handlers to
- * deal with the fundamental CAL theorem limitations, where unexpected network delays make it
- * impossible to execute the program as designed. For example, if the network becomes partitioned,
- * then it becomes possible to have two primary nodes simultaneously active.
+ * This example is designed to be run as a federated program.
  *
  * @author Edward A. Lee
  * @author Marjan Sirjani
@@ -101,10 +89,6 @@ reactor Node(
         }
       }
     }
-  // FIXME
-  // =} STP (0) {=
-  // FIXME: What should we do here.
-  // lf_print_error("Node %d had an STP violation. Ignoring heartbeat as if it didn't arrive at all.", self->bank_index);
   =}
 
   reaction(t) -> reset(Prospect) {=
diff --git a/examples/C/src/leader-election/README.md b/examples/C/src/leader-election/README.md
index 8a8e3c69..d2cce2c9 100644
--- a/examples/C/src/leader-election/README.md
+++ b/examples/C/src/leader-election/README.md
@@ -1,11 +1,14 @@
 # Leader Election
 
-These federated programs implements a redundant fault-tolerant system where a primary node, if and when it fails, is replaced by a backup node. The protocol is described in this paper:
+These federated programs implement redundant fault-tolerant systems where a primary node, if and when it fails, is replaced by a backup node. The HeartbeatBully example is described in this paper:
 
-> Bjarne Johansson; Mats Rågberger; Alessandro V. Papadopoulos; Thomas Nolte, "Consistency Before Availability: Network Reference Point based Failure Detection for Controller Redundancy," Emerging Technologies and Factory Automation (ETFA), 12-15 September 2023, [DOI:10.1109/ETFA54631.2023.10275664](https://doi.org/10.1109/ETFA54631.2023.10275664)
+> B. Johansson, M. Rågberger, A. V. Papadopoulos and T. Nolte, "Heartbeat Bully: Failure Detection and Redundancy Role Selection for Network-Centric Controller," IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 2020, pp. 2126-2133, [DOI: 10.1109/IECON43393.2020.9254494](https://doi.org/10.1109/IECON43393.2020.9254494).
 
+The NRP examples extend the algorithm to reduce the likelihood of getting multiple primaries when the network becomes partitioned. The NRP protocol is described in this paper:
 
-The key idea in this protocol is that when a backup fails to detect the heartbeats of a primary node, it becomes primary only if it has access to Network Reference Point (NRP), which is a point in the network. This way, if the network becomes partitioned, only a backup that is on the side of the partition that still has access to the NRP can become a primary. If a primary loses access to the NRP, then it relinquishes its primary role because it is now on the wrong side of a network partition. A backup on the right side of the partition will take over. The "FD" in the names of the programs stands for "fault detection."
+> B. Johansson, M. Rågberger, A. V. Papadopoulos, and T. Nolte, "Consistency Before Availability: Network Reference Point based Failure Detection for Controller Redundancy," Emerging Technologies and Factory Automation (ETFA), 12-15 September 2023, [DOI:10.1109/ETFA54631.2023.10275664](https://doi.org/10.1109/ETFA54631.2023.10275664)
+
+The key idea in the NRP protocol is that when a backup fails to detect the heartbeats of a primary node, it becomes primary only if it has access to a Network Reference Point (NRP), which is a point in the network. This way, if the network becomes partitioned, only a backup that is on the side of the partition that still has access to the NRP can become a primary. If a primary loses access to the NRP, then it relinquishes its primary role because it is now on the wrong side of a network partition. A backup on the right side of the partition will take over. The "FD" in the names of the programs stands for "fault detection."
 
 ## Prerequisite
 
@@ -15,8 +18,12 @@ To run these programs, you are required to first [install the RTI](https://www.l
+ | HeartbeatBully.lf : Basic leader election protocol called "heartbeat bully". | +|
- | NRP_FD.lf : This version has switch1 failing at 3s, node1 failing at 10s, and node2 failing at 15s. | +NRP_FD.lf : Extension using a network reference point (NRP) to help prevent multiple primaries. This version has switch1 failing at 3s, node1 failing at 10s, and node2 failing at 15s. |
diff --git a/examples/C/src/leader-election/img/HeartbeatBully.png b/examples/C/src/leader-election/img/HeartbeatBully.png
new file mode 100644
index 00000000..54ed8551
Binary files /dev/null and b/examples/C/src/leader-election/img/HeartbeatBully.png differ