TY - GEN
T1 - Tolerating slowdowns in replicated state machines using copilots
AU - Ngo, Khiem
AU - Sen, Siddhartha
AU - Lloyd, Wyatt
N1 - Publisher Copyright:
© 2020 Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020. All rights reserved.
PY - 2020
Y1 - 2020
AB - Replicated state machines are linearizable, fault-tolerant groups of replicas that are coordinated using a consensus algorithm. Copilot replication is the first 1-slowdown-tolerant consensus protocol: it delivers normal latency despite the slowdown of any one replica. Copilot uses two distinguished replicas, the pilot and the copilot, to proactively add redundancy to all stages of processing a client's command. Copilot uses dependencies and deduplication to resolve the potentially differing orderings proposed by the two pilots. To prevent dependencies from allowing either pilot to slow down the group, Copilot uses fast takeovers that let a fast pilot complete the ongoing work of a slow pilot. Copilot includes two optimizations, ping-pong batching and null dependency elimination, that improve its performance when there are zero and one slow pilots, respectively. Our evaluation of Copilot shows that its performance is lower than, but competitive with, that of Multi-Paxos and EPaxos when no replicas are slow. When a replica is slow, Copilot is the only protocol that avoids high latencies.
UR - http://www.scopus.com/inward/record.url?scp=85096747669&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096747669&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85096747669
T3 - Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020
SP - 583
EP - 598
BT - Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020
PB - USENIX Association
T2 - 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020
Y2 - 4 November 2020 through 6 November 2020
ER -