TY - GEN
T1 - Speculative Recovery
T2 - 2022 USENIX Annual Technical Conference, ATC 2022
AU - Li, Nanqinqin
AU - Kalaba, Anja
AU - Freedman, Michael J.
AU - Lloyd, Wyatt
AU - Levy, Amit
N1 - Publisher Copyright:
© 2022 USENIX Annual Technical Conference, ATC 2022. All rights reserved.
PY - 2022
Y1 - 2022
AB - The ubiquity of disaggregated storage in cloud computing has led to a nascent technique for fault tolerance: instead of utilizing application-level replication, newly-launched backup instances recover application state from disaggregated storage (REDS) after a primary's failure. Attractively, REDS provides fault tolerance at a much lower cost than traditional replication schemes, wherein at least two instances are running. Failover in REDS is slow, however, because it sequentially first detects primary failure and only then starts recovery on a backup. We propose speculative recovery to accelerate failover and thus increase the availability of applications using REDS. Instead of proceeding with failover sequentially, speculative recovery safely and efficiently parallelizes detecting primary failure and running recovery on a backup, by employing our new super and collapse primitives for disaggregated storage. Our implementation and evaluation of speculative recovery demonstrate that it considerably reduces failover time.
UR - http://www.scopus.com/inward/record.url?scp=85140957013&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140957013&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85140957013
T3 - Proceedings of the 2022 USENIX Annual Technical Conference, ATC 2022
SP - 271
EP - 286
BT - Proceedings of the 2022 USENIX Annual Technical Conference, ATC 2022
PB - USENIX Association
Y2 - 11 July 2022 through 13 July 2022
ER -