In this paper, we provide a method to safely store a document in perhaps the most challenging setting: a highly decentralized replicated storage system in which up to half of the storage servers may incur arbitrary failures, including alterations to the data stored on them. Using an error-correcting code (ECC), e.g., a Reed-Solomon code, one can take n pieces of a document and replace each piece with another piece larger by a factor of n/(n-2t+1), such that it is possible to recover the original set even when up to t of the larger pieces are altered. For t close to n/2, the space blowup factor of this scheme is close to n, and the overhead of an ECC such as the Reed-Solomon code degenerates to that of a trivial replication code. We show a technique to reduce this large space overhead for high values of t. Our scheme blows up each piece by a factor slightly larger than two using an erasure code, which makes it possible to recover the original set using n/2 - O(n/d) of the pieces, where d ≈ 80 is a fixed constant. We then attach to each piece O(d log n/log d) additional bits that make it possible to identify, with low complexity and negligible error probability, a large enough set of unmodified pieces, assuming that at least half of the pieces are unmodified. For values of t close to n/2, we achieve a large asymptotic space reduction over the best possible space blowup of any ECC in a deterministic setting. Our approach makes use of a d-regular expander graph to compute the bits required for the identification of n/2 - O(n/d) good pieces.
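To make the space comparison concrete, the following is a minimal Python sketch that evaluates the ECC blowup factor n/(n-2t+1) quoted above as t approaches n/2, and gives an order-of-magnitude figure for the O(d log n/log d) additional bits per piece. The function names `ecc_blowup` and `tag_bits_order`, the sample value n = 1000, and the choice of 1 for the constant hidden in the O() notation are illustrative assumptions, not values taken from the paper.

```python
from fractions import Fraction
import math


def ecc_blowup(n: int, t: int) -> Fraction:
    """Per-piece size factor n/(n - 2t + 1) for an ECC that tolerates
    up to t altered pieces out of n (formula as quoted in the abstract)."""
    denom = n - 2 * t + 1
    if denom <= 0:
        raise ValueError("the formula requires 2t <= n")
    return Fraction(n, denom)


def tag_bits_order(n: int, d: int = 80) -> float:
    """d * log(n) / log(d): growth rate of the extra bits per piece.
    ASSUMPTION: the constant hidden by O() is taken to be 1."""
    return d * math.log2(n) / math.log2(d)


if __name__ == "__main__":
    n = 1000  # hypothetical number of pieces
    for t in (100, 250, 400, 490, 500):
        factor = float(ecc_blowup(n, t))
        print(f"t = {t:4d}   ECC blowup factor = {factor:8.2f}")
    # The scheme in the abstract instead keeps the blowup slightly above 2
    # and adds a number of bits per piece that grows like this quantity:
    print(f"extra bits per piece (order only): {tag_bits_order(n):.0f}")
```

Running the loop shows the ECC factor staying small for moderate t and climbing toward n as t nears n/2, which is the regime where a constant blowup of about two plus a logarithmic number of extra bits per piece becomes attractive.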
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics