(Originally posted here)

**Scenario**

In this scenario, and wants to distribute segments of their data (preferably encrypted). We do not care about coverage, we are trying to maximize the amount of remotely stored data. The game is essentially that any peer, will try to minimize their own storage requirements and thus maximize other peers’.

**Protocol**

Two peers and will repeatedly negotiate upon storage space according to the follwing protocol, where is prover and verifier. Let be the set of segments owned by and stored by . Moreover, let be an abritrary segment and let denote the signature of .

1. picks a random string of length bits (equal to the length of the signature)

2. asks to send the closests segment from , i.e., such that is minimized.

3. verifies the segment via the signature. Then, using the distance , can estimate the number of segments stored by .

Repeat until the variance is low enough. The procedure is performed both ways.

**Approximating **

Intuitively, we can think of as a random (non-linear) code over and the task is to find a minimum-weight codeword in . Then corresponds to the number of codewords in .

Assume that A has stored segments. Without loss of generality, we can assume that is the all-zero string, hence transforming the problem to finding the minimum weight of $n$ binomial variables. Let and . Then with high probability and for larger values of ,

The above can be obtained by Gaussian approximation and the conventional expectation of the minimum of Gaussian random variables. We omit the proof here. Of course, this can be pre-computed for reasonable values of .

Assume there exists a peer with infinite computing power that tries to minimize its required storage for a given minimum distance . Done in an optimal manner, will pick equidistant points in the space . This corresponds to a perfect error-correcting code with minimum distance $d$. The covering-coding bound states that . For a perfect code, which has minimal size, we have . Therefore, the size of the code is , which is also the number of elements needs to store.

A glaring problem is that cannot arbitrarily pick points (since points must be valid signatures). The best can do is to decide upon a perfect code (or pick one that is closest to the current set of points — which is a very hard problem). Then, will look for points and that map to the same codeword and discard the point that is the farthest away from . For larger and smaller , it is not realistic to achieve the covering-coding bound since there in reality are not that many shared segments. So, even given infinite computation, will not be able to sample points until a perfect code is achieved.

The covering-coding bound serves as a crude estimation on how to set the parameters.

**Simulating**

In Figure , we see that the upper bound for expected minimum distance approximates for values larger than and .

Assume that send a signature along with the segment. Let

can now use different bounds at its own choice. Assume that Then, using the expression

determines that most likely has at least segments with high probability. Using the lower bound, has at least segments.

**So, what is all this?**

The intention is to create a simplistic protocol for asserting that shared data actually is retrievable, based on a probabilistic model. This could serve as basis for shared storage, in which segments are distributed. Even if the storer does not have access to the segments it has distributed, it can use the protocol to verify that it *can* retrieve them without actually doing so.