(Originally posted here)
Scenario
In this scenario, and
wants to distribute segments of their data (preferably encrypted). We do not care about coverage, we are trying to maximize the amount of remotely stored data. The game is essentially that any peer, will try to minimize their own storage requirements and thus maximize other peers’.
Protocol
Two peers and
will repeatedly negotiate upon storage space according to the follwing protocol, where
is prover and
verifier. Let
be the set of segments owned by
and stored by
. Moreover, let
be an abritrary segment and let
denote the signature of
.
1. picks a random string
of length
bits (equal to the length of the signature)
2. asks
to send the closests segment from
, i.e., such that
is minimized.
3. verifies the segment
via the signature. Then, using the distance
,
can estimate the number of segments stored by
.
Repeat until the variance is low enough. The procedure is performed both ways.
Approximating
Intuitively, we can think of as a random (non-linear) code
over
and the task is to find a minimum-weight codeword in
. Then
corresponds to the number of codewords in
.
Assume that A has stored segments. Without loss of generality, we can assume that
is the all-zero string, hence transforming the problem to finding the minimum weight of $n$ binomial variables. Let
and
. Then with high probability and for larger values of
,
The above can be obtained by Gaussian approximation and the conventional expectation of the minimum of Gaussian random variables. We omit the proof here. Of course, this can be pre-computed for reasonable values of .
Assume there exists a peer with infinite computing power that tries to minimize its required storage for a given minimum distance
. Done in an optimal manner,
will pick equidistant points in the space
. This corresponds to a perfect error-correcting code with minimum distance $d$. The covering-coding bound states that
. For a perfect code, which has minimal size, we have
. Therefore, the size of the code is
, which is also the number of elements
needs to store.
A glaring problem is that cannot arbitrarily pick points (since points must be valid signatures). The best
can do is to decide upon a perfect code
(or pick one that is closest to the current set of points — which is a very hard problem). Then,
will look for points
and
that map to the same codeword
and discard the point that is the farthest away from
. For larger
and smaller
, it is not realistic to achieve the covering-coding bound since there in reality are not that many shared segments. So, even given infinite computation,
will not be able to sample points until a perfect code is achieved.
The covering-coding bound serves as a crude estimation on how to set the parameters.
Simulating
In Figure , we see that the upper bound for expected minimum distance approximates
for values larger than
and
.
Assume that send a signature
along with the segment. Let
can now use different bounds at its own choice. Assume that
Then, using the expression
determines that
most likely has at least
segments with high probability. Using the lower bound,
has at least
segments.
So, what is all this?
The intention is to create a simplistic protocol for asserting that shared data actually is retrievable, based on a probabilistic model. This could serve as basis for shared storage, in which segments are distributed. Even if the storer does not have access to the segments it has distributed, it can use the protocol to verify that it can retrieve them without actually doing so.