Techniques for providing anonymity in communication has a long history. In the 1980’s, David Chaum published novel techniques based on mix networks (or mixnets), which had several positive ramifications on cryptography as a whole. A mixnet can be described as a box, which accepts messages from multiple senders. The messages are shuffled, and then propagated through the network in random order to the recipients.
An -to-
mixnet.
Assume that a user, Alice, wants to send a message through a mixnet. Then, the following procedure is performed.
- Introduce randomness
to the message.
- Encrypt the message (including the randomness
) using the public key
of the recipient.
- Include the address of the recipient, denoted
and some additional randomness
.
- Encrypt what is obtained in the previous step with the public key
of the mixnet.
The message that Alice sends then is . The following is then performed by the mixnet:
- The mixnet decrypts using its private key.
- The mixnet removes randomness
.
- Extracts the adress of the recipient, i.e.,
, and sends
to that address.
The purpose of the randomness is to avoid direct guessing attacks on the encryption used between sender and recipient – for instance, if the mixnet is malicious. Moreover, the randomness
asserts that the address
remains known only to the mixnet, Alice and obviously, the recipient. The use of mixnets can, and probably should, be iterated in several steps.
The disclosure attack is an attack that targets anonymity systems based on Chaum mixnets and was introduced by Kedogan, Agrawal and Penz in their seminal paper [1]. The adversary is given access to observe incoming and outgoing messages and which senders and recipients that are act in the communication.
The attack is performed under the assumption that a user Alice has partners which are regularly communicated with, and the aim is to unravel the recipients. Denote the total number of users in the system
, the number of senders in each batch is denoted
(where
), and the number of recipients in a batch is denoted
. All senders in a batch are assumed to be distinct, i.e., a sender can only send one message. Then,
since more than one sender can send messages to the same recipient. It is assumed that all senders uniformly chooses their recipient among the set of their partners in each communication.
The attack is divided into two phases, the learning phase and the excluding phase.
- In the learning phase the adversary records all receivers in a batch when a message from Alice is sent. This set of recipients is denoted by
for some
.
Whenever Alice is active, the recipients are recorded and put into
. .
The process is repeated until the adversay as
mutually disjoint sets, i.e., until
Provided that we obtain
sets, each set
contains one and only one partner of Alice. Here is a pitfall: if the sets are sampled sequentially, some set may contain two of Alice’s partners and we will not be able to find
sets.
Expressed in Python code, it looks something like the following.
# The universe of all possible destination addresses U = set(xrange(N)) # Alice's communicating partners P = set(random.sample(U,m)) # Pick (with duplicates) a random set with at least one element from P R = set(random.sample(P,1)).union(set([random.randint(0,N-1) for i in xrange(n-1)])) # Begin learning phase list_of_R = [R] flag = 1 for i in xrange(m-1): while 1: R = set(random.sample(P,1)).union(set([random.randint(0,N-1) for i in xrange(n-1)])) flag = 0 for L in list_of_R: if len(L.intersection(R)) != 0: flag = 1 break if flag == 0: list_of_R.append(R) break
- The attack now proceeds to the excluding phase. A new observation
is recorded (again, when Alice is known to communicate). If the following is satisfied
and
,
then
is replaced by
. With high probability, this is reduce the cardinality of
. This procedure is repeated until all sets
have cardinality 1, i.e., contain only one recipient. Of course, since all sets were disjoint, the communicating parters of Alice have been unraveled.
Again, expressed in Python code, we have the following.
# Excluding phase oflag = 1 while oflag == 1: oflag = 0 flag = 1 for i in xrange(m): R = set(random.sample(P,1)).union(set([random.randint(0,N-1) for k in xrange(n-1)])) L = list_of_R[i] if len(L.intersection(R)) != 0: flag = 0 for j in set(xrange(m)).difference({i}): if len(list_of_R[j].intersection(R)) != 0: flag = 1 if flag == 0: list_of_R[i] = L.intersection(R) if len(list_of_R[i]) > 1: oflag = 1 # Verify that the found set is the set P print P == set.union(*list_of_R)
There is an obvious limitation to the attack. If
then it is not possible to find disjoints sets in the learning phase.
Another take at a disclosure attack
Under the assumptions from before, it is very simple to formulate a statistical attack that is much simpler to execute (and is considerably faster).
By using the recorded observations (when Alice communicates), we may simple make a histogram of the communication in the batches. Let be the number of batches recorded, then the expected number of recorded communications with a non-parter of Alice is
, while for a partner it is
.
In the below graph, . Clearly, determining the communicating partners of Alice is not hard.
This very simple attack construction performs better than [1] and does also work for . The assumptions made in [1] are also a bit unrealistic, in particular the assumption about a uniform recipient distribution among all other communicating parties.
[1] D. Kedogan, D. Agrawal, and S. Penz. Limits of anonymity in open environments. In Revised Papers from the 5th International Workshop on Information Hiding, volume 2578 of Lecture Notes in Computer Science, pages 53–69. Springer-Verlag, 2002.