# Understanding the disclosure attack

Techniques for providing anonymity in communication have a long history. In the 1980s, David Chaum published novel techniques based on mix networks (or mixnets), which had several positive ramifications on cryptography as a whole. A mixnet can be described as a box that accepts messages from multiple senders, shuffles them, and then propagates them through the network to the recipients in random order.

An $n$-to-$n$ mixnet.

Assume that a user, Alice, wants to send a message $M$ through a mixnet. Then, the following procedure is performed.

1. Introduce randomness $R_0$ to the message.
2. Encrypt the message (including the randomness $R_0$) using the public key $K$ of the recipient.
3. Include the address of the recipient, denoted $A$ and some additional randomness $R_1$.
4. Encrypt what is obtained in the previous step with the public key $K_m$ of the mixnet.

The message that Alice sends is thus $E_{K_m}(R_1,E_K(R_0,M),A)$. The following is then performed by the mixnet:

1. The mixnet decrypts using its private key.
2. The mixnet removes randomness $R_1$.
3. The mixnet extracts the address of the recipient, i.e., $A$, and sends $E_K(R_0,M)$ to that address.

The purpose of the randomness $R_0$ is to avoid direct guessing attacks on the encryption used between sender and recipient – for instance, if the mixnet is malicious. Moreover, the randomness $R_1$ ensures that the address $A$ remains known only to the mixnet, Alice and, obviously, the recipient. The use of mixnets can, and probably should, be iterated in several steps.
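As a toy model, the two-step construction and the mixnet's processing can be sketched as follows. The `encrypt`/`decrypt` pair, the key names and the address are hypothetical placeholders (a tagged tuple stands in for a real public-key ciphertext), chosen only to make the layering visible:

```python
import os

# Toy stand-in for public-key encryption: a "ciphertext" is a tagged
# tuple. A real mixnet would use an actual public-key scheme; everything
# here is purely illustrative.
def encrypt(key, payload):
    return ("enc", key, payload)

def decrypt(key, ciphertext):
    tag, k, payload = ciphertext
    assert tag == "enc" and k == key
    return payload

K, K_m = "K_recipient", "K_mixnet"     # hypothetical public keys
R0, R1 = os.urandom(8), os.urandom(8)  # the random padding R_0 and R_1
A = "address_of_recipient"
M = "hello"

# Alice's construction: E_{K_m}(R_1, E_K(R_0, M), A)
onion = encrypt(K_m, (R1, encrypt(K, (R0, M)), A))

# The mixnet peels its layer, discards R_1 and forwards E_K(R_0, M) to A
_, inner, addr = decrypt(K_m, onion)

# The recipient finally removes R_0 and recovers M
_, msg = decrypt(K, inner)
```

Note that without $R_1$, anyone holding a guess for $(M, A)$ could re-encrypt under the public mixnet key and compare ciphertexts; the padding defeats exactly this.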

The disclosure attack is an attack that targets anonymity systems based on Chaum mixnets and was introduced by Kesdogan, Agrawal and Penz in their seminal paper [1]. The adversary is able to observe incoming and outgoing messages, and thus which senders and recipients take part in the communication.

The attack is performed under the assumption that a user Alice has $m$ partners with whom she communicates regularly, and the aim is to unravel who these recipients are. Denote the total number of users in the system by $N$, the number of senders in each batch by $b$ (where $1 < b \leq N$), and the number of recipients in a batch by $n$. All senders in a batch are assumed to be distinct, i.e., a sender can send only one message per batch. Then $n \leq b$, since more than one sender can send messages to the same recipient. It is assumed that every sender chooses a recipient uniformly at random from its set of partners in each communication.

The attack is divided into two phases, the learning phase and the excluding phase.

1. In the learning phase, the adversary records the set of all recipients in a batch whenever a message from Alice is sent. The set recorded from the $i$-th such observation is denoted $\mathcal R_i$, where $0 < i \leq m$.

The process is repeated until the adversary has $m$ mutually disjoint sets, i.e., until

$\mathcal R_i \cap \mathcal R_j = \emptyset, \quad \forall i \neq j.$

Provided that we obtain $m$ such sets, each set $\mathcal R_i$ contains exactly one partner of Alice. There is a pitfall here: since the sets are sampled sequentially, some set may contain two of Alice’s partners, in which case we will never be able to find $m$ disjoint sets.

Expressed in Python code, it looks something like the following (the parameter values are illustrative; $n = m = 9$ matches the example further below, while $N$ is an assumption).

```python
import random

# Illustrative parameters: N users in total, m partners of Alice,
# n recipients per batch
N, m, n = 20000, 9, 9

# The universe of all possible destination addresses
U = set(range(N))

# Alice's communicating partners
P = set(random.sample(sorted(U), m))

# A random recipient set with at least one element from P (duplicates
# among the n - 1 random picks simply collapse in the set)
R = {random.choice(sorted(P))}.union(
    random.randint(0, N - 1) for _ in range(n - 1))

# Begin learning phase: collect m mutually disjoint recipient sets
list_of_R = [R]
for i in range(m - 1):
    while True:
        R = {random.choice(sorted(P))}.union(
            random.randint(0, N - 1) for _ in range(n - 1))
        if all(len(L.intersection(R)) == 0 for L in list_of_R):
            list_of_R.append(R)
            break
```

2. The attack now proceeds to the excluding phase. A new observation $\mathcal R$ is recorded (again, from a batch in which Alice is known to communicate). If the following is satisfied

$\mathcal R \cap \mathcal R_i \neq \emptyset \quad\text{and}\quad \mathcal R \cap \mathcal R_j = \emptyset, \quad \forall j \neq i,$

then $\mathcal R_i$ is replaced by $\mathcal R \cap \mathcal R_i$. With high probability, this reduces the cardinality of $\mathcal R_i$. The procedure is repeated until all sets $\mathcal R_i$, $0 < i \leq m$, have cardinality 1, i.e., contain only one recipient. Of course, since all sets were disjoint, the communicating partners of Alice have then been unraveled.

Again, expressed in Python code, we have the following.

```python
# Excluding phase: refine each set until it contains a single recipient
while True:
    done = True
    for i in range(m):
        # A new observation R from a batch in which Alice is active
        R = {random.choice(sorted(P))}.union(
            random.randint(0, N - 1) for _ in range(n - 1))
        if len(list_of_R[i].intersection(R)) != 0 and all(
                len(list_of_R[j].intersection(R)) == 0
                for j in range(m) if j != i):
            # R hits R_i and no other set, so the intersection
            # cannot lose the partner contained in R_i
            list_of_R[i] = list_of_R[i].intersection(R)
        if len(list_of_R[i]) > 1:
            done = False
    if done:
        break

# Verify that the found set is the set P
print(P == set.union(*list_of_R))
```


There is an obvious limitation to the attack. If

$m > \lfloor N/n \rfloor,$

then it is not possible to find $m$ disjoint sets in the learning phase, since $m$ disjoint sets of $n$ recipients each cannot fit among the $N$ users. For example, with $N = 100$ and $n = 20$, at most $\lfloor 100/20 \rfloor = 5$ disjoint sets can be found.

## Another take at a disclosure attack

Under the assumptions from before, it is easy to formulate a statistical attack that is much simpler to execute (and considerably faster).

By using the recorded observations (batches in which Alice communicates), we may simply build a histogram of the recipients seen across the batches. Let $c$ be the number of batches recorded; then the expected number of recorded communications with a non-partner of Alice is $c(n-1)/N$, while for a partner it is $c(1/m + (n-1)/N)$.
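This expected-value argument is easy to check in simulation. The sketch below (the total population $N = 20000$ is an assumed value; $m$, $n$ and $c$ match the example below) counts recipient occurrences over $c$ batches and takes the $m$ most frequent as the guess:

```python
import random
from collections import Counter

random.seed(1)
N, m, n, c = 20000, 9, 9, 1000  # N is an illustrative assumption

partners = set(random.sample(range(N), m))

counts = Counter()
for _ in range(c):
    # A batch in which Alice is active: one uniformly chosen partner,
    # plus n - 1 uniformly random recipients
    batch = {random.choice(sorted(partners))}
    batch.update(random.randrange(N) for _ in range(n - 1))
    counts.update(batch)

# Partners are expected roughly c/m ~ 111 times each, non-partners only
# c(n-1)/N ~ 0.4 times, so the m most frequent recipients stand out
guess = {addr for addr, _ in counts.most_common(m)}
print(guess == partners)
```

With these parameters the gap between the two expected counts is enormous, which is why the histogram separates partners from non-partners so cleanly.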

In the graph below, $c = 1000$, $n = 9$ and $m = 9$. Clearly, determining the communicating partners of Alice is not hard.

This very simple attack construction performs better than [1] and also works for $m > \lfloor N/n \rfloor$. The assumptions made in [1] are also a bit unrealistic, in particular the assumption of a uniform recipient distribution among all other communicating parties.

[1] D. Kesdogan, D. Agrawal, and S. Penz. Limits of anonymity in open environments. In Revised Papers from the 5th International Workshop on Information Hiding, volume 2578 of Lecture Notes in Computer Science, pages 53–69. Springer-Verlag, 2002.