Understanding the disclosure attack

Techniques for providing anonymity in communication has a long history. In the 1980’s, David Chaum published novel techniques based on mix networks (or mixnets), which had several positive ramifications on cryptography as a whole. A mixnet can be described as a box, which accepts messages from multiple senders. The messages are shuffled, and then propagated through the network in random order to the recipients.

mixnet

An n-to-n mixnet.

Assume that a user, Alice, wants to send a message M through a mixnet. Then, the following procedure is performed.

  1. Introduce randomness R_0 to the message.
  2. Encrypt the message (including the randomness R_0) using the public key K of the recipient.
  3. Include the address of the recipient, denoted A and some additional randomness R_1.
  4. Encrypt what is obtained in the previous step with the public key K_m of the mixnet.

The message that Alice sends then is E_{K_1}(R_1,E_K(R_0,M),A). The following is then performed by the mixnet:

  1. The mixnet decrypts using its private key.
  2. The mixnet removes randomness R_1.
  3. Extracts the adress of the recipient, i.e., A, and sends E_K(R_0,M) to that address.

The purpose of the randomness R_0 is to avoid direct guessing attacks on the encryption used between sender and recipient – for instance, if the mixnet is malicious. Moreover, the randomness R_1 asserts that the address A remains known only to the mixnet, Alice and obviously, the recipient. The use of mixnets can, and probably should, be iterated in several steps.

The disclosure attack is an attack that targets anonymity systems based on Chaum mixnets and was introduced by Kedogan, Agrawal and Penz in their seminal paper [1]. The adversary is given access to observe incoming and outgoing messages and which senders and recipients that are act in the communication.

The attack is performed under the assumption that a user Alice has m partners which are regularly communicated with, and the aim is to unravel the recipients. Denote the total number of users in the system N, the number of senders in each batch is denoted b (where 1<b<N), and the number of recipients in a batch is denoted n. All senders in a batch are assumed to be distinct, i.e., a sender can only send one message. Then, n \leq b since more than one sender can send messages to the same recipient. It is assumed that all senders uniformly chooses their recipient among the set of their partners in each communication.

The attack is divided into two phases, the learning phase and the excluding phase.

  1. In the learning phase the adversary records all receivers in a batch when a message from Alice is sent. This set of recipients is denoted by \mathcal R_i for some 0 < i \leq m.
    mixnetalice

    Whenever Alice is active, the recipients are recorded and put into \mathcal R_t. .

    The process is repeated until the adversay as m mutually disjoint sets, i.e., until

    \mathcal R_i \cap \mathcal R_j = \emptyset, \quad \forall i \neq j.

    Provided that we obtain m sets, each set \mathcal R_i contains one and only one partner of Alice. Here is a pitfall: if the sets are sampled sequentially, some set may contain two of Alice’s partners and we will not be able to find m sets.

    Expressed in Python code, it looks something like the following.

    # The universe of all possible destination addresses
    U = set(xrange(N))
    
    # Alice's communicating partners
    P = set(random.sample(U,m))
    
    # Pick (with duplicates) a random set with at least one element from P
    R = set(random.sample(P,1)).union(set([random.randint(0,N-1) for i in xrange(n-1)]))
    
    # Begin learning phase
    list_of_R = [R]
    flag = 1
    for i in xrange(m-1):
        while 1:
            R = set(random.sample(P,1)).union(set([random.randint(0,N-1) for i in xrange(n-1)]))
            flag = 0
            for L in list_of_R:
                if len(L.intersection(R)) != 0:
                    flag = 1
                    break
            if flag == 0:
                list_of_R.append(R)
                break
    
  2. The attack now proceeds to the excluding phase. A new observation \mathcal R is recorded (again, when Alice is known to communicate). If the following is satisfied

    \mathcal R \cap \mathcal R_i \neq \emptyset and \mathcal R \cap \mathcal R_j = \emptyset, \quad \forall i \neq j,

    then \mathcal R_i is replaced by \mathcal R \cap \mathcal R_i. With high probability, this is reduce the cardinality of \mathcal R_i. This procedure is repeated until all sets \mathcal R_i, 0 < i \leq m have cardinality 1, i.e., contain only one recipient. Of course, since all sets were disjoint, the communicating parters of Alice have been unraveled.

    Again, expressed in Python code, we have the following.

    # Excluding phase
    oflag = 1
    while oflag == 1:
        oflag = 0
        flag = 1
        for i in xrange(m):
            R = set(random.sample(P,1)).union(set([random.randint(0,N-1) for k in xrange(n-1)]))
            L = list_of_R[i]
            if len(L.intersection(R)) != 0:
                flag = 0
                for j in set(xrange(m)).difference({i}):
                    if len(list_of_R[j].intersection(R)) != 0:
                        flag = 1
                if flag == 0:
                    list_of_R[i] = L.intersection(R)
            if len(list_of_R[i]) &gt; 1:
                oflag = 1
    
    # Verify that the found set is the set P
    print P == set.union(*list_of_R)
    

There is an obvious limitation to the attack. If

m > \lfloor N/m \rfloor,

then it is not possible to find m disjoints sets in the learning phase.

Another take at a disclosure attack

Under the assumptions from before, it is very simple to formulate a statistical attack that is much simpler to execute (and is considerably faster).

By using the recorded observations (when Alice communicates), we may simple make a histogram of the communication in the batches. Let c be the number of batches recorded, then the expected number of recorded communications with a non-parter of Alice is c(n-1)/N, while for a partner it is c(1/m + (n-1)/N).

In the below graph, c = 1000, n = 9, m = 9. Clearly, determining the communicating partners of Alice is not hard.

graph

This very simple attack construction performs better than [1] and does also work for m > \lfloor N/m \rfloor. The assumptions made in [1] are also a bit unrealistic, in particular the assumption about a uniform recipient distribution among all other communicating parties.

[1] D. Kedogan, D. Agrawal, and S. Penz. Limits of anonymity in open environments. In Revised Papers from the 5th International Workshop on Information Hiding, volume 2578 of Lecture Notes in Computer Science, pages 53–69. Springer-Verlag, 2002.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s