On cycles in hash functions

Assume there is a black box, that given a cryptographic hash function h, finds some x and r in polynomial time, such that applying h(), r times gives x:

h(h(...(h(x)))) = x

The question is if an oracle, denoted \mathcal{O} in the text, can be used to prove if this function is not collision resistant or pre-image resistant.

We claim the following, without proof (for now):

Theorem 1: Given a random hash function h(x) and an oracle \mathcal{O} (according to the above), then a collision h(x) = h(y), x \neq y can be found in \textnormal{lcm}(1,2,\ldots,n) time if h(x) has a cycle of length \leq n.

We will spend the following sections to prove this. Let us first consider the simplest construction for which the above claim holds. We also state the following corollary.

Corollary 1: If the function h in Theorem 1 does not have a cycle, then there exists an polynomial-time probabilistic algorithm which randomizes the cycle structure and solves the problem with probability 1.

By Corollary 1, an adversary cannot craft a special function h that averts the result of Theorem 1.

Fixed points

We start off by looking how the oracle can be used to derive fixed points, i.e. elements in h such that h(x) = x. If the oracle \mathcal{O} was to return the shortest cycle, it would immediately tells us if there is a fixpoint h(x) = x or not.

Let us assume the oracle instead picks a cycle at random (which is a reasonable assumption, as this is how cycle-finding algoritm works). We create an auxilliary function h' as follows

h'(x) = \begin{cases} x & \quad \text{if } h(x) = x,\\ x+1 & \quad \text{if } h(x) \neq x. \end{cases}

Or (equivalently) in Python:

def h_prime(x):
    if h(x) == x: # is fixpoint
        return x
    else:
        return x + 1

Once our new modified function h' is fed to the oracle, the oracle in turn will give a fix point for both h and h' if one such exists in h. More specifically, upon calling the oracle with h', it will return r = 1 and h'(x) = x for some x.
hashfunction_modifiedObviously, no other point can be a fixpoint nor in a cycle (unless the value of x wraps around and no fixpoints exist), just leading up to a fixpoint. So, the oracle can only return fixpoints and will do so if one exists.

Assuming that the function h behaves randomly, then the expected number of fixpoints is 1. As a result, we can, with high probability find at least one element that is a fixpoint. We will see that this can in fact be fixed by using randomization of the cycle structure, by composition with a random permutation.

A preimage as also easy to find. Assume that we want to find h(x) = A. We then construct

g_A(x) = \begin{cases} x & \quad \text{if } h(x) = A,\\ x+1 & \quad \text{otherwise}. \end{cases}

Any value leading to A will be a 1-cycle (or fixpoint) in the g_A. No other cycles will exist.

Random oracles

Clearly, the above idea can be used to find fixpoints in any function h acting as a random oracle (or any cycle of length r that we see fit). We will now see how to extend this idea further.

If a point y leads to a fixpoint x, then necessarily we have a collision h(y) = h(x) = x. We call y a second-order fixpoint (c.f. x, which is a first-order fixpoint).

To include the second-order points in the result from the oracle, we simply remove the first-order fixpoints and turn the second-order fixpoints into first-order fixpoints.

def h_prime(x):
    if h(x) == x: # is fixpoint
        return x+1
    elif h(x) == h(h(x)):
        # leads to fixpoint, so x != h(x)
        return x
    else:
        return x+1

Now, we can use this to determinstically find a collision. Consider an algorithm which calls the oracle with the modified function h'. Assume that \mathcal{O}(h') returns y. If y is of second order, then we have a collision.

Obviously a first-order fixpoint is a 1-cycle. But what about m-cycles? Consider the following (efficiently computable function)

\hat h_m(x) = \begin{cases} x+1 & \quad \text{if } h^m(x) = x,\\ x & \quad \text{else if } h^{m+1}(x) = h(x),\\ x+1 & \quad \text{otherwise}. \end{cases}

How does this work? Every element that is in a m-cycle will be turned into elements leading to a 1-cycle, and every element that jumps into a cycle (but is not part of it) is turned into a 1-cycle.

By setting m = \textnormal{lcm}(1,2,\ldots,n) and calling the oracle with the function \hat h_m(x), we will ultimately find a collision if h admits a cycle of length up to n. If the cost of computing \hat h_m(x) and h^m(x) can be neglected, then the complexity is constant.

cycle

Evidently, the collision is between the elements h^m(x) and y. This concludes the proof.

If no cycle exists of length \leq m (let us assume the adversary created one with special structure to thwart this attack), then it is possible to do a re-randomization of h. By composition with a random permutation \pi, we create h' = \pi \circ h. This will alter the cycle structure (and can be done multiple times). After finding a collision x,y in h', we can find one in h as \pi(x), \pi(y).

Algorithm 1:

Input: function h
Output: Colliding pair x,y

  1. Pick a random (efficiently computable) permutation \pi. Let \hat{h} = \pi \circ h.
  2. Construct h'(x) = \begin{cases} x+1 & \quad \text{if } \hat{h}(x) = x,\\ x & \quad \text{else if } \hat{h}(\hat{h}(x)) = \hat{h}(x),\\ x+1 & \quad \text{otherwise}. \end{cases}
  3. Ask oracle \mathcal{O} to find a cycle in h', call this y. If the returned length is not 1, to go 1 (first step).
  4. Return collision \pi(h'(y)), \pi(y)

As a final note: If the output should have a finite image in, say, k bits, then the x+1 must be taken \mod 2^{k}. This is no problem, as this will not constitute a cycle if there exists at least one 1-cycle.

A simple example

What would a result like this be without a real-world example? Here is a toy example, which assumes n = 3. Hence m = \textnormal{lcm}(1,2,3) = 6.

from random import randint
from hashlib import sha256

def h(x):
    return int(
        sha256(str(x)).hexdigest()[2:6], 16
    )

def h_prime(x):
    # x = h^6(x)
    if x == h(h(h(h(h(h(x)))))):
        return x + 1
    elif h(x) == h(h(h(h(h(h(h(x))))))):
        return x
    else:
        return x + 1

def oracle(h):
    x0 = randint(0, 256**2)
    tortoise = x0
    hare = h(x0)

    while True:
        tortoise = h(tortoise)
        y = h(hare)
        hare = h(y)
        if tortoise == y or tortoise == hare:
            break

    return hare

y = oracle(h_prime)
x = h(h(h(h(h(h(y))))))

assert(y != x)
assert(h(x) == h(y))

The decision oracle flavour

Worth noting, we can solve this problem with a weaker form of the oracle. If we have a fixed-point (or m-cycle) oracle that can be asked if the function h has this property and replies with yes or no, then this can be turned into the original oracle. By construction, we can keep half of the function h to remain the same, while the other part is crafted so that it has no cycles. Then, by using binary search, it is possible to determine the fixed point (or m-cycle).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s