How to backdoor a cipher OR how to solve not-backdoored@MidnightSunCTF

On the 18th and 19th of September, MidnightSunCTF was held. Among the challenges was a crypto challenge based on a Feistel construction.

The following code was given:

import os
import copy

from secret import flag

class NB_128:

    ROUNDS = 44
    LENGTH = 15
    KEYLEN = 16
    CONST  = 3

    A = [...]  # 256-entry S-box, omitted here
    B = [...]  # 256-entry S-box, omitted here

    def F(x, i):
        if i % 2:
            return NB_128.A[x]
        else:
            return NB_128.B[x]

    def rot(s):
        return s[1:] + [s[0]]

    def encrypt(self, m):
        m = copy.copy(m)
        for i in range(NB_128.ROUNDS):
            m[0] ^= NB_128.F(
                m[1] ^ self.key[i % NB_128.KEYLEN], i
            ) ^ NB_128.CONST
            m = NB_128.rot(m)
        return m
    def __init__(self, key):
        self.key = key

key = list(bytearray(os.urandom(16)))
message = list(bytearray(flag))

cipher = NB_128(key)
ciphertext = cipher.encrypt(message)
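Since each round only XORs a function of one state byte into another before rotating, the cipher is invertible by running the rounds backwards. Below is a self-contained sketch of this, with hypothetical random S-boxes standing in for the omitted tables A and B:

```python
import copy
import os
import random

# hypothetical random S-boxes in place of the omitted tables A and B
random.seed(0)
A = list(range(256)); random.shuffle(A)
B = list(range(256)); random.shuffle(B)

ROUNDS, KEYLEN, CONST = 44, 16, 3

def F(x, i):
    return A[x] if i % 2 else B[x]

def rot(s):
    return s[1:] + [s[0]]

def rot_inv(s):
    # inverse rotation: move the last element back to the front
    return [s[-1]] + s[:-1]

def encrypt(key, m):
    m = copy.copy(m)
    for i in range(ROUNDS):
        m[0] ^= F(m[1] ^ key[i % KEYLEN], i) ^ CONST
        m = rot(m)
    return m

def decrypt(key, c):
    # undo the rounds in reverse order: rotate back, then XOR off the same value
    c = copy.copy(c)
    for i in reversed(range(ROUNDS)):
        c = rot_inv(c)
        c[0] ^= F(c[1] ^ key[i % KEYLEN], i) ^ CONST
    return c

key = list(os.urandom(16))
msg = list(os.urandom(15))
assert decrypt(key, encrypt(key, msg)) == msg
```

The round-trip assertion holds for any choice of S-boxes, since each round XORs a value that depends only on bytes left untouched by that round.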

It is quite easy to notice that certain ciphertext bytes depend only on five of the key bytes (along with message bytes), and can therefore be brute forced in at most 2^{5 \times 8} guesses. To my knowledge, this was the methodology pursued by all participants, and it could obviously pass as a very badly concealed backdoor. However, this is not the actual backdoor.

The actual backdoor is that the S-box was generated by a polynomial over \mathbf F_{2^8}, but with randomly chosen entries swapped. This thwarts a straightforward interpolation attack, because the S-box is no longer a perfect polynomial. So, how does one detect such a relation? One possible answer is Reed-Solomon decoding, which essentially performs a best-fit interpolation.

If A[x] = y, then it means that P_A(x) = y. So, if we include all 256 points, we get a matrix relation of the form

\mathbf y = \beta G_A = \begin{bmatrix}  \beta_0 & \beta_1 & \cdots & \beta_{k-1} \end{bmatrix} \begin{bmatrix} 1 & \dots & 1 & \dots & 1 \\ \alpha_1 & \dots & \alpha_k & \dots & \alpha_n \\ \alpha_1^2 & \dots & \alpha_k^2 & \dots & \alpha_n^2 \\ \vdots & & \vdots & & \vdots \\ \alpha_1^{k-1} & \dots & \alpha_k^{k-1} & \dots & \alpha_n^{k-1} \end{bmatrix}

This forms a [2^8, k]-linear (RS) code, in which polynomials of degree less than k can be recovered.

R = GF(2^8)
pointsA = [R.fetch_int(A[i]) for i in range(256)]
pointsB = [R.fetch_int(B[i]) for i in range(256)]

# we can find the best approximation under some polynomial
# using the RS-decoder.
C = codes.GeneralizedReedSolomonCode([R.fetch_int(x) for x in range(256)], 128)
D = codes.decoders.GRSBerlekampWelchDecoder(C)
print("S-box A approximation:", D.decode_to_message(vector(pointsA)))
print("S-box B approximation:", D.decode_to_message(vector(pointsB)))

This gives two polynomials that are unique for each S-box. Using these relations, we can compute a polynomial representation of the cipher, albeit with some chance of error depending on the number of rounds. Again, these relations between ciphertext, message and key bytes form a linear code. Note that we treat different powers of key bytes k_i^j and k_i^l where j \neq l as separate.

R2.<k1, k2, k3, k4, k5, k6, k7, k8, k9, k10, k11, k12, k13, k14, k15, k16> = PolynomialRing(R)
v_keys = [k1, k2, k3, k4, k5, k6, k7, k8, k9, k10, k11, k12, k13, k14, k15, k16]

CC, VV = [], []
block = 0 # use different blocks to reveal different key bytes; start with the ones involving the fewest relations
for m, c in zip(messages, ciphertexts):
    vc = copy(m)
    for i in range(ROUNDS):
        # evaluate the round symbolically, using the polynomial approximations of the S-boxes
        vc[0] += F(vc[1] + v_keys[i % KEYLEN], i) + CONST
        vc = rot(vc)
    CC += [c[block]]
    VV += [vc[block]]

Q = []
b = []
for vv, cc in zip(VV, CC):
    Q.append(vv)
    b.append(cc)
Q = Sequence(Q)

from sage.coding.information_set_decoder import LeeBrickellISDAlgorithm

C, vb = Q.coefficient_matrix()
code = codes.LinearCode(C.transpose())
ISD = LeeBrickellISDAlgorithm(code, (0, 18))
sol = C.solve_right(ISD.decode(vector(b)))

This will reveal the key bytes.

Partitioning attacks on Grain128-AEAD

Grain128-AEAD is a stream cipher which has reached round 2 in the lightweight cryptography competition organized by NIST. The authenticated version was designed by Hell, Johansson, Meier, Sönnerup and Yoshida. The cipher consists of two main blocks: an LFSR and an NFSR. The authentication works by extracting bits from the keystream which are used exclusively for the authentication tag, via an accumulator. A nice consequence of this is that the security of the tag can, in some regard, be reduced to the security of the cipher.

Grain building blocks.

The authentication tag is computed as

\begin{bmatrix}t_0\\t_1\\ \vdots \\ t_{w-1}\end{bmatrix} = \begin{bmatrix}k_{-1} & k_{0} & k_{1} & \cdots & k_{L-2}\\  k_{-2} & k_{-1} & k_{0} & \cdots & k_{L-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ k_{-w} & k_{-w+1} & k_{-w+2} & \cdots & k_{L-w-1}\end{bmatrix} \begin{bmatrix}m_0\\m_1\\ \vdots \\ m_{w-1}\end{bmatrix}

So, given the matrix K, we can solve the equations to find a message which has any tag \mathbf t we want. If we have a set of keys, say \{a,b\}, we can compute the corresponding key matrices K_a, K_b. Note that since we know \{a,b\}, we can make the dimensions of these matrices arbitrarily large.

Construct K' = \begin{bmatrix} K_a \\ K_b\end{bmatrix} and solve the equation system \begin{bmatrix}\mathbf t \\ \mathbf t\end{bmatrix} = K' \mathbf m for any arbitrary \mathbf t \in \mathbb F_2^{64}. Now we have solved the problem of the same message evaluating to the same tag under both keys. But this is not exactly what we want: instead, we want equal ciphertexts to evaluate to the same tag. If we denote the keystream used for encryption by \mathbf z and the ciphertext \mathbf c = \mathbf z + \mathbf m, then we can rewrite the above as

\mathbf t = K' (\mathbf c + \mathbf z) \iff \mathbf t + K' \mathbf z = K' \mathbf c

and we solve for \mathbf c. We now get two equal ciphertexts (with the same IV) that decrypt to different messages, both evaluating to the same arbitrary tag. Of course, we can extend this to an arbitrary number of keys. Compared to GCM and Poly1305, which both have a nicely structured matrix allowing for n^2 complexity, the inclusion of non-linear bits in Grain makes the solving step a bit more expensive, ending up around n^3.
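To make the linear-algebra step concrete, here is a minimal sketch with hypothetical toy dimensions (a tag width of w = 8 instead of Grain's 64) and random tag matrices, solving the stacked system by plain Gaussian elimination over GF(2):

```python
import random

def solve_gf2(A, b):
    """Solve A x = b over GF(2) by Gaussian elimination (A: list of 0/1 rows)."""
    rows, cols = len(A), len(A[0])
    M = [A[i][:] + [b[i]] for i in range(rows)]
    pivots, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r and M[i][c]:
                M[i] = [u ^ v for u, v in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    if any(row[-1] for row in M[r:]):
        return None  # inconsistent system
    x = [0] * cols
    for i, c in enumerate(pivots):
        x[c] = M[i][-1]
    return x

random.seed(1)
w = 8        # toy tag width (Grain128-AEAD uses w = 64)
n = 4 * w    # ciphertext length; > 2w, so the stacked system is solvable w.h.p.
Ka = [[random.randint(0, 1) for _ in range(n)] for _ in range(w)]
Kb = [[random.randint(0, 1) for _ in range(n)] for _ in range(w)]
t = [i % 2 for i in range(w)]  # the arbitrary target tag

# stacked system [K_a; K_b] c = [t; t]: one ciphertext, same tag under both keys
c = solve_gf2(Ka + Kb, t + t)
matvec = lambda K, v: [sum(x & y for x, y in zip(row, v)) % 2 for row in K]
assert c is not None and matvec(Ka, c) == t and matvec(Kb, c) == t
```

The keystream correction terms from the equation above would simply be added to the right-hand side before solving.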

In the context of partition oracles, we typically assume that the keys were generated using a KDF from a large set of passwords which we like to probe for using limited queries.

For instance, the block

\begin{array}{rcl}c &=& \texttt{0000110100011001110111110101101000110001} \\ && \texttt{1111101000111010101010101001110101000010}\\ && \texttt{0100010010000110111101111010110000010111}\\ && \texttt{0011001010000000000000000000000000000000}\\ \\t & = & \texttt{0000000000000000000000000000000000000000000000000000000000000000}\end{array}

decrypts successfully (i.e. tag is valid) under both keys

\begin{array}{rcl}a & = & \texttt{0000000000000000000000000000000000000000000000000000000000000000} \\ && \texttt{0000000000000000000000000000000000000000000000000000000000000000}\\\\b & = & \texttt{1000000000000000000000000000000000000000000000000000000000000000} \\ && \texttt{0000000000000000000000000000000000000000000000000000000000000000}\end{array}

So, using the above ciphertext and tag, we can simultaneously test for two passwords such that \textsf{KDF}(\texttt{hunter2}) = a and \textsf{KDF}(\texttt{123456}) = b using an oracle which validates the tag. The response can be error based, timing based or some other measurable information.

On cycles in hash functions

Assume there is a black box that, given a cryptographic hash function h, finds some x and r in polynomial time such that applying h to x, r times, gives back x:

\underbrace{h(h(\ldots(h(x))))}_{r\ \textnormal{times}} = x

The question is whether such an oracle, denoted \mathcal{O} in the text, can be used to show that the function is not collision resistant or preimage resistant.

We claim the following, without proof (for now):

Theorem 1: Given a random hash function h(x) and an oracle \mathcal{O} (according to the above), then a collision h(x) = h(y), x \neq y can be found in \textnormal{lcm}(1,2,\ldots,n) time if h(x) has a cycle of length \leq n.

We will spend the following sections to prove this. Let us first consider the simplest construction for which the above claim holds. We also state the following corollary.

Corollary 1: If the function h in Theorem 1 does not have a cycle, then there exists a polynomial-time probabilistic algorithm which randomizes the cycle structure and solves the problem with probability 1.

By Corollary 1, an adversary cannot craft a special function h that averts the result of Theorem 1.

Fixed points

We start off by looking at how the oracle can be used to derive fixed points, i.e., elements x such that h(x) = x. If the oracle \mathcal{O} were to return the shortest cycle, it would immediately tell us whether there is a fixpoint h(x) = x or not.

Let us assume the oracle instead picks a cycle at random (which is a reasonable assumption, as this is how cycle-finding algorithms work). We create an auxiliary function h' as follows

h'(x) = \begin{cases} x & \quad \text{if } h(x) = x,\\ x+1 & \quad \text{if } h(x) \neq x. \end{cases}

Or (equivalently) in Python:

def h_prime(x):
    if h(x) == x: # is a fixpoint
        return x
    else:
        return x + 1

Once our modified function h' is fed to the oracle, it will return a fixpoint for both h and h', if one exists in h. More specifically, upon calling the oracle with h', it will return r = 1 and h'(x) = x for some x.

Obviously, no other point can be a fixpoint or lie on a cycle (unless the value of x wraps around and no fixpoints exist); all other points merely lead up to a fixpoint. So, the oracle can only return fixpoints, and will do so if one exists.

Assuming that the function h behaves randomly, the expected number of fixpoints is 1. As a result, we can, with high probability, find at least one fixpoint. We will later see that the case where no fixpoint exists can be handled by randomizing the cycle structure, by composition with a random permutation.

A preimage is also easy to find. Assume that we want to find x such that h(x) = A. We then construct

g_A(x) = \begin{cases} x & \quad \text{if } h(x) = A,\\ x+1 & \quad \text{otherwise}. \end{cases}

Any value mapping to A will be a 1-cycle (or fixpoint) in g_A. No other cycles will exist.

Random oracles

Clearly, the above idea can be used to find fixpoints in any function h acting as a random oracle (or any cycle of length r that we see fit). We will now see how to extend this idea further.

If a point y leads to a fixpoint x, then necessarily we have a collision h(y) = h(x) = x. We call y a second-order fixpoint (c.f. x, which is a first-order fixpoint).

To include the second-order points in the result from the oracle, we simply remove the first-order fixpoints and turn the second-order fixpoints into first-order fixpoints.

def h_prime(x):
    if h(x) == x: # is a first-order fixpoint
        return x + 1
    elif h(x) == h(h(x)):
        # leads to a fixpoint, so x != h(x)
        return x
    else:
        return x + 1

Now, we can use this to deterministically find a collision. Consider an algorithm which calls the oracle with the modified function h'. Assume that \mathcal{O}(h') returns y. If y is of second order, then we have a collision.

Obviously, a first-order fixpoint is a 1-cycle. But what about m-cycles? Consider the following (efficiently computable) function

\hat h_m(x) = \begin{cases} x+1 & \quad \text{if } h^m(x) = x,\\ x & \quad \text{else if } h^{m+1}(x) = h(x),\\ x+1 & \quad \text{otherwise}. \end{cases}

How does this work? Every element lying on a cycle whose length divides m is pushed off that cycle, while every element that maps into such a cycle (without being on it) is turned into a fixpoint (a 1-cycle).

By setting m = \textnormal{lcm}(1,2,\ldots,n) and calling the oracle with the function \hat h_m(x), we will ultimately find a collision if h admits a cycle of length up to n. If the cost of computing \hat h_m(x) and h^m(x) can be neglected, then the complexity is constant.


Evidently, the collision is between the elements h^m(y) and y, where y is the fixpoint returned by the oracle. This concludes the proof.

If no cycle of length \leq m exists (let us assume the adversary created h with special structure to thwart this attack), then it is possible to re-randomize h. By composition with a random permutation \pi, we create h' = h \circ \pi. This alters the cycle structure (and can be done multiple times). After finding a collision x, y in h', we obtain one in h as \pi(x), \pi(y).

Algorithm 1:

Input: function h
Output: Colliding pair x,y

  1. Pick a random (efficiently computable) permutation \pi. Let \hat{h} = h \circ \pi.
  2. Construct h'(x) = \begin{cases} x+1 & \quad \text{if } \hat{h}(x) = x,\\ x & \quad \text{else if } \hat{h}(\hat{h}(x)) = \hat{h}(x),\\ x+1 & \quad \text{otherwise}. \end{cases}
  3. Ask the oracle \mathcal{O} to find a cycle in h'; call the returned point y. If the returned cycle length is not 1, go to step 1.
  4. Return the collision \pi(\hat{h}(y)), \pi(y).

As a final note: if the output should have a finite image of, say, k bits, then x+1 must be taken modulo 2^{k}. This is no problem, as the wrap-around will not constitute a cycle as long as at least one 1-cycle exists.

A simple example

What would a result like this be without a real-world example? Here is a toy example, which assumes n = 3. Hence m = \textnormal{lcm}(1,2,3) = 6.

from random import randint
from hashlib import sha256

def h(x):
    return int(
        sha256(str(x).encode()).hexdigest()[2:6], 16
    )

def h_prime(x):
    # check whether x is on a cycle: x = h^6(x)
    if x == h(h(h(h(h(h(x)))))):
        return x + 1
    elif h(x) == h(h(h(h(h(h(h(x))))))):
        return x
    else:
        return x + 1

def oracle(h):
    x0 = randint(0, 256**2)
    tortoise = x0
    hare = h(x0)

    # Floyd's cycle-finding algorithm
    while True:
        tortoise = h(tortoise)
        y = h(hare)
        hare = h(y)
        if tortoise == y or tortoise == hare:
            break

    return hare

y = oracle(h_prime)
x = h(h(h(h(h(h(y))))))

assert(y != x)
assert(h(x) == h(y))

The decision oracle flavour

It is worth noting that we can solve this problem with a weaker form of the oracle. Suppose we only have a decision oracle, which, when asked whether h has a fixed point (or m-cycle), replies yes or no. This can be turned into the original oracle: keep half of the domain of h unchanged, while the other half is modified so that it contains no cycles. Then, by binary search, it is possible to locate the fixed point (or m-cycle).
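A minimal sketch of this binary search, with a brute-force stand-in for the decision oracle and a hypothetical small domain (all names here are illustrative, and we assume a fixpoint exists):

```python
def find_fixpoint(h, lo, hi, decide):
    # narrow down the location of a fixpoint using only yes/no answers
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # restrict h to [lo, mid): outside the window, all cycles are destroyed
        def restricted(x, a=lo, b=mid):
            return x if (a <= x < b and h(x) == x) else x + 1
        if decide(restricted):
            hi = mid      # a fixpoint exists in the lower half
        else:
            lo = mid      # otherwise it must be in the upper half
    return lo

# brute-force stand-in for the decision oracle over a small domain
N = 64
decide = lambda f: any(f(x) == x for x in range(N))
h = lambda x: x if x == 37 else (x + 1) % N   # a single fixpoint at 37
assert find_fixpoint(h, 0, N, decide) == 37
```

Each halving costs one oracle query, so the fixed point is located in logarithmically many yes/no answers.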

Attacking the polynomials in the E0 cipher


The short-range wireless technology Bluetooth uses the encryption standard E0. The keystream generator E0 is built using a combination generator with four LFSRs of lengths L_1 = 25, L_2 = 31, L_3 = 33, L_4 = 39 and primitive feedback polynomials

\left\{ \begin{array}{rcl} p_1(x) &=& x^{25} +x^{20} +x^{12} + x^8 +1,\\\\ p_2(x) &=& x^{31} +x^{24} +x^{16} +x^{12} +1,\\\\ p_3(x) &=& x^{33} +x^{28} +x^{24} + x^4 +1,\\\\ p_4(x) &=& x^{39} +x^{36} +x^{28} + x^4 +1, \end{array} \right.


This is a short result on the E0 Core (sometimes called one-level E0), explicitly stating the low-weight polynomials needed in the key-recovery attack proposed in [LV04].

To mount a more efficient attack, the authors of [LV04] developed a maximum-likelihood decoding algorithm based on the fast Walsh transform. The algorithm recovers, given an element in \mathbb{F}_q^n and a linear code over \mathbb{F}_q of dimension L and length n, the closest codeword in time \mathcal{O}(n+L \cdot 2^L) and memory \min(n,2^L).

In order to succeed with the attack, one needs to find weight-4 polynomial multiples of p_1(x)p_3(x)p_4(x), which has degree 97.

Parallel implementation

We implement an algorithm based on [LJ14], with multiple threads in mind.

Algorithm 1

Input: Polynomial P(x) of degree d_P, small integer \alpha.
Output: Polynomial multiple K(x) of weight 4.

1. Select a random subset \mathcal{Q} \subset \{1,2,\ldots,d_P\} (the mask defining the projection \phi_\mathcal{Q} onto the coordinates in \mathcal{Q}).
2. From P(x), create all residues x^i \bmod P(x) for 0 \leq i \leq 2^{d_P/3 + \alpha} and put (x^i \bmod P(x), i) in a list \mathcal{L}_1.
3. Create all sums x^i + x^j with \phi_\mathcal{Q}(x^i + x^j) = 0 for 0 \leq i < j < 2^{d_P/3 + \alpha}. This is done by merging the list \mathcal{L}_1 with itself, keeping only elements satisfying \phi_\mathcal{Q}(x^i + x^j) = 0.
4. Sort the resulting list \mathcal{L}_2 by residue value.
5. Merge the list \mathcal{L}_2 with itself, keeping only multiples satisfying x^i + x^j + x^k + x^l \bmod P(x) = 0. Output all of them, or the one with smallest degree.
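A single-threaded toy version of the collision search (without the mask \phi_\mathcal{Q} and the bucket structure) can be sketched as follows, here for a hypothetical small polynomial such as x^8 + x^4 + x^3 + x + 1 in bitmask representation:

```python
def polymod_step(r, P, deg):
    """Multiply the residue r by x, modulo P(x) (bitmask representation)."""
    r <<= 1
    if (r >> deg) & 1:
        r ^= P   # the top bit of P clears bit `deg`
    return r

def weight4_multiple(P, deg, N):
    """Find i, j, k, l with x^i + x^j + x^k + x^l ≡ 0 (mod P(x))."""
    # residues x^0, x^1, ..., x^{N-1} mod P(x)
    res, r = [], 1
    for _ in range(N):
        res.append(r)
        r = polymod_step(r, P, deg)
    # birthday search: bucket pair sums x^i + x^j by residue value
    seen = {}
    for i in range(N):
        for j in range(i + 1, N):
            s = res[i] ^ res[j]
            if s in seen:
                k, l = seen[s]
                if len({i, j, k, l}) == 4:
                    return sorted([i, j, k, l])
            else:
                seen[s] = (i, j)
    return None

# hypothetical small polynomial P(x) = x^8 + x^4 + x^3 + x + 1
quad = weight4_multiple(0x11b, 8, 60)
assert quad is not None
```

With 2^8 possible residues and well over 2^8 pairs, the pigeonhole principle guarantees a match; the multi-threaded version below only changes how these pairs are bucketed.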

Let the number of worker threads be m, where m is odd; we choose m to be a prime number close to the maximum number of threads. We then create m^2 buckets, each containing a hashmap or some other \mathcal{O}(1) data structure. We order the buckets as a two-dimensional structure T_1[i,j] with 0 \leq i,j < m.

In the implementation, every thread generates the full set of monomials x^0, x^1, \ldots, x^{2^{d_P/3 + \alpha}}, each reduced modulo P(x). This computation is very cheap compared to the memory reads and writes.

Define the function

f : \mathbb{F}_2[x]/P(x) \longrightarrow \mathbb{Z}_m

as f(q(x)) = q(2) \bmod m, i.e., the residue interpreted as an integer and reduced modulo m. Clearly, if \phi_\mathcal{Q}(x^i) = \phi_\mathcal{Q}(x^j) as residues modulo P(x), then necessarily f(\phi_\mathcal{Q}(x^i)) = f(\phi_\mathcal{Q}(x^j)).

The value of f determines whether a thread should consider an element or not, i.e., whether it matches the thread number. This way, we achieve thread safety without using mutexes.


A thread j will put element x^i in T_1[f(\phi(x^i)), j], unless that slot is already occupied. If it is, say by some x^k, then the thread proceeds to T_2[f(\phi(x^i + x^k)), j] and puts x^i + x^k there.

In the final step, we swap the access order, so that each thread only handles elements x^i + x^j which have equal f-values. As equal f-values are a necessary (but not sufficient) condition for a collision, no correct candidates can be filtered out by mistake.


Each thread merges the elements in its corresponding row. Once a collision is found, the program outputs i, j, k, l such that x^i + x^j + x^k + x^l = 0 \bmod P(x) and terminates. If no collision is found, we re-iterate with a new mask selection \mathcal{Q} for \phi.


The polynomial of degree 72 can be found by setting \alpha = 1; the search runs in a few seconds using m = 23 threads. For the degree-97 polynomial, we need about 560 GB of RAM. While this may sound immensely large, it can actually be done by running it on an AWS instance at relatively low cost.

Another solution is to pick |\mathcal{Q}| = d_P/3 - u for some small integer u. This reduces the size of the first collision layer T_1. Then, for each thread, we require that f(x^i + x^j) = 0 for the element to be added into T_2. This reduces the size of the stored data to \mathcal{O}(2^{d_P/3 - u} + 2^{d_P/3 + u}/m), but increases the running time by a factor of m.

\begin{array}{l|ll} \text{Polynomial} \qquad & \qquad \text{Degree} & \qquad\text{Weight-4 multiple} \\ \\ \hline\hline \\ p_3(x)p_4(x) \qquad & \qquad 72 & \qquad x^{24352899}+x^{10387660}+x^{7833802}+1\\ \\ & & \qquad x^{31807163}+x^{24477832}+x^{4308951}+1\\ \\ p_1(x)p_3(x)p_4(x)\qquad & \qquad 97 & \qquad x^{12647431389}+x^{8008354628}+x^{1277498415}+1\\ \\ & & \qquad x^{12671067191}+x^{6590666154}+x^{1306461382}+1 \end{array}

This actualizes the attack on the E0 cipher by Lu and Vaudenay.


sage: R.<x> = PolynomialRing(GF(2))
sage: Q0 = R(x^12647431389 + x^8008354628 + x^1277498415 + 1)
sage: Q1 = R(x^12671067191 + x^6590666154 + x^1306461382 + 1)
sage: P = R(x^97+x^92+x^89+x^80+x^76+x^73+x^69+x^68+x^64+x^61+x^59+x^58+x^57+x^56+x^55+x^54+x^53+x^51+x^49+x^44+
....: x^43+x^41+x^39+x^36+x^35+x^34+x^32+x^31+x^30+x^21+x^19+x^15+x^13+x^9+x^5+x^3+1)
sage: Q0 % P
0
sage: Q1 % P
0


Many thanks to klondike (KISec AB) for providing help with computational resources and improving the code.


[LV04] Yi Lu and Serge Vaudenay, Faster Correlation Attack on Bluetooth Keystream Generator E0, CRYPTO 2004, LNCS vol. 3152, pp. 407-425, Springer-Verlag, 2004.


Recently, four security researchers disclosed two vulnerabilities in TPM modules, collectively named TPM-FAIL. Effectively, the attack allows an attacker to leak cryptographic keys stored inside a TPM module. More specifically, the attack targets a timing-information leakage in the elliptic-curve computations.

Elliptic curve signatures

The private and public keys are generated as follows.

\underbrace{d}_{\textnormal{private}} \cdot \underbrace{P}_{\textnormal{generator}} = \underbrace{Q}_{\textnormal{public}},

where d \in \mathbb{Z}_p and P,Q \in E. Below is an (insecure) example of such a construction, in Sage.

p = random_prime(2**128)
R = GF(p)
E = EllipticCurve(R, [-3, 1])

# generate public/private key
d = ZZ(R.random_element())
P = E.random_element()
Q = ZZ(d) * P

The ECDSA signature is computed in the following way. First, pick a secret nonce k \in \mathbb{Z}_p. Then compute k \cdot P = (x,y) and set r = x. Then compute

s = k^{-1} \cdot ( H(m) + d\cdot r)

The signature is (r,s). Here is how to compute it.

from hashlib import md5

# message to be signed
message = "sample message"
Hm = int(md5(message.encode()).hexdigest(), 16)
k = ZZ(R.random_element())
r, _, _ = k * P
s = 1 / R(k) * (Hm + d * r)
signature = (r, s)

Timing-information leaks

The attack exploits the simple property that the computation of k \cdot P does not finish in constant time: if k is small, it completes measurably faster.

Assume that we have an oracle for this and can obtain signatures with, say, 20 of the leading bits of the secret nonce k set to 0.

# samples obtained from the oracle
ns = 50
A = []
B = []

# real values for k
ks = []

# sign with small k
for i in range(0, ns):
    k = ZZ(R.random_element()) // 2^20
    ks += [k]
    r, _, _ = k * P
    s = 1/R(k)*(Hm +d*r)
    A += [1/s * Hm]
    B += [1/s * r]

Running this code gives us enough signatures to recover k.

Lattice attack

The attack proceeds by using lattice-reduction techniques, which finds short vectors in a lattice. The researchers construct a lattice in the following way

\begin{pmatrix}p & & & & & \\ & p & & & & \\ & &\ddots & & & \\ & & & p & & \\ B_1 & B_2 & \cdots & B_t & K/p & \\ A_1 & A_2 & \cdots & A_t & & K \end{pmatrix}

where the blanks are 0. The upper rows with the p entries exist to emulate the modular reduction by p.

Here, A and B are defined as

k = \underbrace{s^{-1} \cdot H(m)}_{A} + \underbrace{s^{-1} \cdot r }_{B} \cdot d

which gives us a relation for k in terms of known and computable values. The value K is a weighting factor, which effectively forces the coefficient for A to be 1 in the lattice reduction (whereas a randomly selected d is typically in the order of p). Continuing our example, by setting K=p, we can implement the attack as follows.

K = p
X = p * identity_matrix(QQ, ns)
Z = matrix(QQ, [0] * ns + [K/p] + [0]).transpose()
Z2 = matrix(QQ, [0] * (ns+1) + [K]).transpose()

Y = block_matrix([[X],[matrix(QQ, B)], [matrix(QQ, A)]])
Y = block_matrix([[Y, Z, Z2]])
Y = Y.LLL()
print("Found", Y[-1][0])
print("Should be", ks[0])

With high probability, this yields the secret k. From this, it is trivial to obtain the private key d.

SEC-T CTF – likeason

At this year's SEC-T CTF, I made three challenges. One of the challenges, likeason [1], was given as follows:

#!/usr/bin/env python
import sys

from secret import d, n

def oracle(c, x=1):
    res = bin(pow(c, d * x, n)).count('1') & 0x1
    return int(res)

while True:
    try:
        c = [int(x) for x in sys.stdin.readline().split(" ")]
        if len(c) == 1:
            sys.stdout.write("%d\n" % oracle(c[0]))
        else:
            sys.stdout.write("%d\n" % oracle(c[0], x=c[1]))
    except Exception:
        break

# e = 0x10001
# c = 32028315366572316530941187471534975579021238700122793819215559206747120150118490538115208229905399038122261293920013020124257186389163654867911967899754432511568776857320594304655042217535057811315461534790485879395513727264354857833013662037825295017832478478693519684813603327210332854320793948700855663229

As we can see, the oracle returns the parity of the bits in the decrypted plaintext. We also know that the plaintext contained only \{" ", ".", "-"\}, followed by some random padding. A specific format is actually not needed, but it makes the challenge easier. The reader may notice that the oracle bears some resemblance to the Bleichenbacher oracle, but an easier way is to treat it as a regular (but unreliable) LSB (or MSB) oracle. We observe that for any unique query

\mathbb{P}(\text{overflow}~|~\text{parity flip}) = 1

but we also have that

\mathbb{P}(\text{overflow}~|~\text{no parity flip}) = 1/2.

We may treat these as random events, although this is not completely true.

Finding the modulus

First, we need to find the modulus n. This is quite simple. Pick a number k and send (2^k)^e, by sending the pair 2**k e to the oracle. We know that the parity is always 1 as long as 2^k < n, since the decryption is then exactly 2^k. However, when 2^k \geq n and 2^{k-1} < n, the decryption yields 2^k - n. This may cause a parity flip, but not necessarily. So, what can we do about that? Randomization!


Instead, we can send 2^k + r, where r is a random number, sufficiently small (here, r < \sqrt{2^{k}} works) not to cause an overflow when 2^k < n. Alternatively, we can use a single bit, i.e., r = 2^j for some j < k. In this manner, we can send queries and detect the parity changes w.h.p.
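The randomized overflow test and the resulting binary search for n can be sketched as follows, with hypothetical toy parameters in place of the challenge's modulus, and the oracle simulated locally:

```python
import random

random.seed(1234)

# hypothetical toy parameters (the real challenge used a much larger modulus)
p, q = 65537, 65539
n = p * q
e = 0x10001
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+

def oracle(c):
    # parity of the decrypted value, as in the challenge
    return bin(pow(c, d, n)).count('1') & 1

def overflows(t, trials=24):
    # query t + r for random small r: if t + r >= n, the reduction modulo n
    # flips the (known) parity of t + r about half the time
    for _ in range(trials):
        r = random.randrange(1, 1 << 10)
        if oracle(pow(t + r, e, n)) != bin(t + r).count('1') & 1:
            return True
    return False

def recover_n(hi_bits=34):
    # binary search for the smallest t that overflows; this lands
    # within the randomization window (here 2^10) below n
    lo, hi = 1, 1 << hi_bits
    while lo < hi:
        mid = (lo + hi) // 2
        if overflows(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

assert 0 <= n - recover_n() < 1 << 10
```

The residual uncertainty equals the randomization window, and can be shrunk by narrowing the range of r towards the end of the search.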

Finding the plaintext using known plaintext properties

An efficient way of determining the plaintext is as follows:

First, build a set of parities from sufficiently many queries of the form \{ \mathcal{O}(c \cdot (2^e)^i) : 0 \leq i \leq M \}, for some integer M. This is mostly for caching purposes, so that no redundant calls need to be made.

Define an upper bound U and a lower bound L on the plaintext m. Then, build a search tree using the ideas from attacks on LSB (or MSB) oracles. If we observe a parity flip from the previous step, we record a new lower bound L \leftarrow \frac12(U + L) in the single child node. On the other hand, if there is no flip from the previous step, we cannot be sure (the same problem as before, when determining n). Define the parity at step i to be b_i. The recursion is then as follows:

f(i, U, L) = \begin{cases} f(i+1, U, \frac12(U + L)) & \quad \text{if } b_i \neq b_{i-1},\\ f(i+1, U, \frac12(U + L)),\; f(i+1, \frac12(U + L), L) & \quad \text{if } b_i = b_{i-1}. \end{cases}

So, to be certain, we need to investigate both paths (i.e., assuming an overflow and updating the lower bound, or assuming no overflow and setting a new upper bound U \leftarrow \frac12(U + L)). We can do so, and prune paths using the fact that the bytes for which the upper bound and lower bound agree need to be in the alphabet of the message (see above). This was the intended solution, and probably the easiest to find.

Finding the plaintext using randomization

Will be added when I have time to describe it in detail.

[1] An anagram of snakeoil, in honor of Crown Sterling.


Finding magic hashes with hashcat

During the last few months, the problem of finding magic hashes for various hash functions has had a renaissance. After this tweet, the over-a-decade-old problem received new attention from a few people in the security community and, as a consequence, a flurry of new magic hashes were found.

Magical hashes have their roots in PHP. In particular, it all revolves around the improper use of the equality operator ==. PHP uses implicit typing: if a string has the format 0e[digits], it is interpreted as the scientific notation for 0 \times 10^{\texttt{[digits]}}. This has an undesired side effect when comparing hashes of password input: there is a small but non-negligible subset of hashes that all compare equal under the == operation.

The probability of a hash being magical is easy to compute if we assume that the hash function behaves completely at random. Let H : A \rightarrow B be a hash function with an l-hexdigit output. Counting the outputs starting with 0e and containing only decimal digits thereafter, we have

\mathbb{P}(H(x) \text{ is magical}) = \left(\frac{1}{10}\right)^2 \cdot \left(\frac{10}{16}\right)^l

for an x randomly selected over all strings of any length.


The inverse probability amounts to the number of trials (or hash-function oracle calls) required on average to find a magic hash. For instance, for SHA-256, which has output length l = 64, this corresponds to about 2^{50} calls.
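As a sanity check of this estimate, the magic property and the expected number of calls can be verified with a few lines of Python (the test strings are illustrative):

```python
import re

def is_magic(hexdigest):
    # "0e" followed by decimal digits only: PHP's == treats it as 0 * 10^...
    return re.fullmatch(r'0e\d+', hexdigest) is not None

assert is_magic("0e" + "9" * 54)   # the shape of a magic SHA-224 digest
assert not is_magic("1e" + "9" * 54)
assert not is_magic("0e12a45")     # a non-decimal digit breaks the property

# expected number of oracle calls for SHA-256 (l = 64): about 2^50
trials = (1 / 10) ** -2 * (10 / 16) ** -64
assert 2 ** 49 < trials < 2 ** 51
```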

Modifying hashcat kernels

We will take SHA-224 as an example, since it is a bit easier to find magic hashes for. First, we are going to define two masks:

The first one returns 0 if the hex representation contains nothing but decimal digits:

#define  IS_MAGIC(x)    ((x & 0x88888888) & (((x & 0x22222222) >> 1 | (x & 0x44444444) >> 2) & ((x & 0x88888888) >> 3)) << 3)

The second one does the same as the first, but it forces the first two hex digits to 0e.

#define  IS_MAGIC_H(x)  ((x & 0x00888888) & (((x & 0x00222222) >> 1 | (x & 0x00444444) >> 2) & ((x & 0x00888888) >> 3)) << 3) | ((x & 0xff000000) ^ 0x0e000000)
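To convince ourselves that the first mask does what it claims, we can port it verbatim to Python and test it on a few hypothetical 32-bit words (8 hex digits each):

```python
# Python port of the IS_MAGIC mask: non-zero iff some nibble is not a decimal digit
def is_nondecimal_word(x):
    return ((x & 0x88888888)
            & ((((x & 0x22222222) >> 1 | (x & 0x44444444) >> 2)
                & ((x & 0x88888888) >> 3)) << 3))

assert is_nondecimal_word(0x12345678) == 0  # all nibbles are decimal digits
assert is_nondecimal_word(0x1234567a) != 0  # 'a' is not a decimal digit
assert is_nondecimal_word(0x99999999) == 0  # '9' is still decimal
assert is_nondecimal_word(0xf0000000) != 0
```

Per nibble, the mask computes b_3 \wedge (b_1 \vee b_2), which is set exactly for the values a-f.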

We add the two defines above to inc_hash_sha224.h. Then, we locate the function sha224_final_vector:

DECLSPEC void sha224_final_vector (sha224_ctx_vector_t *ctx)
{
  MAYBE_VOLATILE const int pos = ctx->len & 63;

  append_0x80_4x4 (ctx->w0, ctx->w1, ctx->w2, ctx->w3, pos ^ 3);

  if (pos >= 56)
  {
    sha224_transform_vector (ctx->w0, ctx->w1, ctx->w2, ctx->w3, ctx->h);

    ctx->w0[0] = 0;
    ctx->w0[1] = 0;
    ctx->w0[2] = 0;
    ctx->w0[3] = 0;
    ctx->w1[0] = 0;
    ctx->w1[1] = 0;
    ctx->w1[2] = 0;
    ctx->w1[3] = 0;
    ctx->w2[0] = 0;
    ctx->w2[1] = 0;
    ctx->w2[2] = 0;
    ctx->w2[3] = 0;
    ctx->w3[0] = 0;
    ctx->w3[1] = 0;
    ctx->w3[2] = 0;
    ctx->w3[3] = 0;
  }

  ctx->w3[2] = 0;
  ctx->w3[3] = ctx->len * 8;

  sha224_transform_vector (ctx->w0, ctx->w1, ctx->w2, ctx->w3, ctx->h);

  // Add these lines
  ctx->h[0] = IS_MAGIC_H(ctx->h[0]); // first block starts with 0e
  ctx->h[1] = IS_MAGIC(ctx->h[1]);
  ctx->h[2] = IS_MAGIC(ctx->h[2]);
  ctx->h[3] = IS_MAGIC(ctx->h[3]);
  ctx->h[4] = IS_MAGIC(ctx->h[4]);
  ctx->h[5] = IS_MAGIC(ctx->h[5]);
  ctx->h[6] = IS_MAGIC(ctx->h[6]);
}

Finally, we need to modify the corresponding kernel module. More specifically, we modify the following function:

KERNEL_FQ void m01300_sxx (KERN_ATTR_VECTOR ())
{
  /**
   * modifier
   */

  const u64 lid = get_local_id (0);
  const u64 gid = get_global_id (0);

  if (gid >= gid_max) return;

  /**
   * digest
   */

  const u32 search[4] = { ... };

  /**
   * base
   */

  const u32 pw_len = pws[gid].pw_len;

  u32x w[64] = { 0 };

  for (u32 i = 0, idx = 0; i < pw_len; i += 4, idx += 1)
  {
    w[idx] = pws[gid].i[idx];
  }

  /**
   * loop
   */

  u32x w0l = w[0];

  for (u32 il_pos = 0; il_pos < il_cnt; il_pos += VECT_SIZE)
  {
    const u32x w0r = words_buf_r[il_pos / VECT_SIZE];

    const u32x w0 = w0l | w0r;

    w[0] = w0;

    sha224_ctx_vector_t ctx;

    sha224_init_vector (&ctx);

    sha224_update_vector (&ctx, w, pw_len);

    sha224_final_vector (&ctx);

    const u32x r0 = ctx.h[0];
    const u32x r1 = ctx.h[1];
    const u32x r2 = ctx.h[2];
    const u32x r3 = ctx.h[3];
    const u32x r4 = ctx.h[4];
    const u32x r5 = ctx.h[5];
    const u32x r6 = ctx.h[6];

    // this is one hacky solution
    if ((r4 == 0) && (r5 == 0) && (r6 == 0))
    {
      COMPARE_S_SIMD (r0, r1, r2, r3);
    }
  }
}
This assumes no vectorization. Now build hashcat. Since all magic hashes will be mapped to the all-zero sequence, we run it as follows:

$ ./hashcat.bin -m 1300 00000000000000000000000000000000000000000000000000000000 -a 3 --self-test-disable --keep-guessing

--keep-guessing causes hashcat to continue looking for hashes even after one has been found. The --self-test-disable flag is needed since we have modified how SHA-224 functions, and the test cases defined in hashcat would consequently fail.

Running it on AWS



Relatively quickly, we found several magic hashes using this modification for SHA-224:


For SHA-256:


For a more comprehensive list of hashes, refer to this Github repo.

Thanks to @0xb0bb for providing crucial help with computational resources.

On the hardness of finding minimal-weight codewords in a QC-MDPC code

Problem 1  (Knapsack)

We want to maximize the value of a subset of items such that the total weight of the subset is no larger than W:

\max \sum_{i \in \mathcal{S}} v_i x_i
\text{subject to } \sum_{i \in \mathcal{S}} w_i x_i \leq W, \quad x_i \in \{0, 1\}
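For reference, Problem 1 is solved exactly by the textbook dynamic program; a minimal sketch (names ours):

```python
def knapsack(weights, values, W):
    """0/1 knapsack: max value with total weight <= W (textbook DP)."""
    best = [0] * (W + 1)                 # best[w] = max value at capacity w
    for wi, vi in zip(weights, values):
        for w in range(W, wi - 1, -1):   # descending: each item used once
            best[w] = max(best[w], best[w - wi] + vi)
    return best[W]
```

For example, knapsack([3, 4, 5], [4, 5, 6], 7) returns 9 (the items of weight 3 and 4).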

Let us first study another problem. As we will see, it is at least as hard as Problem 1.

Problem 2  (Auxiliary problem)

\max x_1x_2x_3 + x_2x_5 + x_4x_6x_7 + \cdots
\text{subject to } \left\| \begin{pmatrix} x_1 & x_2 & \cdots & x_k \end{pmatrix} \right\|_1 = m, \quad x_i \in \{0, 1\}

We can solve Problem 1 using an oracle for Problem 2. For a knapsack instance with an item of weight w_i and value v_i, we create v_i copies of the term x_1x_2\cdots x_{w_i}, where the variables are unique for every item. This is done for every item in the knapsack. Hence, an oracle for Problem 2 solves Problem 1, so Problem 2 is NP-hard.
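The reduction can be checked on a toy instance. The sketch below (all names ours) builds the Problem 2 instance and verifies by brute force that the optima agree, taking the best over m = 0, \ldots, W to account for the equality constraint:

```python
from itertools import combinations

def solve_knapsack(weights, values, W):
    """Brute-force Problem 1 on a tiny instance."""
    n, best = len(weights), 0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            if sum(weights[i] for i in S) <= W:
                best = max(best, sum(values[i] for i in S))
    return best

def reduce_to_problem2(weights, values):
    """Item i (weight w_i, value v_i) -> v_i copies of a term over
    w_i fresh variables; a term counts iff all its variables are 1."""
    terms, var = [], 0
    for w, v in zip(weights, values):
        terms.extend([tuple(range(var, var + w))] * v)
        var += w
    return terms, var

def solve_problem2(terms, nvars, m):
    """Brute-force Problem 2 with exactly m variables set to 1."""
    best = 0
    for S in combinations(range(nvars), m):
        s = set(S)
        best = max(best, sum(all(x in s for x in t) for t in terms))
    return best

weights, values, W = [2, 3, 4], [3, 4, 6], 5
terms, nvars = reduce_to_problem2(weights, values)
# Problem 1's "<= W" becomes the best over m = 0..W ones.
assert solve_knapsack(weights, values, W) == max(
    solve_problem2(terms, nvars, m) for m in range(W + 1))
```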

Now, we want to build some theory towards polynomials. Consider the following construction. Assign to each x_i a corresponding linear factor (x - a_i). Now assume that we want to represent the term x_1 x_2 x_3. Define p(x) = x^p + 1 \in \mathbb{F}_q[x], where p is a prime number, to be the product \prod_{i=1}^{p}(x-a_i).

We map the term x_1 x_2 x_3 to the polynomial

Q_1(x) = p(x) \cdot \left[(x-a_1)(x-a_2)(x-a_3)\right]^{-1}

This means that Q_1(x) \cdot (x-a_1)(x-a_2)(x-a_3) = 0 \bmod p(x). To consider all terms jointly, pick a large number d. Then, we create the polynomials Q_2(x) = p(x) \cdot \left[(x-a_2)(x-a_5)\right]^{-1}, Q_3(x) = p(x) \cdot \left[(x-a_4)(x-a_6)(x-a_7)\right]^{-1} and so on. We assume there are n terms.
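A tiny numerical check of the construction, with the toy parameters p = 5 and q = 11 (our choice): over \mathbb{F}_{11}, x^5 + 1 splits with roots 2, 6, 7, 8, 10, and the inverse above is just exact division, so Q_1(x) times its three factors is literally p(x):

```python
q = 11                      # work over F_11
roots = [2, 6, 7, 8, 10]    # x^5 + 1 = prod (x - a_i) over F_11

def polymul(f, g):
    """Multiply polynomials mod q (coefficient lists, lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = (out[i + j] + fi * gj) % q
    return out

def product(factors):
    """Product of the linear factors (x - a) for a in factors."""
    f = [1]
    for a in factors:
        f = polymul(f, [(-a) % q, 1])
    return f

# p(x) = x^5 + 1
assert product(roots) == [1, 0, 0, 0, 0, 1]

# Term x_1 x_2 x_3 with a_1, a_2, a_3 = 2, 6, 7:
Q1 = product([8, 10])       # p(x) / ((x-2)(x-6)(x-7))
# Q1(x) * (x-2)(x-6)(x-7) = p(x), i.e. 0 mod p(x)
assert polymul(Q1, product([2, 6, 7])) == [1, 0, 0, 0, 0, 1]
```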

Then, the polynomial Q(x) is constructed as Q(x) = \sum_{i=1}^n x^{di} \cdot Q_i(x). Since d is large, the terms of Q_i(x) and Q_{i+1}(x) will not interfere.

We now turn this problem into an instance of finding a low-weight codeword in a QC-MDPC code.

Problem 3  (Low-weight codeword of QC-MDPC code)

Let G be the generator matrix of a QC-MDPC code. Find a non-zero codeword with minimal weight, i.e., minimize \| \mathbf u G \|_1.

There is a bijective mapping between a quasi-cyclic code and a polynomial, i.e., G(x) generates a code as well. So, Q(x) generates a quasi-cyclic code. Let us first assume we have access to an oracle for Problem 2.

Indeed, an optimal solution to Problem 2 gives rise to a low-weight codeword in the corresponding code. Assume that the optimum is x_1x_2x_3 + x_2x_5 + x_4x_6x_7 + \cdots = b, and let u(x) = \prod_{i : x_i = 1}(x - a_i). This means that u(x) is of degree m and that \|u(x) Q(x)\|_1 \leq p(n-b), i.e., u(x) Q(x) has at most p (n-b) non-zero coefficients.

Using an oracle for Problem 3, we find a minimal-weight codeword for the generator Q(x).
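The polynomial-to-matrix correspondence is easy to verify directly: multiplying u(x) by G(x) modulo x^p + 1 over \mathbb{F}_2 is the same as multiplying the coefficient vector of u by the circulant matrix built from the coefficients of G(x). A quick check (helper names ours):

```python
def circulant(g):
    """p x p circulant matrix over F_2 whose first row is g."""
    p = len(g)
    return [[g[(j - i) % p] for j in range(p)] for i in range(p)]

def polymulmod(u, g):
    """u(x) * g(x) mod (x^p + 1) over F_2; length-p coefficient lists."""
    p = len(g)
    out = [0] * p
    for i, ui in enumerate(u):
        if ui:
            for j, gj in enumerate(g):
                out[(i + j) % p] ^= gj
    return out

def vecmat(u, G):
    """Row vector times matrix over F_2."""
    return [sum(ui & row[j] for ui, row in zip(u, G)) % 2
            for j in range(len(G))]

g = [1, 1, 0, 1, 0, 0, 0]      # G(x) = 1 + x + x^3, p = 7
u = [0, 1, 0, 0, 1, 0, 1]      # u(x) = x + x^4 + x^6
assert vecmat(u, circulant(g)) == polymulmod(u, g)
```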

Experiments with index calculus

In index calculus, the main problem is to represent powers of g in a predefined prime-number basis. We are interested in unraveling x from h = g^x\bmod p.

Normal approach

First, we find some offset hg^j = g^{x+j}\bmod p such that hg^j \bmod p is B-smooth.

Using the prime-number basis (-1)^{e_1}2^{e_2}\cdots, generate a corresponding vector

\mathbf y = \begin{pmatrix}1 & 0 & 3 & 0 & \cdots & 1\end{pmatrix}

Then, we generate powers g^1, g^2, \ldots and check if they have the same property. The results are put into a matrix, after which one performs Gaussian elimination over \mathbb{Z}_{p-1}.

A | \mathbf v^T = \left( \begin{array}{cccccc|cc}0 & 3 & 2 & 2 & \cdots & 1 & \mathbf{13} \\ 1 & 0 & 2 & 1 & \cdots & 0 & \mathbf{112} \\ 0 & 5 & 2 & 0 & \cdots & 0 & \mathbf{127} \\\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & 1 & 0 & 0 & \cdots & 0 & \mathbf{67114}\end{array}\right)

Solve \mathbf x A = \mathbf y for \mathbf x. Then compute \mathbf x \cdot \mathbf v to find x+j.
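The offset-finding step can be sketched on a toy instance (all names and parameters ours); trial division both decides B-smoothness and produces the exponent vector y:

```python
def smooth_vector(m, basis):
    """Exponent vector of m over `basis` if m is smooth, else None."""
    e = []
    for q in basis:
        k = 0
        while m % q == 0:
            m //= q
            k += 1
        e.append(k)
    return e if m == 1 else None

def find_offset(h, g, p, basis):
    """Find j such that h * g^j mod p is smooth over `basis`."""
    t = h % p
    for j in range(p):
        vec = smooth_vector(t, basis)
        if vec is not None:
            return j, vec
        t = (t * g) % p
    raise ValueError("no smooth offset found")

p, g = 107, 2
h = pow(g, 71, p)                  # pretend the exponent 71 is unknown
j, y = find_offset(h, g, p, [2, 3, 5, 7])

# Sanity check: the exponent vector reconstructs h * g^j mod p.
value = 1
for q, e in zip([2, 3, 5, 7], y):
    value *= q ** e
assert value % p == (h * pow(g, j, p)) % p
```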

Different approach

Almost like in the normal approach, we find some offset hg^j = g^{x+j} \bmod p, but now we only require that the B-smooth part of hg^j \bmod p is greater than \sqrt{p}.

Again, using the prime-number basis (-1)^{e_1}2^{e_2}\cdots, generate a corresponding vector

\mathbf y | \Delta = \left(\begin{array}{cccccc|c}0 & 2 & 5 & 1 & \cdots & 0 & \Delta\end{array} \right)

The vector is chosen such that \Delta corresponds to the product of the primes not represented in the chosen basis. In other words, (-1)^0 \cdot 2^2 \cdot 3^5 \cdot 5^1 \cdots > \sqrt{p} for the vector above, or equivalently, \Delta < \sqrt p.
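The only change to the smoothness test is that we no longer require the cofactor to be 1; instead we keep it as \Delta. A sketch on a toy value (names and numbers ours):

```python
from math import isqrt

def smooth_part(m, basis):
    """Split m into an exponent vector over `basis` and a cofactor Delta."""
    e = []
    for q in basis:
        k = 0
        while m % q == 0:
            m //= q
            k += 1
        e.append(k)
    return e, m                     # remaining m is Delta

p = 1009                            # toy modulus
m = 468                             # = 2^2 * 3^2 * 13, a residue h*g^j mod p
e, delta = smooth_part(m, [2, 3, 5, 7])
assert e == [2, 2, 0, 0] and delta == 13
assert m // delta > isqrt(p)        # smooth part 36 > sqrt(1009)
assert delta <= isqrt(p)            # Delta < sqrt(p): relation is usable
```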

Again, we generate powers g^1, g^2, \ldots, check if they have the same property as above, and solve as in the normal approach.

A | \mathbf v^T | \mathbf d^T = \left( \begin{array}{cccccc|c|c} 1 & 0 & 3 & 0 & \cdots & 0 & \mathbf{5} & \delta_1 \\ 0 & 3 & 2 & 2 & \cdots & 1 & \mathbf{13}& \delta_2 \\ 0 & 1 & 2 & 1 & \cdots & 0 & \mathbf{65}& \delta_3 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots\\ 0 & 2 & 0 & 0 & \cdots & 0 & \mathbf{13121}& \delta_B \end{array}\right)

Find \mathbf x such that \mathbf x A = \mathbf y. Then compute \mathbf x \cdot \mathbf v to find x+j. There is a catch here: the result will be correct if and only if

\prod \delta_i ^{x_i} \bmod p = \Delta.

It remains to see if this actually improves over the normal approach.

Tests with sage.

Suggestion for proof of retrievability

(Originally posted here)


In this scenario, A and B want to distribute segments of their data (preferably encrypted). We do not care about coverage; we are trying to maximize the amount of remotely stored data. The game is essentially that any peer will try to minimize their own storage requirements and thus maximize other peers'.


Two peers A and B will repeatedly negotiate upon storage space according to the following protocol, where A is the prover and B the verifier. Let S(B,A) be the set of segments owned by B and stored by A. Moreover, let s \in S(B,A) be an arbitrary segment and let \text{sig}(s) denote the signature of s.

1. B picks a random string r of length n bits (equal to the length of the signature).
2. B asks A to send the closest segment from S(B,A), i.e., the s such that \| r - \text{sig}(s) \|_1 is minimized.
3. B verifies the segment s via the signature. Then, using the distance \| r - \text{sig}(s) \|_1, B can estimate the number of segments stored by A.

Repeat until the variance is low enough. The procedure is performed both ways.
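A toy simulation of one challenge round, with random n-bit strings standing in for the signatures (all names and parameters ours):

```python
import random

n = 256                   # signature length in bits
rng = random.Random(1)    # fixed seed for reproducibility

def hamming(a, b):
    """Hamming distance between two n-bit integers."""
    return bin(a ^ b).count("1")

# A stores N random "signatures" standing in for sig(s).
N = 1 << 10
stored = [rng.getrandbits(n) for _ in range(N)]

# One round: B challenges with a random r, A answers with the closest match.
r = rng.getrandbits(n)
s_best = min(stored, key=lambda s: hamming(s, r))
d = hamming(s_best, r)

# The more signatures A stores, the smaller d tends to be; with N = 2^10
# the minimum lands well below the expected distance n/2 of a single draw.
assert d < n // 2
```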

Approximating \delta

Intuitively, we can think of S(B,A) as a random (non-linear) code \mathcal{C} over \mathbb{F}_2^n and the task is to find a minimum-weight codeword in \mathcal{C} + r = \{ c + r | c \in \mathcal C\}. Then N corresponds to the number of codewords in \mathcal{C}.

Assume that A has stored N segments. Without loss of generality, we can assume that r is the all-zero string, hence transforming the problem to finding the minimum of N binomial variables. Let X_1, X_2, \dots, X_N \in \textsf{Bin}(n, \frac12) and Z := \min(X_1, X_2, \dots, X_N). Then with high probability and for larger values of N,

\frac n2 - \sqrt{\frac n3 \log N} \geq E(Z) \geq \frac n2 - \sqrt{\frac n2 \log N}.

The above can be obtained by Gaussian approximation and the conventional expectation of the minimum of Gaussian random variables. We omit the proof here. Of course, this can be pre-computed for reasonable values of N.
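The bounds can be sanity-checked by simulation; the sketch below (parameters ours) allows a unit of slack around the bounds for sampling noise:

```python
import math
import random

n, N, trials = 256, 1 << 10, 300
rng = random.Random(7)

def Z():
    """Minimum of N draws from Bin(n, 1/2)."""
    return min(bin(rng.getrandbits(n)).count("1") for _ in range(N))

mean_Z = sum(Z() for _ in range(trials)) / trials

upper = n / 2 - math.sqrt(n / 3 * math.log(N))
lower = n / 2 - math.sqrt(n / 2 * math.log(N))

# E(Z) should sit between the two bounds; allow slack for sampling noise.
assert lower - 1 <= mean_Z <= upper + 1
```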

Assume there exists a peer A with infinite computing power that tries to minimize its required storage for a given minimum distance d. Done in an optimal manner, A will pick equidistant points in the space \mathbb{F}_2^n. This corresponds to a perfect error-correcting code with minimum distance d. The covering-coding bound states that d\leq n-k. For a perfect code, which has minimal size, we have d = n-k. Therefore, the size of the code is |\mathcal C| = 2^{n-d}, which is also the number of elements A needs to store.

A glaring problem is that A cannot arbitrarily pick points (since points must be valid signatures). The best A can do is to decide upon a perfect code \mathcal{C} (or pick one that is closest to the current set of points, which is a very hard problem). Then, A will look for points v and v' that map to the same codeword c \in \mathcal{C} and discard the point that is farthest away from c. For larger n and smaller d, it is not realistic to achieve the covering-coding bound, since in reality there are not that many shared segments. So, even given infinite computation, A will not be able to sample points until a perfect code is achieved.

The covering-coding bound serves as a crude estimation on how to set the parameters.


In Figure 1, we see that the upper bound for the expected minimum distance is a good approximation for N larger than 2^7 and n = 256.


Assume that A sends a segment s along with its signature. Let

\delta(r,s) = \|r - \text{sig}(s)\|_1 - \frac n2.

B can now use different bounds of its own choosing. Assume that \delta(r,s) = 100 - \frac{256}{2}. Then, using the expression

N \approx \exp\left(-\frac 2n \cdot \delta(r,s) \cdot \text{abs}\left(\delta(r,s)\right) \right)

B determines that, with high probability, A has at least 458 segments. Using the lower bound, A has at least 9775 segments.
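Plugging in the numbers (n = 256, distance 100, so \delta = -28) reproduces both figures; reading the 9775 value as the inversion of the n/3 bound is our interpretation:

```python
import math

n, d = 256, 100
delta = d - n / 2                      # = -28

# From E(Z) = n/2 - sqrt(n/2 * log N):  N ~ exp(2 * delta^2 / n),
# which matches exp(-2/n * delta * |delta|) for negative delta.
N_est = math.exp(2 * delta ** 2 / n)
# From the other bound, E(Z) = n/2 - sqrt(n/3 * log N):
N_low = math.exp(3 * delta ** 2 / n)

assert math.ceil(N_est) == 458
assert math.ceil(N_low) == 9775
```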

So, what is all this?

The intention is to create a simplistic protocol for asserting that shared data actually is retrievable, based on a probabilistic model. This could serve as a basis for shared storage, in which segments are distributed. Even if the storer does not keep local copies of the segments it has distributed, it can use the protocol to verify that it can retrieve them without actually doing so.