Finding magic hashes with hashcat

During the last months, the problem of finding magic hashes for various hash functions has had a renaissance. After this tweet, the over-a-decade-old problem received new attention from a few people in the security community and, as a consequence, a flurry of new magical hashes were found.

Magical hashes have their roots in PHP. In particular, it all revolves around the improper use of the equality operator ==. PHP uses implicit typing, e.g., if a string has the format 0e[digits], then it is interpreted as the scientific notation for 0 \times 10^{\texttt{[digits]}}. This has some undesired side effects: when comparing hashes from password input it causes a small but non-negligible subset of elements that are all equal under the == operation.

The probability of a hash being magical is easy to compute if we can assume that the hash function behaves completely at random. Let the H : A \rightarrow B be a hash function with l-hexdigit long output. If we disregard all but the ones starting with 0e..., we have

\mathbb{P}(H(x) \text{ is magical}) = \left(\frac{1}{10}\right)^2 \cdot \left(\frac{10}{16}\right)^l

for an x randomly selected over all strings of any length.

inverse_prob_hashfunction

The inverse probability amounts to the number of trials (or hash-function oracle calls) required on average to find a magic hash. For instance, SHA-256, which has an output length corresponds to about 2^{50} calls.

Modifying hashcat kernels

We will take SHA-224 as an example, since it is a bit easier to find magic hashes for. First, we are going to define two masks:

The first one returns 0 if the hex representation contains nothing but decimal digits:

#define  IS_MAGIC(x)    ((x & 0x88888888) & (((x & 0x22222222) >> 1 | (x & 0x44444444) >> 2) & ((x & 0x88888888) >> 3)) << 3)

The second one does the same as the first, but it forces the first two hex digits to 0e.

#define  IS_MAGIC_H(x)  ((x & 0x00888888) & (((x & 0x00222222) >> 1 | (x & 0x00444444) >> 2) & ((x & 0x00888888) >> 3)) << 3) | ((x & 0xff000000) ^ 0x0e000000)

We add these lines to inc_hash_sha224.h. Then, we locate the function sha224_final_vector in inc_hash_sha224.cl.

DECLSPEC void sha224_final_vector (sha224_ctx_vector_t *ctx)
{
  MAYBE_VOLATILE const int pos = ctx->len & 63;  
  append_0x80_4x4 (ctx->w0, ctx->w1, ctx->w2, ctx->w3, pos ^ 3);  
  if (pos >= 56)
  {
     sha224_transform_vector (ctx->w0, ctx->w1, ctx->w2, ctx->w3, ctx->h);
     ctx->w0[0] = 0;
     ctx->w0[1] = 0;
     ctx->w0[2] = 0;
     ctx->w0[3] = 0;
     ctx->w1[0] = 0;
     ctx->w1[1] = 0;
     ctx->w1[2] = 0;
     ctx->w1[3] = 0;
     ctx->w2[0] = 0;
     ctx->w2[1] = 0;
     ctx->w2[2] = 0;
     ctx->w2[3] = 0;
     ctx->w3[0] = 0;
     ctx->w3[1] = 0;
     ctx->w3[2] = 0;
     ctx->w3[3] = 0;
  }  
  ctx->w3[2] = 0;
  ctx->w3[3] = ctx->len * 8;  
  sha224_transform_vector (ctx->w0, ctx->w1, ctx->w2, ctx->w3, ctx->h);

  // Add these lines
  ctx->h[0] = IS_MAGIC_H(ctx->h[0]); // first block starts with 0e
  ctx->h[1] = IS_MAGIC(ctx->h[1]);
  ctx->h[2] = IS_MAGIC(ctx->h[2]);
  ctx->h[3] = IS_MAGIC(ctx->h[3]);
  ctx->h[4] = IS_MAGIC(ctx->h[4]);
  ctx->h[5] = IS_MAGIC(ctx->h[5]);
  ctx->h[6] = IS_MAGIC(ctx->h[6]);
}

Finally, we need to modify the corresponding module m01300_a3_pure.cl. More specifically, we modify the following function:

KERNEL_FQ void m01300_sxx (KERN_ATTR_VECTOR ())
{
  /**
   * modifier
   */

  const u64 lid = get_local_id (0);
  const u64 gid = get_global_id (0);

  if (gid >= gid_max) return;

  /**
   * digest
   */

  const u32 search[4] =
  {
    digests_buf[digests_offset].digest_buf[DGST_R0],
    digests_buf[digests_offset].digest_buf[DGST_R1],
    digests_buf[digests_offset].digest_buf[DGST_R2],
    digests_buf[digests_offset].digest_buf[DGST_R3]
  };

  /**
   * base
   */

  const u32 pw_len = pws[gid].pw_len;

  u32x w[64] = { 0 };

  for (u32 i = 0, idx = 0; i < pw_len; i += 4, idx += 1)
  {
    w[idx] = pws[gid].i[idx];
  }

  /**
   * loop
   */

  u32x w0l = w[0];

  for (u32 il_pos = 0; il_pos < il_cnt; il_pos += VECT_SIZE)
  {
    const u32x w0r = words_buf_r[il_pos / VECT_SIZE];

    const u32x w0 = w0l | w0r;

    w[0] = w0;

    sha224_ctx_vector_t ctx;

    sha224_init_vector (&ctx);

    sha224_update_vector (&ctx, w, pw_len);

    sha224_final_vector (&ctx);

    const u32x r0 = ctx.h[0];
    const u32x r1 = ctx.h[1];
    const u32x r2 = ctx.h[2];
    const u32x r3 = ctx.h[3];
    const u32x r4 = ctx.h[4];
    const u32x r5 = ctx.h[5];
    const u32x r6 = ctx.h[6];
    
    // this is one hacky solution
    if ((r4 == 0) && (r5 == 0) && (r6 == 0)) {
       COMPARE_S_SIMD (r0, r1, r2, r3);
    }
  }
}

This assumes no vectorization. Now build hashcat. Since all magic hashes will be mapped to the all-zero sequence, we run it as follows:

$ ./hashcat.bin -m 1300 00000000000000000000000000000000000000000000000000000000 -a 3 --self-test-disable --keep-guessing

--keep-guessing causes hashcat to continue looking for hashes, even after one has been found. The --self-test-disable is needed, since we have modified how SHA-224 functions, and the testcases defined in hashcat will consequently fail.

Running it on AWS

TBA?

Results

Relatively quickly, we found several magic hashes using this modification for SHA-224:

noskba6h0:0e615700362721473273572994672194243561543298826708511055
Ssrodxcm0:0e490606683681835610577024835460055379837761934700306599
i04i19pjb:0e487547019477070898434358527128478302010609538219168998
Fuq0guvec:0e469685182044169758492939426426028878580665828076076227
Sjv2n8fjg:0e353928003962053988403389507631422927454987073208369549
L0gg4bvt5:0e730381844465899091531741380493299916620289188395999379
7v2k2to5o:0e676632093970918870688436761599383423043605011248525140

For SHA-256:

Sol7trnk00:0e57289584033733351592613162328254589214408593566331187698889096

For a more comprehensive list of hashes, refer to this Github repo.

Thanks to @0xb0bb for providing crucial help with computational resources.

Advertisements

On the hardness of finding minimal-weight codewords in a QC-MDPC code


Problem 1  (Knapsack)

We want to maximize the value of a subset of items such that the weight of the set is no larger than weight W.

\sum_{i \in \mathcal{S}} w_i x_i \leq W
\max \sum_{i \in \mathcal{S}} v_i x_i


Let us first study another problem. It is easy to see that this problem is at least as hard as Problem 1.


Problem 2  (Auxiliary problem)

\max x_1x_2x_3 + x_2x_5 + x_4x_6x_7 + \cdots
\|\begin{matrix} x_1 & x_2 & \cdots & x_k \end{matrix} \|_1 = m, x_i \in \{0, 1\}


We can solve Problem 1 using an oracle for Problem 2. For a knapsack instance with an item of weight w_i and weight v_i, we create a v_i terms \cdot x_1x_2\cdots x_{w_i}. The x_i must be unique for every item. This is done for every item in the knapsack. So, Problem 2 is NP-hard.

Now, we want to build some theory towards polynomials. Consider the following construction. Assign each x_i with a corresponding monomial (x - a_i). Now assume that we want to represent the term x_1 x_2 x_3. Define p(x) = x^p + 1 \in \mathbb{F}_q to be a product \prod_{i=0}^p(x-a_i). p is a prime number.

We map the term x_1 x_2 x_3 to the polynomial

Q_1(x) = p(x) \cdot \left[(x-a_1)(x-a_2)(x-a_3)\right]^{-1}

This means that Q_1(x) \cdot (x-a_1)(x-a_2)(x-a_3) = 0 \bmod p(x). To consider all terms jointly, pick a number large number d. Then, we create a polynomial using the terms Q_2(x) = p(x) \cdot \left[(x-a_2)(x-a_5)\right]^{-1}Q_3(x) = p(x) \cdot \left[(x-a_4)(x-a_6)(x-a_7)\right]^{-1} and so on. We assume there are n terms.

Then, the polynomial Q(x) is constructed as Q(x) = \sum_{i=1}^n x^{di} \cdot Q_i(x). Since d was a large number, it means that the terms of Q_i(x) and Q_{i+1}(x) will not interfere.

We now turn this problem into an instance of finding a low-weight codeword in a QC-MDPC code.


Problem 3  (Low-weight codeword of QC-MDPC code)

Let G be the generator of a QC-MDPC code. Find a non-zero codeword with minimal weight, i.e., minimize \| \mathbf u G \|_1.


There is a bijective mapping from a quasi-cyclic code to a polynomial, i.e., G(x) generates a code as well. So, Q(x) generates a quasi-cyclic code. Let us first assume we have access to an oracle for Problem 2.

Indeed, a optimal solution to Problem 2 will give rise to a codeword in the corresponding code of low weight. Assume that the optimum is x_1x_2x_3 + x_2x_5 + x_4x_6x_7 + \cdots = b. This means that u(x) is of degree m and that \|u(x) G(x)\|_1 \leq p(n-b), i.e., u(x) G(x) has at most p (n-b) non-zero coefficients.

Using an oracle for Problem 3, we find a minimal-weight codeword for the generator Q(x).

Experiments with index calculus

In index calculus, the main problem is to represent powers of g in a predefined prime-number basis. We are interested in unravel x from h = g^x\bmod p.

Normal approach

First, we find some offset hg^j = g^{x+j}\bmod p such that the factorization is B-smooth.

Using the prime-number basis (-1)^{e_1}2^{e_2}\cdots, generate a corresponding vector

\mathbf y = \begin{pmatrix}1 & 0 & 3 & 0 & \cdots & 1\end{pmatrix}

Then, we generate powers g^1, g^2, ... and check if they have the same property. The results are put into a matrix after which one performs Gaussian eliminiation over \mathbb{Z}_{p-1}.

A | \mathbf v^T = \left( \begin{array}{cccccc|cc}0 & 3 & 2 & 2 & \cdots & 1 & \mathbf{13} \\ 1 & 0 & 2 & 1 & \cdots & 0 & \mathbf{112} \\ 0 & 5 & 2 & 0 & \cdots & 0 & \mathbf{127} \\\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & 1 & 0 & 0 & \cdots & 0 & \mathbf{67114}\end{array}\right)

\mathbf xA = \mathbf y. Then compute \mathbf x \cdot \mathbf v to find x+j.

Different approach

Almost like in the normal approach, we find some offset hg^j = g^{x+j} \bmod p such that the factorization of at least fraction of hg^j \bmod p greater than \sqrt{p} is B-smooth.

Again, Using the prime-number basis (-1)^{e_1}2^{e_2}\cdots, generate a corresponding vector

\mathbf y | \Delta = \left(\begin{array}{cccccc|c}0 & 2 & 5 & 1 & \cdots & 0 & \Delta\end{array} \right)

The vector is chosen such that \Delta corresponds to the product of primes not represented in the chosen basis. In other words, (-1)^0 \cdot 2^2 \cdot 3^5 \cdot 7^1 \cdots > \sqrt{p} for the vector above, or equivalently, \Delta < \sqrt p.

Again, we generate powers g^1, g^2, ... and check if they have the same property as above and solved as the normal approach.

A | \mathbf v^T | \mathbf d^T = \left( \begin{array}{cccccc|c|c} 1 & 0 & 3 & 0 & \cdots & 0 & \mathbf{5} & \delta_1 \\ 0 & 3 & 2 & 2 & \cdots & 1 & \mathbf{13}& \delta_2 \\ 0 & 1 & 2 & 1 & \cdots & 0 & \mathbf{65}& \delta_3 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots\\ 0 & 2 & 0 & 0 & \cdots & 0 & \mathbf{13121}& \delta_B \end{array}\right)

Find \mathbf xA = \mathbf y. Then compute \mathbf x \cdot \mathbf v to find x+j. There is a catch here. It will be correct if and only if

\prod \delta_i ^{x_i} \bmod p = \Delta.

It remains to see if this actually improves over the normal approach.

Tests with sage.

Suggestion for proof of retrievability

(Originally posted here)

Scenario

In this scenario, A and B wants to distribute segments of their data (preferably encrypted). We do not care about coverage, we are trying to maximize the amount of remotely stored data. The game is essentially that any peer, will try to minimize their own storage requirements and thus maximize other peers’.

Protocol

Two peers A and B will repeatedly negotiate upon storage space according to the follwing protocol, where A is prover and B verifier. Let S(B,A) be the set of segments owned by B and stored by A. Moreover, let s \in S(B,A) be an abritrary segment and let \text{sig}(s) denote the signature of s.

1. B picks a random string r of length n bits (equal to the length of the signature)
2. B asks A to send the closests segment from S(B,A), i.e., such that \| r - \text{sig}(s) \|_1 is minimized.
3. B verifies the segment s via the signature. Then, using the distance \| r - \text{sig}(s) \|_1, B can estimate the number of segments stored by A.

Repeat until the variance is low enough. The procedure is performed both ways.

Approximating \delta

Intuitively, we can think of S(B,A) as a random (non-linear) code \mathcal{C} over \mathbb{F}_2^n and the task is to find a minimum-weight codeword in \mathcal{C} + r = \{ c + r | c \in \mathcal C\}. Then N corresponds to the number of codewords in \mathcal{C}.

Assume that A has stored N segments. Without loss of generality, we can assume that r is the all-zero string, hence transforming the problem to finding the minimum weight of $n$ binomial variables. Let X_1, X_2, \dots, X_N \in \textsf{Bin}(n, \frac12) and Z := \min(X_1, X_2, \dots, X_n). Then with high probability and for larger values of N,

\frac n2 - \sqrt{\frac n3 \log N} \geq E(Z) \geq \frac n2 - \sqrt{\frac n2 \log N}.

The above can be obtained by Gaussian approximation and the conventional expectation of the minimum of Gaussian random variables. We omit the proof here. Of course, this can be pre-computed for reasonable values of N.

Assume there exists a peer A with infinite computing power that tries to minimize its required storage for a given minimum distance d. Done in an optimal manner, A will pick equidistant points in the space \mathbb{F}_2^n. This corresponds to a perfect error-correcting code with minimum distance $d$. The covering-coding bound states that d\leq n-k. For a perfect code, which has minimal size, we have d = n-k. Therefore, the size of the code is |\mathcal C| = 2^{n-d}, which is also the number of elements A needs to store.

A glaring problem is that A cannot arbitrarily pick points (since points must be valid signatures). The best A can do is to decide upon a perfect code \mathcal{C} (or pick one that is closest to the current set of points — which is a very hard problem). Then, A will look for points v and v' that map to the same codeword c \in \mathcal{C} and discard the point that is the farthest away from c. For larger n and smaller d, it is not realistic to achieve the covering-coding bound since there in reality are not that many shared segments. So, even given infinite computation, A will not be able to sample points until a perfect code is achieved.

The covering-coding bound serves as a crude estimation on how to set the parameters.

Simulating

In Figure 1, we see that the upper bound for expected minimum distance approximates N for values larger than 2^7 and n = 256.

rZ9aT

Assume that A send a signature s along with the segment. Let

\delta(r,s) = \|r - \text{sig}(s)\|_1 - \frac n2.

B can now use different bounds at its own choice. Assume that \delta(r,s) = 100 - \frac{256}{2}. Then, using the expression

N \approx \exp\left(-\frac 2n \cdot \delta(r,s) \cdot \text{abs}\left(\delta(r,s)\right) \right)

B determines that A most likely has at least 458 segments with high probability. Using the lower bound, A has at least 9775 segments.

So, what is all this?

The intention is to create a simplistic protocol for asserting that shared data actually is retrievable, based on a probabilistic model. This could serve as basis for shared storage, in which segments are distributed. Even if the storer does not have access to the segments it has distributed, it can use the protocol to verify that it can retrieve them without actually doing so.

Writeup for Snurre128

The intended solution for Snurre128 is as follows. We note that the non-linear function is almost linear, the higher-order terms have degree 5.

return v[0] ^ v[1] ^ v[2] ^ v[31] ^ \
       v[1]&v[2]&v[3]&v[64]&v[123] ^ \
       v[25]&v[31]&v[32]&v[126]

Whenever a higher-order term becomes non-zero, we will get non-linear behaviour. Also note that since there are two of these terms, they will cancel out with some small probability. We write f = L(x) + P(x) + Q(x). Because all indices in the non-linear terms (\{1,2,3,64,123\} \cap \{25,31,32,126\} = \emptyset) are disjoint, we get

\begin{array}{rcl}\mathbb{P}(P(x) + Q(x) = 1) &=& \mathbb{P}(P(x) \neq Q(x))\\ &=& \mathbb{P}(P(x) = 1) + \mathbb{P}(Q(x) = 1) - 2\cdot \mathbb{P}(P(x) = 1 \land Q(x) = 1)\\ &=& 2 \cdot 2^{-5} - 2 \cdot 2^{-10}\\& =& \frac{31}{512} \end{array}

Let k be the dimension of the state and n be the length of the keystream. Note that there exists a big-ass matrix \mathbf{G} \in \mathbb{F}_2^{k\times n} such that

x\mathbf{G} + e = v \in \mathbb{F}_2^n,

where e is governed by the non-linearity of f. The expected Hamming weight of e is

n \cdot \mathbb{P}(P(x) + Q(x) = 1).

It is possible to solve this using a Walsh transform, but it is rather expensive requiring \mathcal{O}(k \cdot 2^k) computation. Employing BKW will significantly reduce the complexity, by transforming the problem into a problem of smaller dimension at the expense of higher error rate.

Another solution is to treat this as a decoding problem over a random code. There is support for so-called information-set decoding in sage.

from sage.coding.information_set_decoder import LeeBrickellISDAlgorithm

First we generate the generator matrix from the cipher, according to the above.

# define a code from big-ass matrix
C = codes.LinearCode(G.transpose())

The keystream is a vector \mathbb{F}_2^n:

v = vector(GF(2), [0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1])

Using Lee-Brickell’s algorithm (a rather simple and not the most efficient algorithm of today), we can now decode:

num_errors = 118
A = LeeBrickellISDAlgorithm(C, (num_errors, num_errors))
A.calibrate()
x = A.decode(v)
# for the record, this is the error vector
e = v + A*x

Once this is complete, we can solve a set of linear equations to unravel the state.

Also consider to read this writeup by @hellman 🙂

Solving snurre80 @ MidnightsunCTF

Note: I was the author of this challenge. This is my intended solution. Some details have been left out.

Two teams solved this one: TokyoWesterns and LC↯BC, in that chronological order.

class Snurre80:

    """
        Snurre80 is a proprietary stream cipher designed for
        low-power devices, i.e., in Internet of Things contexts.
        More importantly, it offers an incredibly high security
        level of 2^80 (who would need more?).

        It has the following structure:

                      +--------+----+------+
                      |        |    |      |
                    +---+---+-   -+---+    |
                    |   |   | ... |   |  c
                    +-----------------+

        Snurre80 is resistant to distinguishing attacks, since
        it uses a non-linear(tm) boolean function f as filtering
        output function.

        Snurre80 is quantum resistant and blockchain ready. It
        is the stream cipher of the future. It is 100 % cyber.
        We charge a reasonble fee of $0.00001 / encrypted bit.

                                            -- The Designers
    """

    def __init__(self, key):
        self.state = key
        self.mask = 1284576224436276739441733
        self.nbits = self.mask.bit_length()-1

    def output(self):
        var = bin(self.state)[2:].zfill(self.nbits)
        v = [int(v) for v in var]
        return v[0] ^ v[1] ^ v[2] ^ v[31] ^ \
               v[1]&v[2]&v[3] ^ \
               v[25]&v[31]&v[32]&v[33]&v[34]

    def __str__(self):
        j = 0
        poly = []
        x = self.mask
        while x > 0:
            if x & 1:
                poly = ["x^{}".format(j)] + poly
            x >>= 1
            j += 1
        return " + ".join(poly)

    def keystream(self, n):
        for _ in xrange(n):
            self.state = (self.state  self.nbits
            if xor != 0:
                self.state ^= self.mask
            yield self.output()

# Generate a sequence of 800 bits, with a random key.
key = int(os.urandom(10).encode('hex'), 16)
cipher = Snurre80(key)
z = [c for c in cipher.keystream(800)]

# Additionally, this may help you solve the CAPTCHA.
def solve_proof_of_work(prefix):
    i = 0
    while True:
        if sha256(prefix+str(i)).digest()[:3] == "\x00\x00\x00":
            return str(i)
        i += 1

The feedback polynomial of the stream cipher is

\begin{array}{rcl} P(x) & = & x^{80} + x^{76} + x^{66} + x^{64} + x^{58} + x^{54} + x^{50} + x^{42} \\ & + & x^{38} + x^{34} + x^{24} + x^{22} + x^{18} + x^{10} + x^6 + x^2 + 1 \end{array}

To create a distinguisher, we may use the feedback polynomial as parity check. Unfortunately, the output is not linear, but we can approximate it as a linear function. However, by the piling-up lemma, the bias quickly becomes small since there are many terms. So, we need to find a low-weight polynomial multiple to avoid getting too small bias: there exists q(x) such that q(x) P(x) = u(x), where u(x) is of low weight.

But wait. Every power is even, so it means it is a square P(x) = p^2(x). Turns out we can easily find the square root of it; by simply dividing every power with 2, we can find p(x) from P(x).

\begin{array}{rcl} p(x) & = & x^{40} + x^{38} + x^{33} + x^{32} + x^{29} + x^{27} + x^{25} + x^{21} \\ & + & x^{19} + x^{17} + x^{12} + x^{11} + x^{9} + x^{5} + x^3 + x + 1 \end{array}

Now, we can solve the problem of finding a low-weight multiple in a smaller dimension. This can be done in a couple of seconds. In fact, if you stalked my Github repos you will find that I have written code for this exact problem. The idea behind it is to generate a long list of powers x^i \bmod P(x) and use a generalized birthday attack to find x^{i_1} + x^{i_2} + x^{i_3} + x^{i_4} = 0 \bmod P(x). There are several papers on it. This is not as difficult as it may be time consuming to write functional code.

Nonethless, one suitable parity check is

x^{17399} + x^{13567} + x^{4098} + 1

But it will not work for the polynomial P(x)? Well, that if no concern. If q(x) p(x) = u(x), then q^2(x) p^2(x) = q^2(x) P(x) = u^2(x). Since u(x) has weight w, u^2(x) will have weight w.

    p = [17399, 13567,  4098]
    S = sum(
        z[i + 2*p[0]] ^ z[i + 
        2*p[1]] ^ z[i + 
        2*p[2]] ^ z[i] for i in range(0, 1000))
    return  S < 430 # appropriate magic constant

Running it for every sequence, we can efficiently decide the source. This gives the flag

midnight{1z_3z_2_BR34K_W1D_L0W_w31gHTz!!!}

Setting up Google authenticator along SSH auth in macOS

This is a guide on how to setup the use of Google Authenticator along with public-key authentication.

First, we clone the Google Authenticator PAM module from Github:

$ git clone https://github.com/google/google-authenticator-libpam.git

To build it, we need a few packages which are not included by default in macOS.

$ brew install autoconf automake libtool

To be able to get QR codes without revealing secrets to Google, you can install
libqrencode:

$ brew install libqrencode

Now we are ready to build Google Authenticator.

$ ./bootstrap
$ ./configure
$ make
$ sudo make install

To install the PAM module, invoke

$ sudo cp /usr/local/lib/security/pam_google_authenticator.so /usr/lib/pam/

Then, we add the line

auth required pam_google_authenticator.so

to the configuration file /etc/pam.d/sshd. It will look something like this:

# sshd: auth account password session

# Not used by me!
# auth       optional       pam_krb5.so use_kcminit
# auth       optional       pam_ntlm.so try_first_pass
# auth       optional       pam_mount.so try_first_pass
# auth       required       pam_opendirectory.so try_first_pass

# Relevant methods of authentication:
auth       required       pam_google_authenticator.so
account    required       pam_nologin.so
account    required       pam_sacl.so sacl_service=ssh
account    required       pam_opendirectory.so
password   required       pam_opendirectory.so
session    required       pam_launchd.so
session    optional       pam_mount.so

I have removed the alternative methods of authentication. In /etc/ssh/sshd_config, we set

PubkeyAuthentication yes
ChallengeResponseAuthentication yes
UsePAM yes
AuthenticationMethods publickey,keyboard-interactive:pam

This will force the user to prove ownership of a valid SSH-key and along with a verification code from Google Authenticator.

Now we are ready to setup the two-factor authentication. Running

$ /usr/local/bin/google-authenticator

generate a shared secret and provide you with a QR code to be used with your phone. You can use TOTP or OTP verification codes. I suggest going with TOTP and small time windows.

login

Adding some notifications

Another good thing is to add some notifications to your ~/.ssh/rc file (also — if you are in the US, there are a lot of services available for sending SMS, which can be handy, but remember not to send any sensitive data through third parties). For instance:

# Get and sanitize sender address, probably not needed...
ip=`echo $SSH_CONNECTION | cut -d " " -f 1`
ip=${ip//[^a-zA-Z0-9:.]/}

# Sanitize user, probably not needed...
user=`whoami`
user=${user//[^a-zA-Z0-9_]/}

# Show notification
osascript -e 'display notification "SSH connection as '$user' from '$ip'" with title "SSH"'

# Send email
echo "SSH connection as '$user' from '$ip'" | sendmail me@mydomain.com

This will give a notification to you directly if you are at the computer being connected to. Additionally, it will send an email to you just in case you are AFK. The regexp are there to mitigate possible command injection, though I doubt it is possible unless there is severe bug in SSHD and the script above is running as a user with higher privileges. On the other hand, santitation is never bad if it negligible in terms of computations and you are doing it right 🙂

Another way of sending notifications is to use a Telegram bot. See below for an example output.

telegram.png

For this purpose, one might use e.g. telegram-send.