F4 Phantom

With F-4 Phantom II, we want to break the encryption! Please help us!!

‍‍‍nc 54979

We get a key-generation function, as seen below.

def gen_pubkey(e, p):
    assert gmpy.is_prime(p) != 0
    B = bin(p).strip('0b')
    k = random.randrange(len(B))
    k, l = min(k, len(B) - k), max(k, len(B) - k)
    assert k != l
    BB = B[:k] + str(int(B[k]) ^ 1) + B[k+1:l] + str(int(B[l]) ^ 1) + B[l+1:]
    q = gmpy.next_prime(int(BB, 2))
    assert p != q
    n = p*q
    key = RSA.construct((long(n), long(e)))
    pubkey = key.publickey().exportKey("PEM")
    return n, p, q, pubkey

So, pq = p \cdot (p \pm 2^k \pm 2^l + \Delta), where \Delta = \texttt{next\_prime}(p \pm 2^k \pm 2^l) - (p \pm 2^k \pm 2^l). The density of primes (well, asymptotically) is \pi(n) \sim \frac{x}{log x}, so we expect \Delta to be quite small.

We define a := p \pm 2^k \pm 2^l. Now, we solve the following equation

x \cdot (x \pm 2^k \pm 2^l + \Delta) = N \iff x \cdot (x + a) - N = 0 \iff x = \frac{a}{2} \pm \sqrt{\frac{a^2}{4} + N}

Now, we know that the roots should be numerically close to p and q. So, we may simply call \texttt{next\_prime} on them and check if they divide N. Note that we need to do this for different hypotheses (different k).

import gmpy, base64
from Crypto.PublicKey import RSA

def solve(a):
   a2 = a ** 2/4
   L = gmpy.sqrt(a2 + N) # L > a/2
   C_1 = gmpy.next_prime(-a/2 + L)
   if N % C_1 == 0: return C_1
   C_2 = gmpy.next_prime(a/2 + L)
   if N % C_2 == 0: return C_2
   return False

# just some public key
pem_data1 = '''

pub =  RSA.importKey(base64.b64decode(pem_data1))
N = pub.n
m = (len(bin(N))+1)/2-2
print '[ ] Solving equations...'

for mm in range(m, m+3):
  for j in range(2, m/2+1):
    k = mm - j
    l = j
    retval = solve(2**(k-1) + 2**(l-1))
    if retval: 
      q = retval
    retval = solve(2**(k-1) - 2**(l-1))
    if retval: 
      q = retval

p = N / q
print '[+] Found p = {}... and q = {}...'.format(str(p)[:40], str(q)[:40])
λ python
[ ] Solving equations...
[+] Found p = 1381282812140256921334162381757305071089... and q = 1381282811302268925712750063033928508701...

Now, it turns out that \text{gcd}(e, (p-1)(q-1)) \neq 1. We cannot decrypt the message! If we look closely, we see that \text{gcd}(e, (p-1)) = 1. Hmm… so, we can decrypt the message c^d = m \mod p. And the message is the flag, which is constant. What if we sample two ciphertexts encrypting the same message but under different keys? Yes… so, basically, we have m \mod p_1 and m \mod p_2. Then, we can combine it using CRT! This is easy!

import gmpy
from Crypto.Util.number import long_to_bytes

enc1 = 63188108518214820361083256140053967663112132356420859206347143811869148973386950682507343981284848159232100220605963292020722612854139075311063776086523677522160515093598202087755508512714885329251980275085813640578839753523295579661559881983237388178475574713319340090857313275483787001349560782781513895950569634984688753665
q1 = 33452043035349425454164058954054458228134102234436666511159820871022348004023966976017694615018111901825497223286811050478588079083387098695346723120647425399335947
p1 = 33452043035349425454164058954054813129854949698738692548175391185736387867969615080539316436404430573352896343366560167202569408949342999951142479436993639588965567
e1 = 3562731839

enc2 = 162331112890791758781057932826106636167735138703054666826574266304486608255768782351247144197937186145526374008317633308191215438756014724244242639042178681790803086016105986729819920057318114565244866632716401408212076926367974085730803964312385570202673351687846963322223962280999264618753873289802828995706526584379102468
q2 = 1381282812140256921334162381757305071089685391539884775199049048751523002134390007428038278188586333291047282664366863789424963612326970767160301759006158962675449
p2 = 1381282811302268925712750063033928508701820008572424411412024462643800411901779755548441592138469189655615818433739872652769585433967353091413641137354055362924329
e2 = 197840663

d1 = gmpy.invert(e1, p1-1)
d2 = gmpy.invert(e2, p2-1)

assert(gmpy.gcd(e1, p1-1) == 1)
assert(gmpy.gcd(e2, p2-1) == 1)

m = (pow(enc1, d1, p1) * p2 * gmpy.invert(p2, p1) + pow(enc2, d2, p2) * p1 * gmpy.invert(p1, p2)) % (p1*p2)
print long_to_bytes(m)


********** great job! the flag is: ASIS{Still____We_Can_Solve_Bad_F4!}


A fine OTP server

Connect to OTP generator server, and try to find one OTP.

nc 35156

We get the code for generating OTPs:

def gen_otps():
    template_phrase = 'Welcome, dear customer, the secret passphrase for today is: '

    OTP_1 = template_phrase + gen_passphrase(18)
    OTP_2 = template_phrase + gen_passphrase(18)

    otp_1 = bytes_to_long(OTP_1)
    otp_2 = bytes_to_long(OTP_2)

    nbit, e = 2048, 3
    privkey = RSA.generate(nbit, e = e)
    pubkey  = privkey.publickey().exportKey()
    n = getattr(privkey.key, 'n')

    r = otp_2 - otp_1
    if r < 0:
        r = -r
    IMP = n - r**(e**2)
    if IMP > 0:
    	c_1 = pow(otp_1, e, n)
    	c_2 = pow(otp_2, e, n)
    return pubkey, OTP_1[-18:], OTP_2[-18:], c_1, c_2

We note that \texttt{otp\_1}^3 < N. Because of this, the task is really trivial. We can simply compute the cubic root without taking any modulus into consideration. \texttt{gmpy2.iroot(arg, 3)} can solve this efficiently! This gives us:


Secured OTP server

Connect to OTP generator server, and try to find one OTP.
This is secure than first server 🙂

nc 12431

This is almost identical to the previous, but the OTP is chosen such that it will overflow the modulus when cubed.

    template_phrase = '*************** Welcome, dear customer, the secret passphrase for today is: '
    A = bytes_to_long(template_phrase + '00' * 18)

Note that we can write an OTP as m = A \cdot 2^k + B, where B < 2^k. So, we have m^3 = A^3 + 3A^2D + 3AD^2 + D^3. The observant reader have noticed that if we remove A^3, it will not wrap around N. Now, we can mod out the terms containing A, i.e., find m - A^3 \mod A. Now we are back in the scenario of previous challenge… so, we can use the method from before! Hence, (m - A^3 \bmod A)^{1/3}. Finally, we get



You should solve a DLP challenge, but how? Of course , you don’t expect us to give you a regular and boring DLP problem!

nc 28416

The ciphertext is generated using the following function:

def encrypt(nbit, msg):
    msg = bytes_to_long(msg)
    p = getPrime(nbit)
    q = getPrime(nbit)
    n = p*q
    s = getPrime(4)
    enc = pow(n+1, msg, n**(s+1))
    return n, enc

We have that e = (n+1)^m \mod n^s = n^m + {m \choose {m-1}}n^{m-1} + \cdots + + {m \choose 2}n^2 + {m \choose 1}n + 1. We can write this as {m \choose 1}n + 1 + \mathcal{O}(n^2). Now, we can eliminate the ordo term by taking e \mod n^2 = {m \choose 1}n + 1 \iff e - 1 \mod n^2 = mn. Assuming that m < n, we can determine m by simple division. So, \frac{e - 1 \mod n^2}{n} = m. In Python, we have that

Turns out this is the case, and, hence, we obtain


unsecure ASIS sub-d

ASIS has many insecure sub-domains, but we think they are over HTTPS and attackers can’t leak the private data, what do you think?

So, we get a PCAP. The first thing we do is to extract data binwalk -e a.pcap. This generates a folder with all the certificates used.

Then, we look at the moduli.

for f in $FILES
  openssl x509 -inform der -in $f -noout -text -modulus 

This generates a file with all moduli. Let us try something simple! Common moduli! For each pair, we check if \text{gcd}(N_i, N_j) \neq 1. If so, we have found a factor. Turns out two moduli have a common factor, so we can factor each of them and decrypt their traffic:

p1 = 146249784329547545035308340930254364245288876297216562424333141770088412298746469906286182066615379476873056564980833858661100965014105001127214232254190717336849507023311015581633824409415804327604469563409224081177802788427063672849867055266789932844073948974256061777120104371422363305077674127139401263621 
q1 = 136417036410264428599995771571898945930186573023163480671956484856375945728848790966971207515506078266840020356163911542099310863126768355608704677724047001480085295885211298435966986319962418547256435839380570361886915753122740558506761054514911316828252552919954185397609637064869903969124281568548845615791

p2 = 159072931658024851342797833315280546154939430450467231353206540935062751955081790412036356161220775514065486129401808837362613958280183385111112210138741783544387138997362535026057272682680165251507521692992632284864412443528183142162915484975972665950649788745756668511286191684172614506875951907023988325767 
q2 = 136417036410264428599995771571898945930186573023163480671956484856375945728848790966971207515506078266840020356163911542099310863126768355608704677724047001480085295885211298435966986319962418547256435839380570361886915753122740558506761054514911316828252552919954185397609637064869903969124281568548845615791

We can now generate two PEM-keys

d1 = gmpy.invert(e, (p1 - 1)*(q1 - 1))
key = RSA.construct((long(p1*q1), long(e), long(d1)))
f = open('privkey.pem','w')

Putting it into Wireshark, we obtain two images:


I totally agree.

Alice, Bob and Rob

We have developed a miniature of a crypto-system. Can you break it?
We only want to break it, don’t get so hard on our system!

This is McElice PKC. The ciphertexts are generated by splitting each byte in blocks of four bits. The following matrix is used as public key:

G = \begin{pmatrix}1& 1& 0& 0& 0& 1& 1& 0\\ 1& 1& 1& 1& 1& 1& 1& 1\\ 0& 1& 1& 0& 1& 1& 0& 0 \\ 0& 1& 1& 1& 0& 0& 1& 0 \end{pmatrix}

The ciphertext is generated as \mathbf{m}G + \mathbf{e}, which is a function from 4 bits to a byte. \mathbf{e} is an error (or pertubation) vector with only one bit set. This defines a map f: \mathbb{F}_4 \rightarrow \mathbb{F}_8. So, each plaintext byte is two ciphertext bytes.

We can first create a set of codewords

P = numpy.matrix([[1, 1, 0, 0, 0, 1, 1, 0], [1, 1, 1, 1, 1, 1, 1, 1], [0, 1, 1, 0, 1, 1, 0, 0],[0, 1, 1, 1, 0, 0, 1, 0]])
image = {} # set of codewords
for i in range(0, 2**4):
    C = (numpy.array([int(b) for b in (bin(i))[2:].zfill(4)]) * P % 2).tolist()[0]
    image[int(''.join([str(c) for c in C]), 2)] = i

Then, go through each symbol in the ciphertext, flip all possible bits (corresponding to zeroing out \mathbf{e}) and perform lookup in the set of codewords P (compute the intersection between the Hamming ball of the ciperhext block and P).

f = open('flag.enc','r')
out = ''
for i in xrange(18730/2):
    blocks =
    j = ord(blocks[0])
    C1 = 0
    C2 = 0
    for i in range(0, 8):
        if j ^ 2**i in image:
            C1 = image[j ^ 2**i] << 4
    j = ord(blocks[1])
    for i in range(0, 8):
        if j ^ 2**i in image:
            C2 = image[j ^ 2**i]
    out += chr(C1+C2)

Turns out it is a PNG:


Confidence DS CTF ’17 – Public Key Infrastructure

Log in as admin to get the flag.

nc 1337

This was a fun challenge, because it required many steps to complete. The first thing we notice is that MD5 is being used:

def h(x):
    return int(hashlib.md5(x).hexdigest(), 16)

def makeMsg(name, n):
    return 'MSG = {n: ' + n + ', name: ' + name + '}'

def makeK(name, n):
    return 'K = {n: ' + n + ', name: ' + name + ', secret: ' + SECRET + '}'

def sign(name, n):
    k = h(makeK(name, n))
    r = pow(G, k, P) % Q
    s = (modinv(k, Q) * (h(makeMsg(name, n)) + PRIVATE * r)) % Q
    return (r*Q + s)

Naturally, we cannot register as \texttt{admin}. An immediate conclusion is that we need to some kind of collision attack. But how? We cannot exploit any collision in the \texttt{admin} account. What if we can create a collision in r while keeping s distinct?

from coll import Collider, md5pad, filter_disallow_binstrings
import os

def nullpad(payload):
    return payload + b'\x00' * (64 - len(payload) % 64)

def left_randpad(payload):
    random_string = os.urandom(64)
    return random_string[:len(payload) % 64] + payload

# nullpad the prefix so that it does not interfere
# with the two collision blocks
prefix = nullpad(b'K = {n: ')
suffix = b', name: groci, secret:'

collider = Collider()
c1, c2 = collider.get_last_coll()
cols = collider.get_collisions()

# here are our collisions!
A = next(cols)
B = next(cols)

The above code creates a collision such that k = h(makeK(name, n)) generates the same value. The problem is that the server does not return the signature, but str(pow(sign(name, n), 65537, int(n.encode('hex'), 16))).

To get the signature, which is computed as Z = (r \cdot Q + s)^{65537} \mod M, we need to perform some additional computations. Obviously, we could factor the modulus M and compute \phi(M) and then obtain the signature as easy as Z^{65537^{-1} \mod \phi(M)}… but factoring…


The probability that it is smooth enough is not very high. Also, the number is around 300 digits so not a good candidate for msieve or other factoring software… so what then?

Well… what if 2^{96} \cdot A + c and 2^{96} \cdot B + c both are prime? Then, we can easily recover the signature as Z^{65537^{-1} \mod (M-1)}! No need for time-consuming factoring! Embodied in Python, it could look like this:

import hashlib
from Crypto.Util.number import isPrime

def num2b(i):
    c = hex(i).strip('L')
    c = (len(c) % 2) * '0' + c[2:]
    return c.decode('hex')

# get the payloads
A = int(A[64:64*3].encode('hex'), 16)
B = int(B[64:64*3].encode('hex'), 16)

# append the same data to both payload until
# they resulting numbers are BOTH prime
mul = 2 ** 96
for i in range(1, 2**20, 2):
    if isPrime(AA + i) and isPrime(BB + i):
        print AA+i
        print BB+i

As an example, we obtain

\begin{array}{rcl} n_1 &= &  105830311539247723296176540884596952305269885491629657210626981367998\\ &&347855559972583535368860650411949121405830046936667132754993342040282\\ &&396579458814796410615711380733886667185822027819749278255976029130752\\ &&166929087394904457033345839480235723175883533744993499706397792250975\\ &&3538392909994754306314762383722990106537161406818023\\ \end{array}


\begin{array}{rcl} n_2 &=&  105830311539247723296176540884596952305269885520672956204333681813273\\ &&006022550217898599044674310459977554450027911751595002340337873829980\\ &&015343362188266640170082456168713800027076364038815247707214102650115\\ &&472228269311796278082939271876115926467296299772272093195558205398667\\ &&4873073960726016011965433749481512129478713571026663\\ \end{array}

Putting this into action, we might do something like:

def connect(name, n):
   name_encoded = base64.b64encode(name)
   n_encoded = base64.b64encode(n)
   server = remote('', 1337)
   payload = 'register:' + name_encoded + ',' + n_encoded
   server.send(payload + '\n')
   return pow(int(server.recvline()), modinv(65537, n1-1), n1)

payload1 = nullpad('K = {n: ') + num2b(n1)
payload2 = nullpad('K = {n: ') + num2b(n2)

name = 'groci'
PORT = 1331

assert(h(makeK(name, payload1)) == h(makeK(name, payload2)))
assert(h(makeMsg(name, payload1)) != h(makeMsg(name, payload2)))

# Get first signature
sig = connect(name, payload1)
r1, s1 = sig / Q, sig % Q
z1 = h(makeMsg(name, payload1))

# Get second signature
connect(name, payload2)
sig = pow(int(server.recvline()), modinv(65537, n2-1), n2)
r2, s2 = sig / Q, sig % Q
z2 = h(makeMsg(name, payload2))

# Make sure we got a nonce re-use
assert(r1 == r2)

# OK, use standard nonce re-use technique...
delta = modinv(((s1 - s2) % Q), Q)
k = ( ((z1 - z2) % Q) * delta) % Q
r_inv = modinv(r1, Q)
PRIVATE = (((((s1 * k) % Q) - z1) % Q) * r_inv) % Q

# Now that we know the private key, 
# lets forge/sign the admin account!
name = 'admin'
n = 'snelhest'
k = h(makeK(name, n))

r = pow(G, k, P) % Q
s = (modinv(k, Q) * (h(makeMsg(name, n)) + PRIVATE * r)) % Q
sig = r*Q + s

# connect and login
name_encoded = base64.b64encode(name)
n_encoded = base64.b64encode(n)
server = remote('', 1337)
payload = 'login:' + name_encoded + ',' + n_encoded + ',' + base64.b64encode(num2b(sig))
server.send(payload + '\n')
print server.recvline()

We obtain the following signatures:

r_1 = 455070882754437660339639715779025537190883529105 s_1 = 719850042802710537609987071764516609303635832220
r_2 = 455070882754437660339639715779025537190883529105 s_2 = 584220801769294572739392594733340817152152143208

Now, with our forced nonce re-use, we can use standard techniques to obtain the private key and sign our \texttt{admin} account, which gives the flag


Note to reader: I apologize for the excessive use of memes.

0ctf’17 – All crypto tasks


Just a simple scheme.
nc 8221

The encrypt function in the scheme takes the username as input. It hashes the username with MD5, appends the name to the hash and encrypts with a secret key k, i.e. \textsf{Enc}_k(H_\text{MD5}(\text{username}) \|\text{username}). Then, the secret becomes \text{IV} \| C_1 \| C_2 .... Notably, the first block C_1 contains the hash, but encrypted.

Recall how the decryption is defined:

C_i = \textsf{Dec}_k(C_i) \oplus C_{i-1}, C_0 = \text{IV}

So, what would happen if we input u = H_\text{MD5}(\texttt{admin}) \| \texttt{admin} as username? Then, we have encrypted H_\text{MD5}(u) \| u, but we want only u. As mentioned before, the hash fits perfectly into a single block. So, by removing the \text{IV}C_1 becomes the new \text{IV} (which has no visible effect on the plaintext anymore!). Then, we have \textsf{Enc}_k(H_\text{MD5}(\texttt{admin}) \| \texttt{admin}), which is all what we need.

The flag is flag{Easy_br0ken_scheme_cann0t_keep_y0ur_integrity}.


I swear that the safest cryptosystem is used to encrypt the secret!

We start off analyzing the code. Seeing process(m, k) function, we note that this is actually something performed in \mathbb F_2[x]/P(x) with the mapping that, for instance, an integer 11 is \texttt{1011} in binary, which corresponds to a polynomial x^3 + x + 1. The code is doing (m \oplus k)^2 in GF(2^{256}).

The keygen function repeatedly calls key = process(key, seed). The first value for key is random, but the remaining ones does not. seed remains the same. Define K to be the key and S the seed. Note that all elements are in GF(2^{256}). The first stream value Z_0 is unknown.

Z_0 = K
Z_1 = (Z_0 \oplus S)^2 = K^2 \oplus S^2
Z_2 = (Z_1 \oplus S)^2 = Z_1^2 \oplus S^2

So, we can compute the seed and key as S^2 = Z_2 \oplus Z_1^2 and K^2 = Z_1 \oplus S^2 = Z_1 \oplus Z_2 \oplus Z_1^2. The individual square roots exist and are unique.

def num2poly(num):
    poly = R(0)
    for i, v in enumerate(bin(num)[2:][::-1]):
        if (int(v)):
            poly += x ** i
    return poly

def poly2num(poly):
    bin = ''.join([str(i) for i in poly.list()])
    return int(bin[::-1], 2)

def gf2num(ele):
    return ele.polynomial().change_ring(ZZ)(2)

P = 0x10000000000000000000000000000000000000000000000000000000000000425L

fake_secret1 = "I_am_not_a_secret_so_you_know_me"
fake_secret2 = "feeddeadbeefcafefeeddeadbeefcafe"
secret = str2num(urandom(32))

R = PolynomialRing(GF(2), 'x')
x = R.gen()
GF2f = GF(2**256, name='a', modulus=num2poly(P))

f = open('ciphertext', 'r')
A = GF2f(num2poly(int(f.readline(), 16)))
B = GF2f(num2poly(int(f.readline(), 16)))
C = GF2f(num2poly(int(f.readline(), 16)))

b = GF2f(num2poly(str2num(fake_secret1)))
c = GF2f(num2poly(str2num(fake_secret2)))

# Retrieve partial key stream using known plaintexts
Y = B + b
Z = C + c

Q = (Z + Y**2)
K = (Y + Q).sqrt()

print 'flag{%s}' % hex(gf2num(A + K)).decode('hex')

This gives the flag flag{t0_B3_r4ndoM_en0Ugh_1s_nec3s5arY}.


Well, maybe the previous one is too simple. So I designed the ultimate one to protect the top secret!

There are some key insights:

  • The process1(m, k) function is basically the same as in previous challenge, but it computes the multiplication m \otimes k with the exception that elements are in GF(2^{128}) this time.  We omitt the multiplication symbol from now on.
  • The process2(m, k) function might look involved, but all that it does is to compute the matrix multplication between two 2 \times 2 matrices (with elements in GF(2^{128})), i.e., XY = \begin{pmatrix}x_0 & x_1 \\ x_2 & x_3\end{pmatrix}\begin{pmatrix}y_0 & y_1 \\ y_2 & y_3\end{pmatrix}
  • We start with matrices X = \begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix} and Y = \begin{pmatrix}A & B \\ 0 & 1\end{pmatrix}.
  • Raising A to a power yields has a closed form formula: A^s = \begin{pmatrix}A^s & B(A^{s-1} \oplus A \oplus 1 )\\ 0 & 1\end{pmatrix} =\begin{pmatrix}A^s & B\frac{A^{s} \oplus 1 }{A \oplus 1}\\ 0 & 1\end{pmatrix}.
  • The nextrand(rand) function takes the integral value of N, we call this s and computes Y^s = \begin{pmatrix}A & B \\ 0 & 1\end{pmatrix}^s via a square-and-multiply type algorithm. In python, it would be
    def proc2(key):
        AN = A**gf2num(N)
        return key*AN+(AN+1)/(A+1)*B

Let us look at the nextrand(rand) function a little more. Let R be the random value fed to the function. Once Y^s is computed, it returns

Q = R A^s \oplus  \frac{A^s \oplus 1}{A\oplus 1} B

Define U = \frac{B}{A\oplus 1}. Adding this to the above yields

Q\oplus  U = R A^s \oplus  \frac{A^s \oplus 1 \oplus 1}{A\oplus 1} B = A^s(R \oplus \frac{B}{A\oplus 1}) = A^s(R\oplus U).

So, A^s = \frac{Q\oplus U}{R \oplus U}. Note that given two elements of the key stream, all these elements are known. Once determined, we compute the (dicrete) \log \frac{Q\oplus U}{R\oplus U} to find s. And once we have s, we also have N. Then, all secrets have been revealed!

From the plaintext, we can immediately get the key K by XORing the first part of the plaintext with the corresponding part of the ciphertext. This gives K =\texttt{0x2fe7878d67cdbb206a58dc100ad980ef}.

R = PolynomialRing(GF(2), 'x')
x = R.gen()

GF2f = GF(2**128, name='a', modulus=num2poly(0x100000000000000000000000000000087))

A = GF2f(num2poly(0xc6a5777f4dc639d7d1a50d6521e79bfd))
B = GF2f(num2poly(0x2e18716441db24baf79ff92393735345))
G1 = GF2f(num2poly(G[1]))
G0 = GF2f(num2poly(0x2fe7878d67cdbb206a58dc100ad980ef))

U = B/(A+1)
Z = (G1+U)/(G0+U)
N = discrete_log(Z, A, K.order()-1)

We can then run the encryption (the default code) with the parameters N,K fixed to obtain the flag flag{LCG1sN3ver5aFe!!}.

Boston Key Party 2017 – Sponge

In this challenge, we are given a hash function, which essentially splits the input into chunks of ten bytes. It then appends six bytes of null data to it and XORs with the current state. The state is encrypted with the all-zero key and updated. This process is repeated until all blocks have been exhausted.

A simple case is to consider what happens when we hash ten bytes of null data.

\begin{array}{lcll} \textbf{State}_{i} & & \textbf{State}_{i+1} & \textbf{Input}\\ \texttt{00000000000000000000~000000000000}&\rightarrow&\texttt{66e94bd4ef8a2c3b884c~fa59ca342b2e}&\texttt{00000000000000000000}\\ \texttt{e6e94bd4ef8a2c3b884d~fa59ca342b2e} &\rightarrow& \texttt{cccb674e90ee226bea81~557bff1e7123} & \texttt{80000000000000000001} \end{array}

If we can bring it back to the same state, we have a local collision. Here is an example of such a collision:

\begin{array}{lcll} \textbf{State}_{i} & & \textbf{State}_{i+1} & \textbf{Input}\\ \texttt{00000000000000000000~000000000000}&\rightarrow&\texttt{66e94bd4ef8a2c3b884c~fa59ca342b2e}&\texttt{00000000000000000000}\\ \texttt{210e7f3dc8b624d41038 fa59ca342b2e} &\rightarrow& \texttt{26e1e20ccd3ce8e975a6 33c3d2824408} &\texttt{47e734e9273c08ef9874}\\ \texttt{0f809b9375310b0656cf 33c3d2824408}& & \texttt{d979b8a852674734c855 000000000000} & \texttt{2961799fb80de3ef2369}\\ \texttt{00000000000000000000 000000000000} & \rightarrow &\texttt{66e94bd4ef8a2c3b884c~fa59ca342b2e}&\texttt{d979b8a852674734c855}\\ \texttt{e6e94bd4ef8a2c3b884d~fa59ca342b2e} &\rightarrow& \texttt{cccb674e90ee226bea81~557bff1e7123} & \texttt{80000000000000000001} \end{array}

So, how did we create the above collision? Well, actually, it is not too complicated… first, note that we cannot control the last six bytes. Recall that \textsf{Enc}(\textsf{Enc}(x \oplus a \| \texttt{0x00}^6) \oplus b \| \texttt{0x00}^6) \oplus c \| \texttt{0x00}^6 = y. Let us reorder the function as follows:

\textsf{Enc}(x \oplus a \| \texttt{0x00}^6)  = \textsf{Dec}(y \oplus c \| \texttt{0x00}^6) \oplus b \| \texttt{0x00}^6

If we force the trailing six bytes to null and then decrypt that block for different values on y \oplus c \| \texttt{0x00}^6, since we can control c. Equivalently, we can encrypt for different values of x \oplus a \| \texttt{0x00}^6, where the trailing six bits will be the trailing six bits of \textsf{Enc}(\texttt{0x00}^{16}).

Utilizing the birthday paradox, we can find a collision in the six bytes in \sim \sqrt{256^6} time and space. This is done by putting about 2^{24} values of the encryption (or decryption) in a table. Then, we generate the decryptions (or encryptions, respectively) and look in the table.


In Python, it could look something like:

trailing_bytes_first ='\x00'*16).encrypt('\x00'*16)[-6:]

for i in range(0, 2**24):
    plaintext = os.urandom(10) + trailing_bytes_first
    data['\x00'*16).encrypt(plaintext)[-6:]] = plaintext

for i in range(0, 2**24):
    ciphertext = os.urandom(10) + '\x00\x00\x00\x00\x00\x00'
    if'\x00'*16).decrypt(ciphertext)[-6:] in data:
        print [data['\x00'*16).decrypt(ciphertext)[-6:]]], [ciphertext]

Two such values are

local_collision_blocks = ['!\x0e\x7f=\xc8\xb6$\xd4\x108\xfaY\xca4+.', '\xd9y\xb8\xa8RgG4\xc8U\x00\x00\x00\x00\x00\x00']

Finally, we generate a collision as follows

GIVEN = 'I love using sponges for crypto'
A ='\x00'*16).encrypt(local_collision_blocks[0])
B ='\x00'*16).decrypt(local_collision_blocks[1])
local_collision = '\x00'*10 + xorstring(local_collision_blocks[0][:10],'\x00'*16).encrypt('\x00'*16)) + xorstring(A,B)[:10] + xorstring(local_collision_blocks[1][:10], GIVEN[:10])

BSides’17 – Delphi

In this challenge, we are given a server which accepts encrypted commands and returns the resulting output. First we define our oracle go(cmd).

import urllib2

def go(cmd):
	return urllib2.urlopen('' + cmd).read()

This simply return the status from the server. It is common for this kind of CTF challenges to use some block-cipher variant such as some of the AES modes.

The first guess I had was that AES-CBC was being used. That would mean that if we try to flip some bit in a block somewhere in the middle of the ciphertext, the first decrypted part would remain intact, whilst the trailing blocks would get scrambled.

Assume that we have four ciphertext blocks C_0, C_1, C_2, C_3 and the decryption is \textsf{dec}_k : C_0\| C_1\| C_2\| C_3 \mapsto P_0\|P_1\|P_2\|P_3. Now, we flip a bit in C_1 so that we get C_1', then we have \textsf{dec}_k : C_0\| C'_1\| C_2\| C_3 \mapsto P_0\|P'_1\|P'_2\|P'_3. (This is not true, thanks to hellman for pointing that out in the comments).

Turns out this is not the case. In fact, the error did only propagate one block and not further, i.e.,\textsf{dec}_k : C_0\| C'_1\| C_2\| C_3 \mapsto P_0\|P'_1\|P'_2\|P_3. Having a look at the Wikipedia page, I found that this is how AES-CFB/(CBC) would behave (image from Wikipedia):



Since \textsf{dec}_k(C_0) \oplus C_1 = P_1, we can inject some data into the decrypted ciphertext! Assume that we want P'_1 = Q. Then, we can set C'_1 = C_1 \oplus P_1 \oplus Q, since then \textsf{dec}_k(C_0) \oplus C'_1 = P_1\oplus P_1 \oplus Q = Q. Embodying the above in Python, we might get something like

def xor(a, b):
    return ''.join(chr(ord(x) ^ ord(y))
              for x, y in zip(a, b))

cmd = '8d40ab447609a876f9226ba5983275d1ad1b46575784725dc65216d1739776fdf8ac97a8d0de4b7dd17ee4a33f85e71d5065a02296783e6644d44208237de9175abed53a8d4dc4b5377ffa268ea1e9af5f1eca7bb9bfd93c799184c3e0546b3ad5e900e5045b729de2301d66c3c69327'
response = ' to test multiple-block patterns' # the block we attack

split_blocks = [cmd[i * 32: i * 32 + 32]
                for i in range(len(cmd) / 32)]

block = 3 # this is somewhat arbitrary

# get command and pad it with blank space
append_cmd = '  some command'
append_cmd = append_cmd + '\x20' * (16 - len(append_cmd))

new_block = xor(split_blocks[block].decode("hex"),
new_block = xor(new_block.decode("hex"),

split_blocks[block] = new_block
cmd = ''.join(split_blocks)
#print cmd
print go(cmd)

We can verify that this works. Running the server, we get

This is a longer string th\x8a\r\xe4\xd9.\n\xde\x86\xb6\xbd*\xde\xf8X\x15I  some command  e-block patterns\n

OK, so the server accepts it. Nice. Can we exploit this? Obviously — yes. We can guess that the server does something like

echo "{input string}";

First, we break off the echo statement. Then we try to cat the flag and comment out the rest. We can do this in one block! Here is how:

append_cmd = '\"; cat f* #'

Then, the server code becomes

echo "{partial + garbage}"; cat f* #{more string}";

The server gives the following response:

This is a longer string th:\xd7\xb1\xe8\xc2Q\xd7\xe8*\x02\xe8\xe8\x9c\xa6\xf71\n

Indeed, this is the flag. So, we are done!

Some icons for macOS Sierra.

Today, I drew some replacement icons for some apps I use daily (i.e., Terminal, Wireshark, Hex Fiend and Hopper) since I did not like default ones. I release them under the GPL v.3.0 License, so feel free to use them in your OS 🙂


In action (with the dark grey and light grey terminal icons, respectively):

Here are some additional icons:

Covert communication over a noisy channel

Consider a situation in which one part Alice wants to communicate with another part Bob over a discrete memoryless channel with crossover probability p_1  while ensuring a low level of detection in the presence of a warden Willie, who observes the communication through another discrete memoryless channel with crossover probability p_2. We assume that all parts are equally computationally bounded.

Skärmavbild 2016-08-23 kl. 22.44.47

In this situation, we require that p_1 < p_2 for non-zero p_1,p_2. Let \mathbf{u} \in \mathbb{F}_2^l be a sequence of secret bits she wants to transmit.

NOTE: Alice and Bob may on forehand agree upon any strategy they desire, but the strategy is known also to Willie.

Communicating with a secret key

If Alice and Bob share a secret key k, they may use it to pick a common encoding, i.e., an error-correcting code. This encoding in turn is used to encode the message, which of course must be able to correct the errors produced by the channel. Assuming that the correction capability is below the error rate of Willie’s channel, he cannot decode. Let N bits be the length of the publicly transmitted sequence. An established result states that if \sqrt N \cdot \log N bits are shared between Alice and Bob, they can transmit \sqrt N secret bits without Willie seeing them (with high probability).

Communicating without a secret key

Now, assume that Alice and Bob do not share a common key. Alice performs the following steps:

  1. Picks a cryptographically secure random vector \mathbf r.
  2. Computes the scalar product \mathbf r \cdot \mathbf u = k.
  3. Sends the bits \mathbf r \| k over the channel.

This reduces to the problem of \textsc{Learning Parity with Noise} (LPN) and can be solved with the BKW algorithm (or similar) if p_1 is sufficiently low. In particular, if Bob receives

N = 4 \cdot \log 2 \cdot l \cdot \epsilon_1^{-2 (l+1)}

such sequences, or equivalently,

N \cdot l = 4 \cdot \log 2 \cdot l^2 \cdot \epsilon_1^{-2 (l+1)}

bits, he will be able to decode with high probability. Here we have exploited the piling-up lemma disregrading the fact that some bits in \mathbf{u} are zero and does not contribute. For some probabilities p_1<p_2 and natural number N, the information is hidden from Willie. The information rate is determined as follows: N = \mathcal{O}(l^2\cdot \epsilon_1^{-2(l+1)}), so

\mathcal{O}(\sqrt N)=\mathcal{O}(l \cdot\epsilon_1^{-l-1})\iff\mathcal{O}(\sqrt N\cdot\epsilon_1^{l+1})=\mathcal{O}(l).

This bound can be improved upon by an increase in the number of parities.

Here is a more detailed paper on a similar topic. Another paper is here.

Solving problems with lattice reduction

Suppose that we are given a system of modular equations

\begin{array}{rcl} a_0 \cdot x + b_0\cdot y & = & c_0 \pmod q \\ a_1 \cdot x + b_1\cdot y & = & c_1 \pmod q \\ & \ldots & \\ a_n\cdot x + b_n\cdot y & = & c_n \pmod q \\ \end{array}

Trivially, this can be solved for unknown x and y using a simple Gaussian elimination step, i.e., writing the equations as

\mathbf{A} \begin{pmatrix}x & y\end{pmatrix}^T = \mathbf{c} \iff \begin{pmatrix}x & y\end{pmatrix}^T = \mathbf{A}^{-1} \mathbf{c}.

This is perfectly fine as long as the equations share a common modulus q, but what about when the equations share unknowns but are defined under different moduli? Let us take a real-world scenario from the realm of cryptography.

Example: DSA using linear congruential generator (LCG)

The DSA (Digital Signature Algorithm) has two functions \textsf{Sign} and \textsf{Verify}. To sign a message (invoking \textsf{Sign}), the following steps are performed:

  1. Let H be a hash function and m the message to be signed:
  2. Generate a (random) ephemeral value k where 0 < k < q.
  3. Compute r=\left(g^{k}\bmod\,p\right)\bmod\,q. If r=0, go to step 2.
  4. Compute s=k^{-1}\left(H\left(m\right)+r \cdot x\right)\bmod\,q. If s=0, go to step 2.
  5. The signature is \left(r,s\right).

To verify as signature (invoking \textsf{Verify}), the below steps are performed:

  1. If 0 < r < q or 0 < s < q is not satisfied, reject the signature.
  2. Compute w = s^{-1} \bmod\,q.
  3. Compute u_1 = H\left(m\right) \cdot w\, \bmod\,q.
  4. Compute u_2 = r \cdot w\, \bmod\,q.
  5. Compute v = \left(g^{u_1}y^{u_2} \bmod\,p\right) \bmod\,q.
  6. If v = r, accept. Otherwise, reject.

In the case of using LCG as a psuedorandom-number generator for the values k, two consecutively generated values (we assume one signature was generated right after the other) will be correlated as a \cdot k_1 + b = k_2 \pmod m for some public parameters a,b,M. Assuming that M \neq q, we obtain equations

\begin{array}{rclc} s_1 \cdot k_1 - r_1\cdot x & = & H(m_1) &\pmod q \\ s_2 \cdot k_2 - r_2\cdot x & = & H(m_1) &\pmod q \\ -a\cdot k_1 + 1\cdot k_2 & = & c & \pmod M \\ \end{array}

from the fourth step of \textsf{Sign}.

Chinese Remainder Theorem (CRT)

A well-known theorem in number theory is the Chinese Remainder Theorem (commonly referred to with the acronym CRT), which deals with simple equations over different and pairwise-prime moduli. For instance,

\begin{array}{rcl} x & = & 2 \pmod 3 \\x & = & 3 \pmod 5 \\ x & = & 2 \pmod 7 \end{array}

which has the solution x = 23. In solving actual multivariate equations, we will hit a brick wall, as needed operations such as row reduction and modular inversion does not work.

The following code takes any moduli, which do not need to be relatively prime, and computes the CRT:

def crt_lll(m, n):
	p = n[0]
	q = n[1]
	a = m[0] % p
	b = m[1] % q
	L = p*q # can be tweaked

	A = matrix([
		[1, 1, 1/L, 0],
		[p, 0, 0, 0],
		[0, q, 0, 0],
		[0, 0, p*q/L, 0],
		[-a, -b, 0, L]

	B = A.LLL()

	assert(B[-1, 0] == 0 and B[-1, 1] == 0)

	return B[-1, 2] * L

print crt_lll(
	[777 % 234, 777 % 567],
	[234, 567]

Thue’s lemma

Thue’s lemma states that given a modulus n and an integer a \in \mathbb{Z}_n, there exists two integers |x|,|y| \leq \sqrt{n} such that

ax = y \bmod n.

We can solve this with lattice reduction as follows:

def thues_lemma(a, mod):
            Find small elements x, y such that a * x = y
            and |x| < sqrt(mod) and |y| < sqrt(mod).
    R = Zmod(mod)
    A = matrix([
        [1,   a],
        [0, mod]
    A = A.LLL()
    return A[0][0], A[0][1]

# test
p = random_prime(2^512)
q = random_prime(2^512)
n = p * q
a = randint(0, 2**1024)
u, v = thues_lemma(a, n)
assert(a * u % n == v % n)

Lattice reduction

A lattice is a discrete linear space spanned by a set of basis vectors. For instance, \mathbf b_1 and \mathbf b_2 are two basis vectors that spans the space below. They are not unique in that sense, but evidently, they are the shortest basis vectors possible (such a basis can be found using the LLL algorithm). If the two column vectors \mathbf b_1 and \mathbf b_2 are basis vectors, then the corresponding lattice is denoted \mathcal{L}(\mathbf B). Here, \mathbf B = (\mathbf b_1 ~ \mathbf b_2).

Screen Shot 2016-08-11 at 21.53.23

The problem of finding the shortest vector is called \textsf{SVP}. Sloppily formulated: for a given lattice \mathcal{L}(\mathbf B), find the shortest (in terms of Euclidan norm) non-zero vector. The answer to that question would be \mathbf b_2. Starting with different basis vectors, the problem will show be trickier.

A related problem is to find the closest vector, which is commonly called \textsf{CVP}. In this scenario, we are given a vector \mathbf t in the vector space (but it might not be in \mathcal{L}(\mathbf B)) and we want to find the closest vector in \mathcal{L}(\mathbf B). There is a simple reduction from \textsf{CVP} to \textsf{SVP} by setting \mathbf t = \mathbf 0. There is also a reduction in the other direction, which involves extending the basis of \mathcal{L}(\mathbf B) to also include \mathbf t. This is called embedding.

Let us return to the example of DSA and different-moduli equations. The equations we got from before

\begin{array}{rclc} s_1 \cdot k_1 - r_1\cdot x & = & H(m_1) &\pmod q \\ s_2 \cdot k_2 - r_2\cdot x & = & H(m_1) &\pmod q \\ -a\cdot k_1 + 1\cdot k_2 & = & c & \pmod M \\ \end{array}

can be formulated as basis vectors. In matrix form, we have

\begin{pmatrix} -r_1 & s_1 & 0 & q & 0 & 0 \\ -r_2 & 0 & s_2 & 0 & q & 0 \\ 0 & -a & 1 & 0 & 0 & M\end{pmatrix} \cdot \mathbf y = \begin{pmatrix}H(m_1) \\ H(m_2) \\ c\end{pmatrix}

or (by embedding technique):

\begin{pmatrix} -r_1 & s_1 & 0 & q & 0 & 0 & H(m_1) \\ -r_2 & 0 & s_2 & 0 & q & 0 & H(m_2)\\ 0 & -a & 1 & 0 & 0 & M & c \end{pmatrix} \cdot \begin{pmatrix}\mathbf y \\ -1 \end{pmatrix}  = \mathbf 0

The idea is now to force the values corresponding to the guessed values x', k_1', k_2' to be close in the normed space. By adding additional constraints we may perform the following minimization:

\min_{x,k_1,k_2}\left\|\begin{pmatrix} -r_1 & s_1 & 0 & q & 0 & 0 & H(m_1) \\ -r_2 & 0 & s_2 & 0 & q & 0 & H(m_2)\\ 0 & -a & 1 & 0 & 0 & M & c \\ 1/\gamma_x & 0 & 0 & 0 & 0 & 0 & x'/\gamma_x \\ 0 & 1/\gamma_{k_1}  & 0 & 0 & 0 & 0 & k_1'/\gamma_{k_1} \\ 0 & 0 & 1/\gamma_{k_2} & 0 & 0 & 0  & k_2'/\gamma_{k_2}\end{pmatrix} \cdot \begin{pmatrix}\mathbf y \\ -1 \end{pmatrix}  \right\|

where \gamma_x = \min(x',q-x')\gamma_{k_1} = \min(k_1',m-k_1') and \gamma_{k_2} = \min(k_2',m-k_2'). Finding a closest approximation (preferably, using LLL/Babai) yields a solution to the DSA equations using LCG.

Example: Knapsack

The knapsack problem (sometimes subset-sum problem) is stated as follows. Given a weight t and n items/weights \{w_1,w_2,\dots,w_n\}, find the sequence x_1, x_2, \dots, x_n \in \{0,1\}^n such that

\sum_{i=1}^n w_i \cdot x_i = t.

It is not too hard to prove that this is NP-complete, but we omit the reduction here.

In a cryptographic setting, this can be used to encode data in the sequence x_1, x_2, \dots, x_n. This is called the Merkle-Hellman public-key cryptosystem. It is easy to see that encryption actually is the encoding procedure mentioned above. However, the decryption procedure is a bit more involved; in fact it requires a special class of instances. If the weights can be transformed into a super-increasing sequence, the problem becomes trivial to retrieve the sequence x_1, x_2, \dots, x_n.

Think of it this way. Assume that all weights sum to something smaller than the w_n weight. Then, if the target sum t (the ciphertext) is larger than w_1 + w_2 + \cdots + w_{n-1} (if not, t must be smaller assuming there is a unique solution), we know that x_n=1. We can now remove x_n and w_n from the equation by solving for the remaining weights and a recalculated t' = t - w_n. This procedure can be repeated until all weights have been found (or t = 0).

Merkle-Hellman provided a way to transform the super-increasing sequence into a hard one. We omit the details here, but the procedure can be found all over internet.

It turns out we can use lattice reduction for this problem. We create a basis matrix \mathbf A in the following way

\mathbf A = \begin{pmatrix} 1 & 0 & \cdots & 0 & w_1 \\ 0 & 1 & \cdots & 0 & w_2 \\ \vdots & \vdots & \ddots & 0 & \vdots \\ 0 & 0 & \cdots & 1 & w_n \\0 & 0 & 0 & 0 & -t\end{pmatrix}

and perform lattice reduction on it. Since the LLL/BKZ algorithm achieves a set of short basis, it will try to find something that makes the entries of the left-most column small. What this actually means is that it will find a sum of the rows that achieves a solution to the knapsack problem.

Of course, the solution must only contain values in \{0,1\}. Depending on the instance, this may or may not be the case. So, how do we penalize the algorithm for choosing values outside the allow set?

A new approach*

Let us create a new basis matrix \mathbf A' in the following manner:

\mathbf A'_i = \begin{pmatrix} 1 & 0 & \cdots & 0 & w_1 & \alpha\\ 0 & 1 & \cdots & 0 & w_2 & \alpha\\ \vdots & \vdots & \ddots & 0 & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & w_n & \alpha\\0 & 0 & 0 & 0 & -t & -\alpha \cdot i\end{pmatrix}

The algorithm performs the following steps:

  1. Randomly (or deterministically) pick a guess i on the number of non-zero values in the solution.
  2. Update the matrix run LLL/BKZ on \mathbf A'_i.
  3. If a satisfiable reduced-basis vector is found, return it. Otherwise, goto 1.

It does not guarantee that an incorrect solution is penalized, but it increases the probability of it (reduces the set of ‘false’ basis vectors). We omit a formal proof, but think of it this way: Assume that \mathbf v is false solution and a reduced-basis vector of \mathbf A. In \mathbf A'_i it also has to sum to the number of non-zero weights in the correct solution. Assume all false vectors appear randomly (they do not, but let us assume it!). Then for a correct guess of i, the probability of the vector \mathbf v being is {n \choose i} / 2^n. If i = \epsilon \cdot n, this is 2^{n(H(\epsilon)-1)}, which is a noticeable reduction for most values of i.

Here is an example of a CTF challenge which could be solved with the above technique.

* I do not know if this should attributed to someone.

Example: Ring-LWE


Computing sum of divisors for numbers up to N

After fiddling with Project Euler, I started to wonder how to compute the sum of divisors in the most efficient way (albeit, most the problems in PE does not require any super-sophisticated solution). The trivial approach is doing division and adding up the divisors sequentially. This may be good enough for small N, but as N grows this approach gives an infeasible time complexity.

The next thing one may think of is to generate a list of prime numbers (using a sieve) and for each number try all the numbers in the list. This is faster than the trivial approach, but still requires cN \cdot |\mathcal{P}_N| = cN \cdot \pi(N) \sim \mathcal{O}(N^2/\log N). Since the sieve requires time \mathcal{O}(N\cdot \log N), we do not need to do better than that. But can we match it? Turns out we can.

First, we need to establish a simple number theoretical fact: the \sigma(n) is the divisor counting function which has the properties \sigma(p^i) = 1 + p + \cdots p^i (and, thus, \sigma(p^i) = \sigma(p^{i-1}) + p^i) and  \sigma(p^i \cdot q^j) = \sigma(p^i) \cdot \sigma(q^j).

The solution is quite simple; consider the first element in the prime set \mathcal{P}. That is 2. We split the range into even numbers and for even  i compute

\sigma(n_i) = \sigma(2\cdot m) = \underbrace{\sigma(2)}_{\text{store this in }x_i} \cdot ~\sigma(m).

Then for 2^2, compute  \sigma(2^2) = \sigma(2) + 2^2, and so forth up to 2^i < N. The requires N/2 + N/4 + \cdots operations.

\begin{array}{cccccccccccc} 1 & 2 & 3& 4 &5& 6 &7& 8 &9& 10 &11& 12 \\ & \sigma(2) & & \sigma(2) & & \sigma(2) & & \sigma(2) & & \sigma(2) & & \sigma(2) \\ & & & +& & &&+& & && + \\ & & & 2^2& & && 2^2& & && 2^2 \\ & & & & & &&+& & && \\ & & & & & && 2^3& & && \\ \hline & \sigma(2) & & \sigma(4) & & \sigma(2) & & \sigma(8) & & \sigma(2) & & \sigma(4) \end{array}

Running it for all powers and primes, we use in total N \cdot (\sum_{p\in\mathcal{P}} \frac{1}{p} + \sum_{p\in\mathcal{P}} \frac{1}{p^2} + \cdots) operations. Now, we have \sum_{i > 1} \frac{1}{p^i} = \frac{1}{p(p-1)}. So the complexity is \mathcal{O}(N \cdot \sum_{p\in\mathcal{P}} \frac{1}{p}). This is a harmonic series and treating it as an integral, we have \int_{p\in\mathcal{P}} \frac{1}{p} < \log {p}. Of course, we do only generate primes up to N, so the final complexity is \mathcal{O}(N \cdot \log N). So we are done 🙂

As always, let us look at a Python representation:

def sum_of_divisors(N):
    N += 1 # extend limit to inclusive
    primes = primetools.sieve(N)
    numbers = [1]*N
    for q in primes:
        tmp = {}
        mult = q
        while mult <= N:
            for i in range(mult, N, mult):
                if i in tmp: tmp[i] += mult
                else: tmp[i] = 1 + mult
            mult *= q
        for j in tmp: numbers[j] *= tmp[j]
    return numbers[2:]

Below is a plot of the timing of the code. Since the code stores data in memory, some undesired effects comes into play for largers values (due to the need of shuffling data back and forth) and is therefore not present in the graph.


Bounded by the sieving complexity, the algoritm cannot be any faster using the sieve. Can we do better? Note that the algorithm returns a list of sums. If we are really interested in the sum of sums, we may instead compute \sum_{i}^{N}\sigma(i) = \sum_{i}^{N} i \cdot \lfloor \frac{N}{i} \rfloor, which is \mathcal{O}(N) and uses basically no memory. This can be written as

def sum_of_sum_of_divisors(N):
    div_sum = 0
    for i in range(1, N):
        div_sum += i * (N / i)
    return div_sum

How does it work? For each i, it computes the cardinality of the set of numbers divisible by i,i.e., \sum_{i|j, 1 \leq j \leq N} i, which is exactly  \lfloor \frac{N}{i} \rfloor. This contributes with i for every element, so the total contribution is  i\lfloor \frac{N}{i} \rfloor. Doing it for all possible divisors, we end up with the sum of sum of divisors.

Turns out, we can actually do it in \mathcal{O}(\sqrt N). Here is how:

def square_root_time_sum_of_sum_of_divisors(N):
    global dcached
    if N in dcached: return dcached[N]
    div_sum, i = 0, 1
    q = int(math.sqrt(N))
    while i <= q:
        div_sum += (i * (N / i))
        i += 1
    i = 1
    while i <= N / (q + 1):
        m = N / i
        k = N / (i + 1)
        div_sum += (i * (m * (m + 1) - k * (k + 1)) / 2)
        i += 1
    dcached[N] = div_sum
    return div_sum

Consider the below example. In the left table, we have the sum \sum_{i=1}^8 i \cdot \lfloor \frac{8}{i} \rfloor and in the right table \sum_{i=1}^8T(\lfloor 8/i\rfloor) where T(n) = \frac12 \cdot n(n+1):

\begin{array}{c | cc cc cc cc | c} i & & & & & & & & & i \cdot \lfloor \frac{8}{i} \rfloor\\ \hline 1 & \bf{1} & \bf{1} & \bf{1} & \bf{1} & \bf{1} & \bf{1} & \bf{1} & \bf{1} & 8 \cdot 1\\ 2 & \bf{2} & \bf{2} & \bf 2 & \bf 2 & & & & & 2 \cdot 4\\\hline 3 & 3 & 3 & & & & & & & 3 \cdot 2\\ 4 & 4 & 4 & & & & & & & 4 \cdot 2\\ 5 & 5 & & & & & & & & 5 \cdot 1\\ 6 & 6 & & & & & & & & 6 \cdot 1\\ 7 & 7 & & & & & & & & 7 \cdot 1\\ 8 & 8 & & & & & & & & 8 \cdot 1\\ \end{array} \iff \begin{array}{c | cc |cc cc cc | c} i & & & & & & & & & T(\lfloor \frac{8}{i} \rfloor)\\ \hline 1 & 1 & 2 &\bf 3 & \bf 4 & \bf 5 & \bf 6 & \bf 7 & \bf 8 & \frac12 \cdot 9 \cdot 8\\ 2 & 1 & 2 &\bf 3 & \bf 4 & & & & & \frac12 \cdot 5\cdot 4\\ 3 & 1 & 2 & & & & & & & \frac12 \cdot 3\cdot 2\\ 4 & 1 & 2 & & & & & & & \frac12 \cdot 3\cdot 2\\ 5 & 1 & & & & & & & & 1\\ 6 & 1 & & & & & & & & 1\\ 7 & 1 & & & & & & & & 1\\ 8 & 1 & & & & & & & & 1\\ \end{array}

Clearly, it is the same table/matrix but transposed, so the sum of elements are the same. As we notice in the matrices above, their rows become sparser for increasing i. Think about it! What if we were to compute only the bold numbers and adding them? Then we need only two function calls each for left and right. The point where we stop computing left and start to compute right can be chosen arbitrarily, but the optimal is \sqrt{N}. Then \lfloor \frac{N}{\sqrt{N}+1}\rfloor + \lfloor \sqrt{N} \rfloor \approx 2\sqrt{N}.

Example: Project Euler 23
This is a simple problem, where we are supposed to compute all numbers which cannot be expressed as a sum of two abundant numbers. Using the above code, we can solve the problem in about 100 ms using pypy:

def abundant_numbers(N):
    numbers = sum_of_divisors(N)
    abundants = []
    for i in range(0, N):
        if 2*i < numbers[i]: abundants.append(i)
    return abundants[1:]

N = 28123

abundants = abundant_numbers(N)
s = [0]*N

for x in abundants:
    for y in abundants:
        ls = x + y
        if ls < N: s[ls] = 1
        else: break

print sum(i for i in range(0, N) if not s[i])

Example: Project Euler 439
This problem is quite nice, because it requires a very well-thoughtout approach for it to be feasible solving it (it is also among the hardest problems on PE). Here is the question (notation slightly altered):

Let \sigma(k) be the sum of all divisors of k. We define the function S(N) = \sum_{1 \leq i \leq N}\sum_{1 \leq j \leq N} d(i \cdot j). For example, S(3) =\sigma(1) +\sigma(2) +\sigma(3) +\sigma(2) +\sigma(4) +\sigma(6) +\sigma(3) +\sigma(6) +\sigma(9) = 59

You are given that S(10^3) = 563576517282 and S(10^5) \bmod 10^9= 215766508. Find S(10^{11}) \bmod 10^9.

So, running our algorithm would be fine for lower values, but if we would try to store (10^{11})^2 \approx 2^{73} values in memory, we will run into a wall very fast (this is about 7.5 \cdot 10^{10} TBs assuming that each number is a 64-bit integer – note the modular reduction which can be done in every step). Clearly, we need to use the memoryless approach.

Let us first disregard the fact that \sigma(i \cdot j) =\sigma(i) \cdot\sigma(j) only holds when \gcd(i,j) = 1. Then S(N) = (\sum_{i}^{N} i \cdot \lfloor \frac{n}{i} \rfloor)^2 - C(N). The case of S(3), we will have

\begin{array}{rcccccc} (\sigma(1) +\sigma(2) +\sigma(3))^2 = & \sigma(1)\cdot\sigma(1) &+& \sigma(1)\cdot\sigma(2) &+& \sigma(1)\cdot\sigma(3) & + \\ &\sigma(2)\cdot\sigma(1) &+& \sigma(2)\cdot\sigma(2) &+& \sigma(2)\cdot\sigma(3) & +\\&\sigma(3)\cdot\sigma(1) &+& \sigma(3)\cdot\sigma(2) &+& \sigma(3)\cdot\sigma(3)\\\\ = & \sigma(1\cdot 1) &+& \sigma(1 \cdot 2) &+& \sigma(1 \cdot 3)&+\\&\sigma(2 \cdot 1) &+& \underline{\sigma(2)\cdot\sigma(2)} &+& \sigma(2 \cdot 3)&+\\ &\sigma(3 \cdot 1) &+&\sigma(3 \cdot 2) &+& \underline{\sigma(3)\cdot\sigma(3)} \end{array}

OK, so what about C(N)? We can estimate the how many numbers in the product for which it that holds \gcd(i,j) > 1. Take, \sigma(p^i)\cdot\sigma(p^j) - \sigma(p^{i+j}). We have \sigma(p^i) = (p^{i+1}-1)/(p-1), so

\sigma(p^i)\cdot\sigma(p^j) - \sigma(p^{i+j}) = \frac{(p^{i+1}-1)(p^{j+1}-1) - p^{i+j+1}-1}{p-1} = \frac{p^{i+j+2}- p^{i+j+1}-p^{i+1}-p^{j+1}}{p-1}

In particular, \sigma(p)\cdot\sigma(p) - \sigma(p^2) = p. Let us consider the case \gcd(i,j) = p. The (negative) contribution from this set

p \cdot  \underbrace{\left[ \left(\sum_{i=2}^Ni\left\lfloor \frac{N}{i \cdot p} \right\rfloor\right)^2 - C({N}/{p})\right]}_{=S(N/p)}.

Running over all possible gcds (all numbers p), we get

S(N) = \left(\sum_{i=1}^{N} i \cdot \lfloor \frac{N}{i} \rfloor\right)^2 - \underbrace{\sum_{p=2}^N p \cdot S(\lfloor N/p \rfloor)}_{C(N)}

The following code computes the smaller results in matter of milliseconds, but for N = 10^{11}, it takes too long (as we are aiming for the sub-minute mark):

cached = {}
dcached = {}
mod = 10**9

def S(N):
    global cached
    if N in cached: return cached[N]
    dv = square_root_time_sum_of_sum_of_divisors(N)
    S_sum, i = 0, 2
    while i <= N:
        # max recursion depth log2(10^11) ~ 33
        S_sum += (i * S(N / i)) % mod
        i += 1
    cached[N] = (dv * dv - S_sum) % mod
    return cached[N]

OK. Further optimization needed… note that

\begin{array}{rcc} S(3) = & 1 \cdot (\sigma(1/1) +\sigma(2/1) +\sigma(3/1))^2 & - \\ & 2 \cdot (\sigma(1/2) +\sigma(2/2) +\sigma(3/2))^2 &- \\ & 3 \cdot (\sigma(1/3) +\sigma(2/3) +\sigma(3/3))^2 \end{array}

Using the same approach as before, we can derive that

S(N) = \sum_{i=1}^N i \cdot \mu(i) \cdot \left(\sum_{j=1}^{N} j \cdot \lfloor \frac{N}{i \cdot j} \rfloor\right)^2

where \mu(\cdot) is the Möbius function, defined

\mu(n) = \left\{\begin{array}{rl} -1 & n \textnormal{ has an odd~number of primefactors,}\\ 1 & n \textnormal{ has an even~number of primefactors,}\\ 0 & n \textnormal{ has a squared primefactor.} \end{array}\right.

We can compute this in \mathcal{O}(N^{3/2}) time. The Möbius function can be pre-computed in \mathcal{O}(N \cdot \log N) like so:

def mobius(N):
    L = [1]*(N+1)
    i = 2
    while i <= N:
        # if i is not sq. free, then no multiple is
        if L[i] == 1 or L[i] == -1:
            L[i] = -L[i]
            sq = i * i
            j = i + i
            while j <= N:
                # we use 2 to mark non-primes
                if abs(L[j]) == 2:
                    L[j] = -L[j]
                    L[j] = -L[j]*2
                j += i
            j = sq
            while j <= N:
                L[j] = 0
                j += sq
        i += 1
    i = 2
    while i <= (N):
        if abs(L[i]) == 2:
            L[i] = L[i] / 2
        i += 1
    return L