The short-range wireless technology Bluetooth uses the encryption standard E0. The keystream generator E0 is built using a combination generator with four LFSRs of lengths and primitive feedback polynomials
To mount a more efficient attack, the authors of [LV04] developed a maximum-likelihood decoding algorithm based on the fast Walsh transform. The algorithm recovers, given an element in and a linear code over of dimension and length , the closest codeword in time and memory .
In order to succeed with the attack, they need to find weight-4 polynomial multiples of which has degree 97.
We implement an algorithm based on [LJ14], with multiple threads in mind.
Input: Polynomial of degree , small integer .
Output: Polynomial multiple of weight 4.
1. Select a random subset .
2. From , create all residues for and put in list .
3. Create all for . This is done by merging the list with it self, keeping only elements .
4. Sort the resulting list by residue value.
5. Merge the list by itself, keeping only multiples . Output a list of all, or the one with smallest degree.
Let the number of worker threads be , where is not a multiple of 2. We choose to be a prime number close to the number of max threads. We then create buckets, each containing a hashmap or other data structure. We order the buckets as a two-dimensional structure with .
In implementation, every thread generates the full set of monomials , each taken modulo . This computation is very light in terms of performance compared to memory reads and writes.
Define the function
as . Clearly if , then necessarily when taken modulo .
The functional value of determines if the thread should consider it or not, i.e., if the value matches the thread number. This way, we achieve thread safety without using mutex.
A thread will put element in , unless it is already occupied. If so, say with was already occupying that space in , then it proceeds to and puts there.
In the final step, we swap the accessing order. So each thread will only handle elements which had equal -function values. As equal values are necessary (but not sufficient) condition for collision to occur, it is impossible to filter out correct candidates by mistake.
Each thread will merge the elements in the corresponding row. Once collision is found, program outputs such that and terminates. If no collision is found, we re-iterate with a new mask selection for .
The polynomial of degree 72 can be found by setting and runs in a few seconds using threads. For the 97-degree polynomial, we need about 560 GB of RAM. While this may sound immensly large, it can actually be done by running it on an AWS instance at relatively low cost.
Another solution is to pick for some small integer . This reduces the size of the first collision layer . Then, for each thread, we require that for the element to be added into . This reduces the size of the stored data to , increases the running time by a factor .
This actualizes the attack on the E0 cipher by Lu and Vaudenay.
sage: Q0 = R(x^12647431389+x^8008354628+x^1277498415+1) sage: Q1 = R(x^12671067191+x^6590666154+x^1306461382+1) sage: P = R(x^97+x^92+x^89+x^80+x^76+x^73+x^69+x^68+x^64+x^61+x^59+x^58+x^57+x^56+x^55+x^54+x^53+x^51+x^49+x^44+ ....: x^43+x^41+x^39+x^36+x^35+x^34+x^32+x^31+x^30+x^21+x^19+x^15+x^13+x^9+x^5+x^3+1) sage: Q0 % P 0 sage: Q1 % P 0
Many thanks to klondike (KISec AB) for providing help with computational resources and improving the code.