Thank you to everyone who played DownUnderCTF 2021. I hope you enjoyed the challenges :)
Here are writeups for the hard crypto challenges I wrote.
All of the challenge files and solve scripts can be found here.
Challenge | Tags | Solves |
---|---|---|
yadlp | crypto | 14 |
power sign | crypto | 14 |
1337crypt v2 | crypto | 3 |
Substitution Cipher III | crypto | 1 |
yadlp
This challenge follows a similar structure to a commonly seen style of crypto CTF challenges that involve solving the discrete logarithm problem in some group other than the usual . The general approach is to figure out some information about what the group elements are like, and then try to find an isomorphism from the group to a group where we can easily solve the discrete logarithm problem.
Group Structure Part I
We'll see that the group we are dealing with in the challenge is the set of points on some hyperbola over . Although the operations defined in the handout code are called "addition" and "multiplication", it'll be helpful to use multiplication and exponentiation notation. When we say "elements", we usually mean points on the hyperbola, which are represented as tuples with addition operation defined as:
The rand_element
function gives us even more information about elements in this group. In fact, we can use it to get an explicit relation that all elements satisfy. The function first generates a random , then computes as
Rearranging, we get
It can be checked that the set of solutions to this equation, equipped with addition operation above indeed does form a group. We'll call this group .
Diophantine equations of the form are known as Pell equations and are well documented in the literature. We'll see later that the equation we found from the rand_element
can be written as a Pell equation, and use that for analysing the group even more.
Challenge Analysis
For now we'll take a look at how the flag is hidden and what we'll need to recover it. The 48 byte flag is broken up into 6 blocks of 8 bytes each. These blocks are represented as the integers . Then, for some randomly generated elements , the value is computed.
Given and the , the goal is to recover the . This is known as the t-multiple discrete logarithm problem, and in this case, we can solve it by solving the discrete logarithm problem in , combined with lattice techniques.
Let be the order of and let be a generator of (we will see later that is cyclic). Define the discrete logarithm to the base as , that, for any , gives us a value such that . It can be shown that this map is a homomorphism, so . Therefore,
Since we're working in , we can equivalently write
But since the are relatively small (~62 bits) compared to (which we will determine later), we can solve for the using lattice techniques. Specifically, consider the lattice generated by the rows of the matrix
Notice that times the first row, plus times the second row, etc. take times the seventh row, and an appropriate multiple of the last row, gives the short vector
which can be found with lattice basis reduction algorithms such as LLL.
Group Structure Part II
Group Order
All we need left to solve the challenge is a way to solve the discrete logarithm problem in . Determining the group order is a good first step as we could use it to try and see if there is an isomorphism between and an additive group of integers of the same order where the discrete logarithm is trivially solved by division. However, it turns out that the order of is and that is isomorphic to the cyclic subgroup of of order . We follow a similar approach to [1] for the proof.
Claim. Let be the set of solutions to the equation
Assume that (this is true for the challenge parameters). Then, where is the cyclic subgroup of of order .
Proof. First, note that , so it suffices to show that the set of solutions to the equation (equipped with the addition operation as defined above) is isomorphic to . Let . Note that is irreducible as it is of degree 2 and has no roots, since has no square in by assumption. Therefore . Now, if , then (since has order ). But we can write for . So,
So, if we take and , then we see that . Therefore, we have the bijection . To see that this is a homomorphism, let . Then
But
So we have .
Therefore is an isomorphism, and .
If we take a look at the actual values in the challenge, we'll see that is smooth, with the largest factor being 32 bits! This is a hint that we're on the right path, since we'll be able to solve the discrete logarithm problem in a reasonable time with Pohlig-Hellman.
Inverse Elements
By this point, we've pretty much already solved the challenge, but as a bonus, we can find the inverse elements of . Although the solution script doesn't do this, this has the advantage of speed over using the isomorphism found above since the operations are slightly faster (than operations in ). Recall the addition formula
The identity element is clearly , so to find the inverse element of , we want to find such that . Let . Then
So .
Another advantage of figuring out inverse elements is that you can solve the challenge without finding the isomorphism to by guessing that the order is since it is smooth, and then implementing Pohlig-Hellman and BSGS to use the addition, multiplication and inverse operations.
References
power sign
Challenge Overview
We are given the code running on a server that implements a signature scheme. We can ask the server to sign one message, then we are challenged to forge a signature for a random message.
The signature scheme resembles the Rabin signature algorithm. The public key is an RSA modulus whose prime factorisation is the private key. A message is signed by first computing a randomised hash (where is random) of . If has a square root modulo , a square root modulo is computed and the signature is outputted. Otherwise, we try again with a different . Note that computing square roots modulo , or even determining whether a number has a square root modulo is equivalent to factoring . To verify a signature for the message , we simply check that the equality holds.
In the challenge, the primes generated are of the form . This is done because if is a prime such that , then if a square root of exists, it can be easily computed as . To compute a square root of modulo , we compute square roots of modulo and modulo , then combine them with the Chinese Remainder Theorem.
Security of Rabin Signatures
Before we proceed, we'll take a look at an easy attack against a simplified variant of the Rabin signature algorithm. Specifically, we consider a variant that does not use a hash function at all, i.e. to sign a message the signer simply computes a square root of modulo . For simplicity, we assume that all the messages to be signed actually do have square roots modulo , though it does not really matter. There is one trivial attack; anyone can "forge" a valid signature for if is a perfect square. For example, if , then is a valid signature since .
More interestingly however, it turns out that we can recover the private key given access to a signing oracle. We do this by choosing a random and ask the oracle to sign . It returns a square root of modulo . Now, if , then we have
so reveals a nontrivial factor of .
The Hash Function
This section and the next are somewhat algebra-heavy and basic results from algebra are used without proof for brevity. It may be worthwhile to read up on finite fields, field extensions and Galois theory (for later) if they are unfamiliar concepts.
We learned the importance of a good, randomised hash function in the previous section. And more importantly, we learned that a hash function which is simply the identity map (i.e. sends any input to itself), is completely insecure. We will now take a look at the hash function in the challenge.
Choose a composite integer and a proper divisor (in the challenge we have ). Let be the smallest prime number following , where is the signer's public key. We will work in an extension field of , where is a public degree irreducible polynomial. Let . Then generates and is basis for when viewed as an -dimensional -vector space. That is, elements in can be written in the form where .
The randomised hash function takes as input a message and an integer . Write in terms of powers of :
where . Then, is converted to an element in by computing
To obtain the hash, the function computes which we write as
The output is .
Choosing a Message to be Signed
After the server provides us with its public key, it prompts us to send a message to be signed. Note that (in the sign
function) isn't chosen randomly; it starts at and increments until is a square. There is also a peculiar restriction on the message; it has to be larger than and smaller than .
If we were able to send messages of any size for the server to sign, we can easily find a message that will help us to recover the private key. Specifically, we would choose a random and send the message . The hash function will convert this to and output the constant term in which happens to just be since all elements satisfy by Fermat's Little Theorem. So when the server signs this, we have the exact same situation as the attack described two sections ago. However, the size check prevents this attack.
Recall that the identity map is insecure as a hash function for the reasons given two sections ago. It turns out that our particular hash function is the identity map on a specific subset, or rather, subfield of other than . We have
where is the element in we obtain by converting . We can write this as a composition where
The goal will be to find fixed points of , which can be done by finding fixed points of since we can easily manipulate the result of by carefully choosing . Fixed points of satisfy
Note that for a finite field of order , the elements of are given by the roots of . This follows from the fact that the multiplicative group is a cyclic group of order , so if , then and so . That is cyclic of order follows from the structure theorem for finite abelian groups which states that any finite abelian group is a direct product of cyclic groups. There are a lot of references online for these results and their proofs.
So, to solve for the roots of we simply need to look at elements in the finite field of order . Since divides , then this field is actually a subfield of because if satisfies , then it also satisfies
(and it can also be checked that actually is a field). This is good for us as it means we can write the elements in terms of which is what the server will be expecting.
Let be a generator of . Because is a subfield of , then can be written as
where . Choose a random . We will want to find an element in of the form
such that when is applied to this element, the constant term remains as which will be the output of the hash function. To do this, we will find an element in with as its constant term, and then subtract from it to account for the randomising value which we can predict will be .
The element in we are interested in is obtained by computing
To send this to the server, we encode it as an integer:
The server will compute for us a square root of and if we have , we can easily recover the private key using the technique described two sections ago.
Once we have the private key, we can use the provided functions in the handout code to sign the challenge message and capture the flag.
Easier Solution
I was made aware of this by S3v3ru5's solve during the CTF, but choosing the message to be signed can actually be quite simple (though fundamentally relies on most of the above theory); the idea is to shift the goal posts a bit and instead of trying to find a fixed message, we find a message whose hash is something we can control. We do this by noting that since for all , then for any of our chosen , if we send , then after being hashed, the result is exactly . I imagine most teams would have solved this way instead of finding the subfield which is quite a bit more complicated. I obviously lacked the hindsight to spot this solution when writing the challenge, but it's pretty neat :)
Alternative Approach via Linearity Properties
This solution idea is due to rkm0959 who taught me this during the CTF after he solved it. Instead of looking at the fields involved, we can simply note that the hash function has some nice linearity properties. In particular
So to forge a signature for any given , we simply let and solve for :
Then, is a valid signature for .
This attack doesn't need to use the signing oracle and shows that the signature scheme is completely broken when using this hash function. Pretty cool solution!
Alternative Approach via Galois Theory
Alternatively, one might recognise the resemblance of the function with the Frobenius map which is an -automorphism that generates the Galois group of . Note that is cyclic and of order as is the identity map. The Fundamental Theorem of Galois Theory tells us that there is a one-to-one correspondence between the subgroups of the Galois group of , and the intermediate fields of . Explicitly, for a given subgroup of , the corresponding intermediate field of is given by the fixed field , the set of all elements in which are fixed by all of the maps in . Another result, sometimes known as the Fixed Field Theorem, tells us that the order of is equal to the degree of as an extension of . We use this, along with the fact that the subfields of are given by where divides , to find the fixed fields.
In the challenge, we have , so the Galois group is isomorphic to . The table below lists out the subgroups of and their corresponding intermediate fields ( denotes the identity map):
Subgroup of | Intermediate Field of |
---|---|
(all elements are fixed by ) | |
(only is fixed by all automorphisms) |
From this, we can see that is fixed by .
1337crypt v2
The challenge claims to be a more complex version of 1337crypt from last year's DUCTF. Reading the solution for 1337crypt may help a bit with some initial ideas as both challenges play with the idea of partial knowledge. In both challenges, we are given hints to recover the prime factors, but the hints themselves seem to omit some bits of information. Specifically, the hints are the integer parts of some values that should actually contain a fractional part as well. As we will see (and as you may suspect :)), lattice techniques will help us to recover this missing information and therefore the primes.
Flag Encryption
We start by looking at how the flag is encrypted so that we have a clear idea of what we need to recover it. The flag is encrypted in a way very similar to RSA. The only difference is that instead of using integers, we use Gaussian integers. If you aren't familiar with Gaussian integers, I highly recommend reading this excellent blog post by CryptoHack for some extra background.
We work in the ring (we will just write this as ) where is the RSA modulus. The flag is encoded as an element , of by choosing a random integer and computing .
The ciphertext is obtained by raising to the th power. In the challenge, is 0x1337.
To decrypt the ciphertext, we must find the multiplicative inverse of modulo the order of . (Note: in this context, when we say "the order of " we mean the cardinality of the multiplicative group of ) The order of is , so to compute it, we'll need to recover the primes.
Hints
The challenge generates 1337-bit primes and and gives us two hints, and the ciphertext. Using these hints, we need to recover and .
Hint 1
The first hint, which we will call , is . This seems like a pretty big hint, and indeed if we were also given we could easily recover the primes by finding the roots of a simple univariate polynomial. As far as I am aware, can't be used to directly recover and either; there are techniques such as factoring over the Gaussian integers, but this requires factoring over the integers which may take a long time. That said, if you were able to find a solution using this hint alone I'd be interested in seeing it :)
Hint 2
There is a lot more going on in the second hint. But firstly, some background. Number field is just a fancy name for an (finite) extension of the rational numbers . The simplest number field is itself, and the rational complex numbers, is also a number field.
We can see that a number field is constructed from and . In Sage, the first argument to NumberField is the defining polynomial. The roots of this polynomial specify the elements to be adjoined to to get the number field. We see that the defining polynomial for is , which has roots . Therefore, we define
(side note: we only need to adjoin one root since the other, it's conjugate, can be obtained from the root with usual field operations). In the handout code, denotes this adjoined element.
Now, for the values we are given. We get two instances of values with the same form. For the th hint (), two 1337-bit numbers and are generated, as well as two -bit numbers and . Then, is computed as
Lastly,
is computed and we are given the three values . Note that denotes the complex conjugate of , so the value gives us the squared complex modulus of . For ease of reading, we will write instead of .
Importantly, we note that is an element of and its components are rational numbers with fractional parts. Similarly, is a rational number with a fractional part. However, we are only given the integer parts of these values, so it seems like we might be missing some information.
Solution
Let's analyse the second hint in further detail and see if we can use it to recover the primes directly, or find a relation involving the primes that will help us to do so.
Analysing Hint 2
There will be some tedious algebra, so to make things a bit more readable we will drop the subscripts. Note that we use hint 1 here since appears. We have
Now, this looks like a mess and it kinda is, but fortunately we can clean it up a bit. Recall that and are 1337-bit numbers, while and are 338-bit numbers (). The following table gives the approximate size of each term (when considered as an integer), which we will use to reason with soon. Note that is on the order of bits.
Term | Size (in bits) |
---|---|
Now, what we will do is divide (integer division) the entire expression by . Because is around bits in size, we can more or less throw away any term whose size is or less. This leaves us with
This approximation itself is very accurate, and only differs by a few bits due to the terms. However, we don't know , so this equation isn't as useful for us. Instead, we write
where . Note that the approximation is off by a term of size approximately bits because of integer division rounding. We will omit the , but it should be understood that we are performing integer division.
Using Hint 2
Now let's bring back the subscripts, noting that we have two instances:
Finally, let's combine these two equations to eliminate .
Just so it's easier to read, let and . Then, rewriting the above equation, we have:
or to put it in a more exciting way:
Now, has "small" integer roots and . An algorithm for finding small roots of multivariate polyomials over the integers is described in this paper. Following a similar idea, defund's coppersmith implementation can also be used by working over an arbitrary ring for some large .
Lattice Approach
However, we can also recover and with a very simple lattice. Consider the lattice generated by the rows of the following matrix:
The short vector is an element of this matrix, given by the linear combination of times the first row, times the second row, and times the third row. LLL finds this vector.
Recovering the Primes
Now that we have , it is straightforward to recover . We use the approximation we found earlier:
This approximation is accurate to all but a few bits, which we can easily exhaust over, checking if is a square to know which candidate for is correct. We then compute as the square root of .
Getting the Flag
We've already discussed how to decrypt the ciphertext given that we have the prime factors. Compute as in regular RSA and raise to the th power. The flag is in the imaginary component of the result.
Substitution Cipher III
This challenge is a sequel to "Substitution Cipher II", though in terms of difficulty, it is a large step up. The basis of the challenge is Patarin's famous attack (though other attacks may work) on the Matsumoto-Imai cryptosystem which was hinted towards by the bold MI in the challenge's description (one could easily find useful resources online by searching "MI cryptosystem"). The large public key size is also a trait of multivariate public key cryptosystems which some people may have recognised.
Like with the other two substitution cipher challenges, we are given the code which has been used to encrypt the flag, as well as the ciphertext. In this challenge, we are also given a public key.
Matsumoto-Imai Cryptosystem
In this section we give a rough description of the Matsumoto-Imai-like cryptosystem as presented in the challenge. There are some differences from the original cryptosystem, so to hopefully make it easier to follow, we also draw comparisons between the mathematical notation and the handout code.
The general idea is that a composition of certain transformations is difficult to invert without knowledge of the individual transformations that make it up; so a private key is made up of some invertible transformations, and the corresponding public key is their composition. Additionally, solving a set of multivariate (even quadratic) polynomial equations over a finite field in general is proven to be NP-complete, so one shouldn't be able to easily recover the plaintext from a given ciphertext and the public key alone.
Parameters
To set the scene, we choose parameters and (in the challenge and ) and define some algebraic objects to work with:
where is an arbitrary degree irreducible polynomial.
In Sage, we need to write a bit more boilerplate to achieve this:
q = 2
F = GF(q)
K = BooleanPolynomialRing(n, 'x')
R.<t> = PolynomialRing(K)
i = GF(q)[t].irreducible_element(n)
I = Ideal(R(i))
I.reduce = lambda f: f % i # dirty sage hack
E.<tbar> = PolynomialRing(K, t).quo(I)
L.<x> = PolynomialRing(E)
A = AffineGroup(n, K)
BooleanPolynomialRing
is a more efficient implementation of boolean polynomials; in most cases it can be used as a clean replacement forGF(2)[x1, ..., xn]
.- As for the
I.reduce = ...
line, this should probably not need to be done but I was having difficulties getting Sage to properly reduce elements of modulo and this seemed like the only way to fix it. - The purpose of is to define polynomials whose variables are in (which we'll do next).
- is defined so that we can easily work with affine transformations.
Private Key
Next, we'll generate the private key. We choose random and where is randomly chosen. (Note that the construction of here is not in its most general form, and that this particular choice of is potentially weaker than other choices).
In Sage, we write:
S = A(GL(n, F).random_element(), random_vector(F, n), check=False)
r = sum(randint(0, 1) * tbar^i for i in range(n))
P = r * x^(q + 1)
T = A(GL(n, F).random_element(), random_vector(F, n), check=False)
A.random_element()
wasn't working for some reason, so we simply generate a random invertible matrix and a random vector with elements in which is enough to represent an affine transformation.
Public Key
To derive the public key from the private key, we simply take the composition .
Actually, that's not entirely correct but it captures the general idea. To be more explicit, and are affine transformations that map elements of to elements of , whereas is a function that takes an element in and outputs an element in . The way we glue together these components of different "types" is by introducing an extra function that converts between the two types; there is a natural bijection between and , so we define:
So more accurately, we take the public key to be .
itself can be represented by multivariate polynomials ; these are what will be used for encryption.
In Sage, we can write this concisely as:
B = S(K.gens()) * vector([tbar^i for i in range(n)])
Q = P(B)
R = T(Q.lift().coefficients())
Since and are elements of , we can perform the transformation they represent with S(x)
and T(x)
respectively, where x
is an element of . Multiplication by the vector is equivalent to the conversion map , and taking the coefficients of an element in is equivalent to . The reason for the .lift()
is to consider Q
as an element of (as opposed to an element of ) since .coefficients()
seems to be not implemented for this quotient ring.
Encryption
Encryption is easy but there is a slight caveat that might be confusing. The public polynomials map elements of to an element of . We defined to be a boolean polynomial ring, but it should be noted that is a subset of , and that is a subset of . When we encrypt messages, we are really only concerned with the subset and not with other elements of . (This is just for the reason that working with bits is easier; there shouldn't be any issues with encoding messages using other elements in )
To encrypt a message, we first encode the message as an element of and compute . where are the public polynomials. Since each of the sends elements of to an element in , the resulting ciphertext is an element of .
Solution
We are given nothing but the public key and the ciphertexts. We could attempt to recover the private key, but it isn't necessary. We will follow Patarin's attack from Crypto'95 which allows us to recover the plaintext for any given ciphertext.
The attack relies on the fact that maps of the form are linear for any integer . By this, we mean that if , then is such that with . In words; this says that the coefficients of can be written as a linear combination of the coefficients of .
I couldn't find a proof for this, (maybe because it's supposed to be obvious, but it certainly wasn't obvious to me), so here is my (possibly wrong) attempt at a justification (which I think might actually only work for ):
Define as for any integer . Let . Then
It should now follow that, after being reduced modulo the irreducible polynomial , the coefficients of are linear combinations of the .
Combined with some other clever insights, this eventually leads to a linear expression relating all plaintext and ciphertext pairs as we will soon see.
Finding the Relation
This is the core of the attack. To begin, recall that a plaintext message is encrypted by computing its ciphertext .
Let and . Note that and furthermore that . So we have
Now, applying to both sides of the equation, we get
Multiplying both sides by , we get
By the definition of and , we have
where the are elements of which we do not particularly care about. This follows because and are simply affine transformations of and respectively.
Combining this with the fact that the maps and are linear maps, equation effectively gives us an equation relating and where , and appear to a power of no greater than .
So, if we let and , then the information given by equation can be rewritten as:
And the amazing thing about this relation is that it holds for all plaintext/ciphertext pairs since we made no assumptions about them to begin with! That means it holds for the flag and the ciphertext we've been given too :)
Recovering the Constants
Equation gives us a relation that holds for all plaintext/ciphertext pairs, but we don't know the and . Fortunately, since encryption uses the public key (which we have), we can generate our own plaintexts and encrypt them to get valid plaintext/ciphertext pairs.
With a set of plaintext/ciphertext pairs, we can view equation as a system of linear equations in the unknowns and and use linear algebra techniques to solve for them. There are unknowns, so we'll need plaintext/ciphertext pairs to successfully recover the constants.
Generating the plaintext/ciphertext pairs may take a while (though, not that long if you're patient), so an easy optimisation we can do is choose sparse plaintext vectors; that is, plaintexts which have a lot of zero bits. Doing this speeds up polynomial evaluation which is the most expensive operation for encryption.
Recovering the Flag
The hard work is mostly done. If we have the relation and the constants, then to recover a plaintext given a ciphertext all we need to do is plug in the values, and solve for the values. Again, this is done using linear algebra techniques, but in this case instead of solving a system of linear equations in the unknowns and , we solve the system for the unknowns .