2020-5-28 created by AD1024

SMT

Reverse Engineering

You are given a Python source code like this:

```
m = '???'
n = '???'
assert len(m) == 63 and set(m).issubset(set(a + p))
assert len(n) == 7 and set(n).issubset(set(a))
assert m.count('tjctf{') == 1
assert m.count('}') == 1
assert m.count(' ') == 5
print(str([x for z in [[[ord(m[i]) ^ ord(n[j // 3]) ^ ord(n[i - j - k]) ^ ord(n[k // 21]) for i in range(j + k, j + k + 3)]for j in range (0, 21, 3)] for k in range(0, len(m), 21)] for y in z for x in y])[1:-1])
```

And you are asked to find inputs of $m$ and $n$ such that the output is

```
1, 18, 21, 18, 73, 20, 65, 8, 8, 4, 24, 24, 9, 18, 29, 21, 3, 21, 14, 6, 18, 83, 2, 26, 86, 83,
5, 20, 27, 28, 85, 67, 5, 17, 2, 7, 12, 11, 17, 0, 2, 20, 12, 26, 26, 30, 15, 44, 15, 31,
0, 12, 46, 8, 28, 23, 0, 11, 3, 25, 14, 0, 65
```

First, we need to unroll these annoying list comprehensions, and we can get something like this:

```
lst = []
idx = 0
for k in range(0, len(m), 21):
for j in range(0, 21, 3):
for i in range(j + k, j + k + 3):
lst.append(m[i] ^ n[j // 3] ^ n[i - j - k] ^ n[k // 21])
```

Well, it is not apparent to see what is this actually doing. **BUT**: do we actually need to know what is it doing? The answer is **NO**.

Have you heard of *Symbolic Execution*? We are to mimic this process by compiling sets of constraints, and send the constraints to Z3, and get the model, which will be the answer.

Since the original inputs are strings, which is not a good choice to pick as the core SMT theory to deal with this problem, since we are to restrict the set of characters that can appear in the string, but we don’t want to introduce new abstractions (like a set or something), since these kind of abstractions can make the compilation harder (in SMT, there is no explicit data flow, it’s all about constraint solving).

Alternatively, we can pick array of positive integers that represents the ASCII code in the string, in this way, we can restrict the characters by simply setting a range constraint on the number.

First, let’s describe the constraints informally in English.

Constraints of string $m$:

- Only contain characters in string $a$ and $p$
- Has exactly 1 substring “tjctf{”
- Has excatly 1 substring “}”
- Has exactly 5 spaces

Define a function $\texttt{ord} : \texttt{Char} \to \mathbb{N}$ that converts an English character to ASCII encoding.

Define a function $\texttt{ords} : \texttt{String}\to \mathbb{S}$ where $\mathbb{S}$ is a set of $\mathbb{N}$ that computes the set of ASCIIs of characters in an arbitary string of English characters.

We’d be very careful when translating the constraints to a formal description:

- $\forall c \in m.\ c \in \texttt{ords}($a$) \cup \texttt{ords}($p$)$
- $$\exists p.\ m[p] = \texttt{ord}(t) \wedge m[p + 1] = \texttt{ord}(j) \wedge m[p + 2] = \texttt{ord}( c )\wedge m[p + 3] = \texttt{ord}(t)$$ $$\wedge m[p + 4] = \texttt{ord}(f) \wedge m[p + 5] = \texttt{ord}(\lbrace) \wedge p \ge 0 \wedge p + 5 < \texttt{length}(m)$$ (CAREFUL about the bound!)
- $\exists p.\ m[p] = \texttt{ord}(\rbrace)$
*careful:*only exists one such place, so we also need: $\forall p^\prime.\ p^\prime \neq p \Rightarrow m[p^\prime]\neq \texttt{ord}(\rbrace)$ - Similarly, add constraints for spaces, also be careful that there only exists 5 such places.

Constraints of $n$: pretty similar to that of $m$, actually it is simpler since the only constraint is that $$\forall c \in n.\ c \in \texttt{ords}(a)$$.

This constraint is about the loop body. But we can use Z3Py, so just add the constraints in the loop.

Notice, since we are doing XOR on numbers, the theory we choose is actually arrays of bitvectors. Since ASCII is pretty small, 32 bits is sufficient to cover the problem.

It’s simple, just read out each position and convert them back to characters.

```
goal = [1, 18, 21, 18, 73, 20, 65, 8, 8, 4, 24, 24, 9, 18, 29, 21, 3, 21, 14, 6, 18, 83, 2, 26, 86, 83, 5, 20, 27, 28, 85, 67, 5, 17, 2, 7, 12, 11, 17, 0, 2, 20, 12, 26, 26, 30, 15, 44, 15, 31, 0, 12, 46, 8, 28, 23, 0, 11, 3, 25, 14, 0, 65]
m = '???'
n = '???'
assert len(m) == 63 and set(m).issubset(set(a + p))
assert len(n) == 7 and set(n).issubset(set(a))
assert m.count('tjctf{') == 1
assert m.count('}') == 1
assert m.count(' ') == 5
# Find m and n such that the following will print the same result as print(str(goal)[1:-1])
print(str([x for z in [[[ord(m[i]) ^ ord(n[j // 3]) ^ ord(n[i - j - k]) ^ ord(n[k // 21]) for i in range(j + k, j + k + 3)]for j in range (0, 21, 3)] for k in range(0, len(m), 21)] for y in z for x in y])[1:-1])
ans = []
import z3
M = z3.Array('m', z3.IntSort(), z3.BitVecSort(32))
N = z3.Array('n', z3.IntSort(), z3.BitVecSort(32))
solver = z3.Solver()
solver.push()
print(sorted(list(map(ord, a + p))))
# forall c in m. c in a + p
for i in range(len(m)):
solver.add(
z3.Or(
z3.And(M[i] >= 32, M[i] <= 47),
z3.And(M[i] >= 58, M[i] <= 64),
z3.And(M[i] >= 91, M[i] <= 126)
)
)
least = min(map(ord, a))
most = max(map(ord, a))
# forall c in n. c in a
for i in range(len(n)):
solver.add(z3.And(
N[i] >= least, N[i] <= most
))
lst = []
idx = 0
for k in range(0, len(m), 21):
for j in range(0, 21, 3):
for i in range(j + k, j + k + 3):
solver.add(((M[i] ^ N[j // 3] ^ N[i - j - k] ^ N[k // 21]) == goal[idx]))
idx += 1
for i in range(5):
p = z3.Int(f'p{i}')
lst.append(p)
solver.add(p >= 0, p < len(m))
solver.add(M[p] == ord(' '))
solver.add(z3.Distinct(*lst))
other_space = z3.Int('other_space')
solver.add(z3.ForAll(other_space, z3.Implies(
z3.And(other_space != lst[0],
other_space != lst[1],
other_space != lst[2],
other_space != lst[3],
other_space != lst[4]),
M[other_space] != ord(' ')
)))
# tjctf{
start = z3.Int('start')
solver.add(M[start] == ord('t'))
solver.add(M[start + 1] == ord('j'))
solver.add(M[start + 2] == ord('c'))
solver.add(M[start + 3] == ord('t'))
solver.add(M[start + 4] == ord('f'))
solver.add(M[start + 5] == ord('{'))
solver.add(start >= 0, start + 5 < len(m))
close = z3.Int('close')
solver.add(M[close] == ord('}'))
solver.add(close >= 0, close < len(m))
other_close = z3.Int('other_close')
solver.add(z3.ForAll(other_close, z3.Implies(
other_close != close,
M[other_close] != ord('}')
)))
print(solver.check())
model = solver.model()
ans_m = ''
ans_n = ''
for i in range(len(m)):
ans_m += chr(int(str(model.eval(M[i]))))
for i in range(7):
ans_n += chr(int(str(model.eval(N[i]))))
print(ans_m)
print(ans_n)
```