(Part of a series of writeups from INS'HACK CTF 2019.)

The challenge text reads:

Someone left a strange script and some files on a server.

Will you help me understand what it did to a server ?

#Analysis

The three files are a, k, and l.

a:      data
k:      data
l:      Python script, ASCII text executable, with very long lines

a and k are some form of binary data, and l is a Python script. Let's start with the script.

import zlib;exec(zlib.decompress(b'\xaa\x9c\xbd\xd9...\xff\x0f\x08\xa0'))

It seems to be a wrapper that decompresses and executes some code. If we replace exec with print, we can see what that code is instead of executing it.

import binascii;exec(binascii.unhexlify(b'696d706f727420...c7186632272929'))

Another wrapper, but this time hex encoding instead of compression.

If we do the same thing once again, we're back with another round of zlib compression.

import zlib;exec(zlib.decompress(b'\x9c\x8d\x9d\x11...\xca\x87\x91\xf2'))

Not knowing how many times this could go around, we'll write a simple script to automate the unwinding.

#!/bin/bash
set -ex

cp l tmp

while grep -q -e binascii -e zlib tmp; do
    sed 's/exec/print/' tmp | python | sponge tmp
done

This automates what we were doing manually: sed rewrites each script so that it prints its payload instead of executing it, and sponge writes that output back into the same file. This is repeated as long as binascii or zlib is found within the file, at which point we stop.

Turns out zlib was still mentioned in the final script, but replacing exec with print caused a syntax error so it stopped anyway. We have what we want so there's no point in fixing it now.
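The same unwinding can also be sketched in Python without executing any layer at all. This is an illustration, not the script used for this writeup: it assumes every layer has exactly the `import X;exec(X.f(b'...'))` shape seen above, and the names `LAYER`, `DECODERS` and `unwrap` are made up for the example. It stops on its own once the text no longer looks like a wrapper.

```python
import ast
import binascii
import re
import zlib

# One wrapper layer: import X;exec(X.func(b'...'))  (shape assumed from the layers above)
LAYER = re.compile(
    r"^import (zlib|binascii);exec\(\1\.(?:decompress|unhexlify)\((b['\"].*['\"])\)\)\s*$",
    re.S,
)
DECODERS = {"zlib": zlib.decompress, "binascii": binascii.unhexlify}


def unwrap(source: str) -> str:
    """Peel wrapper layers until the text no longer matches LAYER."""
    while True:
        match = LAYER.match(source)
        if match is None:
            return source
        module, literal = match.groups()
        # literal_eval turns the b'...' literal into bytes without running any code
        payload = ast.literal_eval(literal)
        source = DECODERS[module](payload).decode("utf-8")
```

Something like `print(unwrap(open('l').read()))` would then print the innermost script, assuming the layers decode to UTF-8 text.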

The result was an obfuscated mess.

import os as _4491ba09efd64d32a7c316a89ed23540
import uuid as _9c0223fbd33e432cad291e3626fa8375
import zlib as _75efb4b5390e450f86339bb370cf9798
import struct as _460dcfe1367b4351bfd11161e6afb4bc

_d92c89d5c8e6480ab83f62d04ac6968e = zip
_2a054c268d044e6ab9b9ba49b0c99ac2 = vars
_efe18136e1d14a66b014f121b6cdc599 = type
_9013c61b775f4709917f60cfb4c4d0bd = tuple
_c78251a8b4ef4e5c975562404153eca2 = super
_05314ca66c8241329018829d41388221 = sum
# ...
_66299359ebad416a9be7d03a54485ba7 = FileNotFoundError
_091dbd374eac4d218391f5d651aa5242 = FileExistsError
_9656b9e60ace40ab958f7005bb4ca86f = False
_78174034c577445f8c0a1c0379d54c86 = Exception
_7609428dbb2a4d0e93506071a0ba0003 = EnvironmentError
_aa69ae8b32464738b560cc318d6d6cbd = Ellipsis

# ...

def _46a7c5c324234b988db190a6f5011912():
    _9020d73ce88149339a5c0db87f05b655 = _dee6853e587a46fe89d055f4a7d4a54c()
    _405a529ecd9e4156902881815fe5b115 = _9b5737cb2ddb4cafac4e8fa8db81fb4b()
    _2531f0349f41497dbf1ed7339b4c880b = _66107a3f96fc4f8aaf770c066d3260a5(_9020d73ce88149339a5c0db87f05b655.d)
    _1cf3273eea5e4642b24446463c8edb23 = _fc2e6885d4f6459c9173119853a9b941.Path(_9020d73ce88149339a5c0db87f05b655.o)
    for _f6cd3797fea24c06be091c077f687482 in _2531f0349f41497dbf1ed7339b4c880b._07da762b35064c328590155ab1fd0e23(_33186cdf76944c1d8b235829cede199b):
        _405a529ecd9e4156902881815fe5b115._5f8f129324e640809d01bddc294aaa1e(_f6cd3797fea24c06be091c077f687482)
    _1cf3273eea5e4642b24446463c8edb23.mkdir(parents=_33186cdf76944c1d8b235829cede199b, exist_ok=_33186cdf76944c1d8b235829cede199b)
    _405a529ecd9e4156902881815fe5b115._6d9fd80d4dd64ec79fcb9665c4c3d5d6(_1cf3273eea5e4642b24446463c8edb23)

if (_0725a4328dac4b608bca988314e49535 == '__main__'):
    _46a7c5c324234b988db190a6f5011912()

If you look closely you can see the rough outline of a Python program there. Variables, functions, properties and builtins have been replaced with 32-character hex identifiers, resulting in 357 lines of obfuscated code.

A series of seds will deal with the top section.

sed -i 's/_d92c89d5c8e6480ab83f62d04ac6968e/zip/g' obfuscated.py
sed -i 's/_2a054c268d044e6ab9b9ba49b0c99ac2/vars/g' obfuscated.py
sed -i 's/_efe18136e1d14a66b014f121b6cdc599/type/g' obfuscated.py
sed -i 's/_9013c61b775f4709917f60cfb4c4d0bd/tuple/g' obfuscated.py
sed -i 's/_c78251a8b4ef4e5c975562404153eca2/super/g' obfuscated.py
sed -i 's/_05314ca66c8241329018829d41388221/sum/g' obfuscated.py
# ...
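Typing out one sed per alias gets old quickly, so the mapping could instead be read from the alias lines themselves. The sketch below assumes those lines always have the exact form `_<32 hex digits> = name` or `import module as _<32 hex digits>`, as in the excerpt above; `restore_aliases` and the pattern names are hypothetical.

```python
import re

OBFUSCATED = r"_[0-9a-f]{32}"
# "_<hex> = zip": the right-hand side must be a readable name, not another obfuscated one
ALIAS = re.compile(rf"^({OBFUSCATED}) = ((?!{OBFUSCATED}\b)[A-Za-z_]\w*)$", re.M)
# "import os as _<hex>"
IMPORT_AS = re.compile(rf"^import (\w+) as ({OBFUSCATED})$", re.M)


def restore_aliases(source: str) -> str:
    """Replace aliased builtins and modules with their real names."""
    mapping = {alias: name for alias, name in ALIAS.findall(source)}
    mapping.update({alias: module for module, alias in IMPORT_AS.findall(source)})
    # Identifiers that are not in the mapping are left untouched
    restored = re.sub(OBFUSCATED, lambda m: mapping.get(m.group(0), m.group(0)), source)
    # The alias lines have become no-ops such as "zip = zip"; drop them
    restored = re.sub(r"^(\w+) = \1\n", "", restored, flags=re.M)
    # "import os as os" reads better as "import os"
    return re.sub(r"^import (\w+) as \1$", r"import \1", restored, flags=re.M)
```

This only recovers names that the script aliases explicitly. Functions, classes and attributes defined by the script itself keep their hex names and still have to be renamed by hand.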

That takes care of the builtins, but for everything else there's no way around it: we're going to have to get our hands dirty. Armed only with multiple cursors and a bit of determination, let's go through and give everything a name.

#Archiver

See archiver.py.

Now that we can all enjoy looking at proper variable names, time to see what this script actually does.

def parse_args():
    argparser = argparse.ArgumentParser(description='')
    argparser.add_argument('d')
    argparser.add_argument('o')
    return argparser.parse_args()

def run():
    args = parse_args()
    archive = Archive()

    d = DirectoryEnumerator(args.d)
    o = pathlib.Path(args.o)

    for file in d.get_file_list(True):
        archive.add_file(file)

    o.mkdir(parents=True, exist_ok=True)
    archive.write(o)

if (__name__ == '__main__'):
    run()

Our journey begins by recursively listing all the files under the directory named by the first command line parameter, d. A new Archive is created, and each file is added in turn. Finally, the archive is written to the directory named by the second parameter, o, which is created if it doesn't already exist.

class Archive():
    header = b'L0LARCH\x00'
    size_limit = 1048576

    def __init__(self):
        self.entries = []
        self.keystore = KeyStore()

    def add_file(self, path: 'pathlib.Path'):
        archive_entry = ArchiveEntry(path)
        print(archive_entry)

        if (archive_entry.size > Archive.size_limit):
            raise RuntimeError(f'{archive_entry.f} size is above the limit ({Archive.size_limit})!')

        encrypted_archive_entry = self.keystore.get_encrypted_archive_entry(archive_entry)
        self.entries.append((archive_entry, encrypted_archive_entry))

    def write(self, path: 'pathlib.Path'):
        output = Archive.header
        output += struct.pack('I', len(self.entries))
        for (archive_entry, encrypted_archive_entry) in self.entries:
            print(f'adding {archive_entry.f}...')
            output += archive_entry.get_metadata()
            encrypted_contents = encrypted_archive_entry.encrypt()
            output += struct.pack('I', len(encrypted_contents))
            output += encrypted_contents
        path.joinpath('archive').write_bytes(output)
        self.keystore.write(path)

We can see that this is an archiver that encrypts its input files and stores its keys in a keystore.

It's clear now that the names of the other two files that came with the challenge, a and k, correspond to archive and keystore respectively.

L0LARCH\x00 is the magic byte sequence that identifies the archive. Indeed, if we inspect the early bytes of our a file, we can see it starts with the expected byte sequence.

$ head -c7 a
L0LARCH
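The same check can be done in Python, along with reading the entry count that Archive.write() packs right after the header. This is a small sketch with a made-up function name (`read_entry_count`); it uses the same native-order 'I' format as the archiver, so it assumes the archive was written on a machine with the same byte order.

```python
import struct

MAGIC = b"L0LARCH\x00"


def read_entry_count(archive: bytes) -> int:
    """Validate the magic bytes and return the number of entries."""
    if archive[:len(MAGIC)] != MAGIC:
        raise ValueError("not an L0LARCH archive")
    # 'I' is a native-order unsigned int, the same format Archive.write() packs
    (count,) = struct.unpack_from("I", archive, len(MAGIC))
    return count
```

It could be called as `read_entry_count(open('a', 'rb').read())`.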

As with most encryption challenges, the best place to go for the information you need is the actual encryption code.

def encrypt(self):
    with self.archive_entry.f.open('rb') as f:
        contents = f.read()
    compressed = pad(zlib.compress(contents), 16)
    encrypted = pad(self.encryptor.encrypt(compressed), 128)
    blocks = (len(encrypted) // 128)
    new_block_positions = list(range(blocks))
    random.shuffle(new_block_positions)
    for (leftside, rightside) in zip(list(range(blocks)), new_block_positions):
        self.mapping[leftside] = rightside
    x = [encrypted[(n * 128):((n + 1) * 128)] for n in range(blocks)]
    b = bytes([])
    for r in range(blocks):
        b += x[self.mapping[r]]
    return b

The algorithm has three main parts.

1. The file contents are compressed, and padded to a 16-byte boundary.
2. The compressed result is encrypted, and padded to a 128-byte boundary.
3. The encrypted result is split up into 128-byte chunks and shuffled.
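Undoing the last step only needs the mapping. In encrypt(), output block r is taken from original block mapping[r], so the inverse puts each block back at position mapping[r]. The sketch below assumes the mapping is available as a dict from output position to original position; how it is recovered from the keystore is not covered here, and `shuffle_blocks` and `unshuffle_blocks` are illustrative names.

```python
BLOCK = 128  # chunk size used by encrypt()


def shuffle_blocks(data: bytes, mapping: dict) -> bytes:
    """Mirror of the shuffle in encrypt(): output block r is original block mapping[r]."""
    blocks = [data[n * BLOCK:(n + 1) * BLOCK] for n in range(len(data) // BLOCK)]
    return b"".join(blocks[mapping[r]] for r in range(len(blocks)))


def unshuffle_blocks(data: bytes, mapping: dict) -> bytes:
    """Inverse: the block found at position r goes back to position mapping[r]."""
    count = len(data) // BLOCK
    original = [b""] * count
    for r in range(count):
        original[mapping[r]] = data[r * BLOCK:(r + 1) * BLOCK]
    return b"".join(original)
```

After unshuffling, the remaining steps run in reverse order: remove the 128-byte padding, decrypt, remove the 16-byte padding, and decompress.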

The pad function pads its input to a multiple of the given block size, using bytes whose value equals the number of padding bytes added. To unpad, read the final byte and remove that many bytes from the end.

This does mean that if the input already ends on a boundary, a full block of padding is added, so that there is always a padding byte to read.

>>> pad(bytes('test', 'utf8'), 2)
b'test\x02\x02'

>>> pad(bytes('test', 'utf8'), 4)
b'test\x04\x04\x04\x04'

>>> pad(bytes('test', 'utf8'), 5)
b'test\x01'
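The pad function itself isn't reproduced in this writeup. A minimal sketch that is consistent with the three examples above, though not necessarily identical to the archiver's own code, could look like this:

```python
def pad(data: bytes, block_size: int) -> bytes:
    """Pad to a multiple of block_size; always adds between 1 and block_size bytes."""
    # When data already ends on a boundary the remainder is 0, so n == block_size
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n
```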

Perhaps at the cost of a little waste sometimes, the unpad function can be very simple.

def unpad(s):
    return s[:-s[-1]]
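The one-liner trusts its input completely. For example, a final byte of 0 makes it return an empty result, because `s[:-0]` is `s[:0]`. When experimenting with keys or block order, a stricter variant can flag a bad decryption early. This is an optional sketch, not part of the original extractor, and `unpad_checked` is a hypothetical name.

```python
def unpad_checked(data: bytes, block_size: int) -> bytes:
    """Remove padding, raising ValueError if it is not well formed."""
    if not data or len(data) % block_size:
        raise ValueError("length is not a positive multiple of the block size")
    n = data[-1]
    # Every padding byte must carry the padding length itself
    if not 1 <= n <= block_size or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]
```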

#Extractor

See extractor.py.

Most of the work involved in writing this extractor will be undoing the packing done by the archiver. This means we'll be taking bytes and byte sequences of varying lengths from our input buffers all day long, and slicing by hand is going to get tiresome.

A wrapper class will make the job a bit easier.

class Buffer():
    def __init__(self, s):
        self.s = s
        self.i = 0

    def take(self, n):
        s = self.peek(n)
        self.i += n
        return s

    def peek(self, n):
        return self.s[self.i:self.i + n]

    def unpack(self, fmt, n):
        return struct.unpack(fmt, self.take(n))

    def unpack_I(self):
        return self.unpack('I', 4)

self.a = Buffer(read('a'))
self.k = Buffer(read('k'))

We can read bytes from our input files using the take(n) method. For example, the extraction process begins with a magic byte sanity check.

def check_file_headers(self):
    # Python 3 only allows raising exception instances, not plain strings
    if self.a.take(8) != b'L0LARCH\x00':
        raise ValueError('Archive invalid')
    if self.k.take(8) != b'L0LKSTR\x00':
        raise ValueError('Keystore invalid')

Now we'll look at the extract function. Even though we're writing a one-time extractor never to be used again, the urge to handle edge cases doesn't go away.

def extract(self):
    self.check_file_headers()

    (file_count,) = self.a.unpack_I()
    (key_count,) = self.k.unpack_I()

    print('Found %d files' % file_count)

    if file_count != key_count:
        raise ValueError('File and key count mismatch')

    for i in range(file_count):
        file = self.extract_next_file()
        key = self.extract_next_key()

        print("Extracting %s: bytes=%d uuid=%s" % (
            file.meta['filename'],
            file.meta['size'],
            file.meta['uuid'])
        )

        decrypted = file.decrypt(key)
        filename = file.meta['filename']
        path = os.path.dirname(filename)

        # dirname is '' for a file with no directory part, and makedirs('') fails
        if path:
            os.makedirs(path, exist_ok=True)

        with open(filename, 'wb+') as f:
            f.write(decrypted)

        os.chmod(filename, file.meta['mode'])

This loops through each file in the archive, decrypts it with its key, and then saves it to disk. The file mode is even set to match the original. (But not atime, mtime, and ctime, because that would be silly.)

#Result

The final directory tree looks like this.

$ tree
tmp
├── 0426d34dfdc9ac3a987f
│   ├── 030e472e78f1e9b847b1
│   ├── 1449f939bd39f6251966
│   └── 82c060b2ca9f6ff17c3a
├── 559591d87cfc511fab91
│   ├── 2de5a23a6734b53878ab
│   ├── 5c0ac53338432fbbc8e7
│   └── 768d44a40b744aa0e55d
├── 795dcd3c4debafa04240
│   ├── 0617dea7075e199c6948
│   ├── 1a5899d8ecaa7641c13e
│   └── b77a01977be3d5665cf3
├── 88190c29ecd14c2c0171
├── b20fdd45bba34b561afc
│   ├── 30b17b6ae03e8179eced
│   ├── 9d1251486cedea824c95
│   └── d959052ff1eb24d442bd
├── true-hacker-desktop
├── where-i-think-u-are
└── where-u-think-u-are

4 directories, 16 files

The flag is contained within 88190c29ecd14c2c0171.

$ cat tmp/88190c29ecd14c2c0171
INSA{9c431db9206d2c13bd730a331f07561e49fdebb13ef13057bbeee655a6808fa5}

ins'hack ctf – obscure file format
