How much can we feasibly strip from a Zig binary? Let's start from a normal Zig program that does absolutely nothing:
2180K for a binary that does nothing. Given that the smallest possible executable ELF file is around 80 bytes, 2180K is quite a bit of bloat. What happens when we strip out debug info?
Saved 1988K just by stripping out debugging information. However, 192K is still quite far from our 80 byte goal. We are still compiling in Debug mode, so let’s switch to ReleaseSmall (equivalent to -Os for gcc/clang as far as I can tell).
Now we’re at 12K! Saved 180K just by switching from Debug to ReleaseSmall. Next step is to enable function and data sections to allow the linker to strip away unreferenced functions or data.
…and that did nothing. I guess ReleaseSmall already handles this optimization.
Taking a peek at the ELF sections shows quite a few unnecessary sections:
.eh_frame and .eh_frame_hdr are generated to provide unwinding information, and are not strictly necessary for the binary to run. The .comment section holds useless metadata. .tbss is a section for thread local storage, which is also unnecessary since the program does not do any threading.
Switching from x86_64-linux-gnu to x86_64-freestanding-none cuts most of the extra cruft from the binary, down to 472 bytes. Looking at the sections now reveals that all but 2 sections have been removed:
But something isn’t quite right. The binary no longer contains any executable code. This is because we have to change our executable’s entrypoint. Now that our platform is freestanding, the entrypoint is _start instead of main.
Our compile command hasn’t changed and the binary size is now slightly larger.
Except this time our binary contains some executable code:
Looking at the size of the text section, it only contains 11 bytes of code. Where are the other 605 bytes coming from? Inspecting the ELF further with readelf shows that there are 4 program segments. Each program segment's header takes up 56 bytes of space, for a total of 56 × 4 = 224 bytes.
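To double-check that arithmetic, the program header count and entry size can be read straight out of the ELF header. Here is a minimal Python sketch, assuming a little-endian ELF64 file; the function name is mine and not part of any tool used in this post.

```python
import struct


def program_header_overhead(elf: bytes) -> tuple[int, int, int]:
    """Return (e_phnum, e_phentsize, total bytes) for a little-endian ELF64 image."""
    # e_ident: magic, then EI_CLASS (2 = 64-bit) and EI_DATA (1 = little-endian).
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # In the ELF64 header, e_phentsize sits at offset 54 and e_phnum at offset 56.
    phentsize, phnum = struct.unpack_from("<HH", elf, 54)
    return phnum, phentsize, phnum * phentsize
```

For the binary described above (4 program headers of 56 bytes each) this should return (4, 56, 224).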
GNU_STACK is completely optional and only acts as a hint to the Linux kernel. PHDR is similarly unnecessary, and the two LOAD segments can be merged into a single large RWX segment. We cannot directly control the program segments from the command line, so it is time to break out a linker script.
This script creates a single RWX segment that spans all of the executable code and data, cutting down the 4 segments to a single segment.
Recompiling with the linker script brings the binary down to 616 - 56 * 3 = 448 bytes.
We return our attention to the section headers in the binary. The Linux kernel completely ignores section headers, so they can be safely removed without affecting the binary. The contents of .comment and .shstrtab can also be stripped since they are not mapped by any program segment.
Here we can take advantage of how the compiler lays out the ELF file.
Sections that are marked as ALLOC are mapped by a program segment and are required for program execution. Because of the way the ELF file is laid out, the section headers and the non-ALLOC sections all sit in one contiguous block at the end of the file. To strip out the extra metadata, we can cut away any data that comes after the last ALLOC section.
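The truncation step can be sketched in Python roughly as follows. This is a minimal illustration rather than the exact script: it assumes a little-endian ELF64 file, the function name is hypothetical, and zeroing the section header fields in the ELF header is my own addition so that tools do not follow a dangling offset.

```python
import struct

SHF_ALLOC = 0x2   # section occupies memory at run time
SHT_NOBITS = 8    # section (e.g. .bss) has no bytes in the file


def strip_after_last_alloc(elf: bytes) -> bytes:
    """Truncate an ELF64 image right after its last ALLOC section."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (shoff,) = struct.unpack_from("<Q", elf, 40)           # e_shoff
    shentsize, shnum = struct.unpack_from("<HH", elf, 58)  # e_shentsize, e_shnum

    end = 0
    for i in range(shnum):
        base = shoff + i * shentsize
        (sh_type,) = struct.unpack_from("<I", elf, base + 4)
        (sh_flags,) = struct.unpack_from("<Q", elf, base + 8)
        sh_offset, sh_size = struct.unpack_from("<QQ", elf, base + 24)
        # Only ALLOC sections that actually have file contents matter here.
        if sh_flags & SHF_ALLOC and sh_type != SHT_NOBITS:
            end = max(end, sh_offset + sh_size)
    if end < 64:
        raise ValueError("no ALLOC section found after the ELF header")

    out = bytearray(elf[:end])
    struct.pack_into("<Q", out, 40, 0)      # e_shoff = 0: no section header table
    struct.pack_into("<HH", out, 60, 0, 0)  # e_shnum = 0, e_shstrndx = 0
    return bytes(out)
```

Usage would be something like reading the linked binary with open(path, "rb"), passing the bytes through strip_after_last_alloc, and writing the result back out.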
Compiling and patching now yields a 131 byte binary. Much better.
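The 131 bytes add up exactly if we assume the file is now nothing but the ELF64 header, one program header and the 11 bytes of code, with no padding in between. A quick sanity check in Python (the helper name is made up for illustration):

```python
ELF64_HEADER_SIZE = 64  # size of Elf64_Ehdr
ELF64_PHDR_SIZE = 56    # size of one Elf64_Phdr entry


def minimal_elf64_size(segments: int, code_bytes: int) -> int:
    """File size if the headers and code are packed back to back (an assumption)."""
    return ELF64_HEADER_SIZE + segments * ELF64_PHDR_SIZE + code_bytes


total = minimal_elf64_size(segments=1, code_bytes=11)  # 64 + 56 + 11 = 131
```

In other words, every remaining byte is either mandatory ELF metadata or code, so any further savings have to come from the code itself.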
Now we can apply some optimizations to the code in the binary to save a few bytes. The disassembled code shows that the function still attempts to return even though the program has already exited by then, and that there is a strange extra stub function at the end.
Marking the function as noreturn eliminates one of the extraneous ret instructions.
Switching from syscall1 to syscall0 eliminates xor edi, edi.
_start is already marked as noreturn, so where is the xor eax, eax ; ret coming from? We can temporarily recompile with -fno-strip and dump the binary to figure out where the extra instructions are coming from.
What is getauxval doing here??? This is a freestanding environment so auxiliary values shouldn’t be used at all. Since the function is not referenced by anything, adding the -flto compile option to strip out unused functions and data removes the extra code.
This is the absolute limit that we can reach without using tricks to overlap the ELF metadata to further shrink the binary.
There is one last change that needs to be made before the binary can run on all Linux systems. Currently the program header maps the binary at address 0x00000078, which would require the Linux kernel to map a page at address 0x00000000.
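To see why, round the address down to its page boundary. A tiny sketch, assuming 4 KiB pages (typical for x86_64); page_base is a name I made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def page_base(addr: int, page_size: int = PAGE_SIZE) -> int:
    """Round an address down to the start of the page that contains it."""
    return addr & ~(page_size - 1)


base = page_base(0x78)  # 0x78 lies inside the very first page, so base is 0
```

Note that 0x78 is 120, which is 64 + 56: the code starts right after the ELF header and the single program header, so the image itself begins at address 0.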
Most Linux distros set the sysctl value vm.mmap_min_addr to a nonzero address to mitigate exploits that take advantage of NULL pointer dereferences in the kernel. This means that the binary, as it stands, will not run on most modern Linux distros. To fix this, we can update the Python patching script to change the ELF file type from EXEC to DYN. This tells the Linux kernel to choose a base address for the binary instead of using the program segment addresses directly.
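The type change itself is a two-byte patch: e_type lives at offset 16 of the ELF header, with ET_EXEC = 2 and ET_DYN = 3. A minimal sketch of that step, again assuming a little-endian ELF64 file and using a function name of my own choosing:

```python
import struct

ET_EXEC = 2  # fixed-address executable
ET_DYN = 3   # shared object / position-independent executable


def mark_as_dyn(elf: bytes) -> bytes:
    """Return a copy of the image with e_type changed from ET_EXEC to ET_DYN."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (e_type,) = struct.unpack_from("<H", elf, 16)  # e_type follows e_ident
    if e_type != ET_EXEC:
        raise ValueError(f"expected ET_EXEC, found e_type={e_type}")
    out = bytearray(elf)
    struct.pack_into("<H", out, 16, ET_DYN)
    return bytes(out)
```

This only works because the code does not depend on being loaded at a particular address; anything that used absolute addresses would break once the kernel picks the base.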