Golfing Zig ELF Binaries


January 29, 2025

How much can we feasibly strip from a zig binary? Starting from a normal zig program that does absolutely nothing:

pub fn main() void {}
zig build-exe main.zig -target x86_64-linux-gnu
du -hk main
# 2180    main

2180K for a binary that does nothing. Given that the smallest possible executable ELF file is around 80 bytes, 2180K is quite a bit of bloat. What happens when we strip out debug info?

zig build-exe main.zig -target x86_64-linux-gnu -fstrip
du -hk main
# 192     main

We saved 1988K just by stripping out debugging information. However, 192K is still quite far from our 80 byte goal. We are still compiling in Debug mode, so let’s switch to ReleaseSmall (roughly equivalent to -Os for gcc/clang, as far as I can tell).

zig build-exe main.zig -target x86_64-linux-gnu -fstrip -OReleaseSmall
du -hk main
# 12      main

Now we’re at 12K! Saved 180K just by switching from Debug to ReleaseSmall. Next step is to enable function and data sections to allow the linker to strip away unreferenced functions or data.

zig build-exe main.zig -target x86_64-linux-gnu -fstrip -OReleaseSmall -ffunction-sections -fdata-sections --gc-sections
du -hk main
# 12      main

…and that did nothing. I guess ReleaseSmall already handles this optimization. Taking a peek at the ELF sections shows quite a few unnecessary sections:

There are 9 section headers, starting at offset 0x2068:
 
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .rodata           PROGBITS         00000000010001c8  000001c8
       0000000000000954  0000000000000000 AMS       0     0     8
  [ 2] .eh_frame_hdr     PROGBITS         0000000001000b1c  00000b1c
       00000000000000bc  0000000000000000   A       0     0     4
  [ 3] .eh_frame         PROGBITS         0000000001000bd8  00000bd8
       00000000000003d4  0000000000000000   A       0     0     8
  [ 4] .text             PROGBITS         0000000001001fac  00000fac
       0000000000001041  0000000000000000  AX       0     0     4
  [ 5] .tbss             NOBITS           0000000001002ff0  00001ff0
       000000000000000d  0000000000000000 WAT       0     0     8
  [ 6] .bss              NOBITS           0000000001004000  00002000
       0000000000003108  0000000000000000  WA       0     0     4096
  [ 7] .comment          PROGBITS         0000000000000000  00002000
       000000000000001c  0000000000000001  MS       0     0     1
  [ 8] .shstrtab         STRTAB           0000000000000000  0000201c
       0000000000000045  0000000000000000           0     0     1
 
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

.eh_frame and .eh_frame_hdr are generated to provide unwinding information, and are not strictly necessary for the binary to run. The .comment section holds useless metadata. .tbss is a section for thread local storage, which is also unnecessary since the program does not do any threading.

zig build-exe main.zig -target x86_64-freestanding-none -fstrip -OReleaseSmall
# warning(link): unexpected LLD stderr:
# ld.lld: warning: cannot find entry symbol _start; not setting start address
wc -c main
#      472 main

Switching from x86_64-linux-gnu to x86_64-freestanding-none cuts most of the extra cruft from the binary, down to 472 bytes. Looking at the sections now reveals that all but 2 sections have been removed:

There are 3 section headers, starting at offset 0x118:
 
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .comment          PROGBITS         0000000000000000  000000e8
       000000000000001c  0000000000000001  MS       0     0     1
  [ 2] .shstrtab         STRTAB           0000000000000000  00000104
       0000000000000014  0000000000000000           0     0     1
 
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

But something isn’t quite right. The binary no longer contains any executable code. This is because we have to change our executable’s entrypoint. Now that our platform is freestanding, the entrypoint is _start instead of main.

const syscall1 = @import("std").os.linux.syscall1;
 
export fn _start() void {
    _ = syscall1(.exit, 0);
}

Our compile command hasn’t changed and the binary size is now slightly larger.

zig build-exe main.zig -target x86_64-freestanding-none -fstrip -OReleaseSmall
wc -c main
#      616 main

This time, however, the binary contains some executable code:

There are 4 section headers, starting at offset 0x168:
 
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000001001120  00000120
       000000000000000b  0000000000000000  AX       0     0     4
  [ 2] .comment          PROGBITS         0000000000000000  0000012b
       000000000000001c  0000000000000001  MS       0     0     1
  [ 3] .shstrtab         STRTAB           0000000000000000  00000147
       000000000000001a  0000000000000000           0     0     1
 
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

Looking at the size of the text section, it only contains 11 bytes of code. Where are the other 605 bytes coming from? Inspecting the ELF further with readelf shows that there are 4 program headers. Each program header takes up 56 bytes of space, for a total of 56 * 4 = 224 bytes.

Elf file type is EXEC (Executable file)
Entry point 0x1001120
There are 4 program headers, starting at offset 64
 
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000001000040 0x0000000001000040
                 0x00000000000000e0 0x00000000000000e0  R      0x8
  LOAD           0x0000000000000000 0x0000000001000000 0x0000000001000000
                 0x0000000000000120 0x0000000000000120  R      0x1000
  LOAD           0x0000000000000120 0x0000000001001120 0x0000000001001120
                 0x000000000000000b 0x000000000000000b  R E    0x1000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000001000000  RW     0x0
 
 Section to Segment mapping:
  Segment Sections...
   00     
   01     
   02     .text 
   03
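
The whole 616 bytes can be accounted for by hand from the two readelf dumps above. A rough tally in Python, assuming the standard ELF64 sizes (64 byte file header, 56 byte program header, 64 byte section header) and assuming that the gap before the section header table is alignment padding:

```python
# Size accounting for the 616-byte binary, using the numbers from readelf.
ELF_HEADER = 64   # ELF64 file header
PHDR_SIZE = 56    # one ELF64 program header
SHDR_SIZE = 64    # one ELF64 section header

parts = {
    "ELF header": ELF_HEADER,
    "program headers (4)": 4 * PHDR_SIZE,
    ".text": 0x0B,
    ".comment": 0x1C,
    ".shstrtab": 0x1A,
}
end_of_data = sum(parts.values())   # where .shstrtab ends in the file
shdr_offset = 0x168                 # "starting at offset 0x168"
parts["padding before section headers"] = shdr_offset - end_of_data
parts["section headers (4)"] = 4 * SHDR_SIZE

total = sum(parts.values())
for name, size in parts.items():
    print(f"{name:32} {size:4}")
print(f"{'total':32} {total:4}")
```

Only 11 of those bytes are code. The program headers and section headers together make up 480 bytes, which is why they are the next things to go.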

GNU_STACK is completely optional, and only acts as a hint to the linux kernel. PHDR is similarly unnecessary and the two LOAD segments can be merged into a single large RWX segment. We cannot directly control the program segments from the command line, so it is time to break out a linker script.

This script creates a single RWX segment that spans all of the executable code and data, cutting down the 4 segments to a single segment.

ENTRY(_start)
 
PHDRS {
    code PT_LOAD FLAGS(7);
}
 
SECTIONS {
    . = SIZEOF_HEADERS;
    .text   : ALIGN(1) { *(.text.*) }
    .rodata : ALIGN(1) { *(.rodata.*) }
    .data   : ALIGN(1) { *(.data.*) }
    .bss    : ALIGN(1) { *(.bss.*) }
}
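
For reference, FLAGS(7) is just the three segment permission bits OR'd together. A small Python sketch of how that value decodes and what the single remaining program header might look like when packed; the field values here are illustrative guesses based on the readelf output in this post, not a dump of the real header:

```python
import struct

PF_X, PF_W, PF_R = 0x1, 0x2, 0x4   # ELF segment permission bits
PT_LOAD = 1

def describe_flags(flags: int) -> str:
    """Render p_flags the way readelf does, e.g. 7 -> 'RWE', 5 -> 'R E'."""
    return "".join(ch if flags & bit else " "
                   for ch, bit in (("R", PF_R), ("W", PF_W), ("E", PF_X)))

# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
#             p_filesz, p_memsz, p_align
PHDR = struct.Struct("<IIQQQQQQ")

# Illustrative values: one RWX LOAD segment covering 11 bytes of code
# placed right after the headers (assumed, not read from the binary).
phdr = PHDR.pack(PT_LOAD, PF_R | PF_W | PF_X, 0x78, 0x78, 0x78, 0x0B, 0x0B, 0x1000)

print(describe_flags(7), len(phdr))
```

Each program header costs 56 bytes no matter what it describes, so going from four headers to one saves 168 bytes.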

Recompiling with the linker script brings the binary down to 616 - 56 * 3 = 448 bytes.

zig build-exe main.zig -target x86_64-freestanding-none -fstrip -OReleaseSmall -T linker.ld
wc -c main
#      448 main

We return our attention to the section headers in the binary. The linux kernel completely ignores section headers, so they can be safely removed without affecting the binary. The contents of .comment and .shstrtab can also be stripped since they are not mapped by any program segment.

There are 4 section headers, starting at offset 0xc0:
 
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000078  00000078
       000000000000000b  0000000000000000  AX       0     0     4
  [ 2] .comment          PROGBITS         0000000000000000  00000083
       000000000000001c  0000000000000001  MS       0     0     1
  [ 3] .shstrtab         STRTAB           0000000000000000  0000009f
       000000000000001a  0000000000000000           0     0     1
 
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

Here we can take advantage of how the compiler lays out the ELF file.

ELF Header
Program segments
Section data (ALLOC)
Section data
Section headers

Sections marked as ALLOC are mapped by a program segment and are required for program execution. Because of how the ELF file is laid out, the section headers and non-ALLOC sections all sit in one contiguous block at the end of the file. To strip out the extra metadata, we can cut away any data that comes after the last ALLOC section.

from pwnc.minelf import ELF
 
elf = ELF(open("main", "rb").read())
 
offset = 0
for section in elf.sections:
    if section.flags & elf.Section.Flags.ALLOC != 0:
        offset = section.offset + section.size
 
elf.header.section_offset = 0
elf.header.number_of_sections = 0
elf.header.section_name_table_index = 0
elf.raw_elf_bytes = elf.raw_elf_bytes[:offset]
elf.write("main")
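
If pwnc is not available, the same idea can be sketched with nothing but the standard library. This is not the script used above, just an approximation of it that assumes a little-endian ELF64 file; the name strip_section_data is made up for this example. It also differs slightly in that it takes the furthest end of any ALLOC section, skips NOBITS sections (which occupy no file space), and never cuts into the program headers:

```python
import struct

SHF_ALLOC = 0x2
SHT_NOBITS = 8
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # Elf64_Ehdr, 64 bytes
SHDR = struct.Struct("<IIQQQQIIQQ")         # Elf64_Shdr, 64 bytes

def strip_section_data(data: bytes) -> bytes:
    """Drop the section headers and everything after the last ALLOC section."""
    fields = list(EHDR.unpack_from(data, 0))
    e_phoff, e_shoff = fields[5], fields[6]
    e_phentsize, e_phnum = fields[9], fields[10]
    e_shentsize, e_shnum = fields[11], fields[12]

    # Keep at least the ELF header and the program header table.
    end = e_phoff + e_phentsize * e_phnum
    for i in range(e_shnum):
        sh = SHDR.unpack_from(data, e_shoff + i * e_shentsize)
        sh_type, sh_flags, sh_offset, sh_size = sh[1], sh[2], sh[4], sh[5]
        if sh_flags & SHF_ALLOC and sh_type != SHT_NOBITS:
            end = max(end, sh_offset + sh_size)

    fields[6] = 0    # e_shoff
    fields[12] = 0   # e_shnum
    fields[13] = 0   # e_shstrndx
    return EHDR.pack(*fields) + data[EHDR.size:end]
```

Usage would be along the lines of reading main into bytes, passing it through strip_section_data, and writing the result back out.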

Compiling and patching now yields a 131 byte binary. Much better.

zig build-exe main.zig -target x86_64-freestanding-none -fstrip -OReleaseSmall -T linker.ld
python3 patch.py
wc -c main
#      131 main

Now we can apply some optimizations to the code in the binary to save a few bytes. The disassembled code shows that the function still attempts to return even though the program has already exited by then, and that there is a strange extra stub function at the end.

main:   file format elf64-x86-64
 
Disassembly of section PT_LOAD#0:
 
0000000000000078 <PT_LOAD#0>:
      78: 6a 3c                         push    60
      7a: 58                            pop     rax
      7b: 31 ff                         xor     edi, edi
      7d: 0f 05                         syscall
      7f: c3                            ret
      80: 31 c0                         xor     eax, eax
      82: c3                            ret

Marking the function as noreturn eliminates one of the extraneous ret instructions.

const syscall1 = @import("std").os.linux.syscall1;
 
export fn _start() noreturn {
    _ = syscall1(.exit, 0);
    unreachable;
}

main:   file format elf64-x86-64
 
Disassembly of section PT_LOAD#0:
 
0000000000000078 <PT_LOAD#0>:
      78: 6a 3c                         push    60
      7a: 58                            pop     rax
      7b: 31 ff                         xor     edi, edi
      7d: 0f 05                         syscall
      7f: 31 c0                         xor     eax, eax
      81: c3                            ret

Switching from syscall1 to syscall0 eliminates xor edi, edi.

const syscall0 = @import("std").os.linux.syscall0;
 
export fn _start() noreturn {
    _ = syscall0(.exit);
    unreachable;
}

main:   file format elf64-x86-64
 
Disassembly of section PT_LOAD#0:
 
0000000000000078 <PT_LOAD#0>:
      78: 6a 3c                         push    60
      7a: 58                            pop     rax
      7b: 0f 05                         syscall
      7d: 31 c0                         xor     eax, eax
      7f: c3                            ret

_start is already marked as noreturn, so where is the xor eax, eax ; ret coming from? We can temporarily recompile with -fno-strip and dump the binary to figure out where the extra instructions are coming from.

main:   file format elf64-x86-64
 
Disassembly of section .text:
 
0000000000000078 <_start>:
      78: 6a 3c                         push    60
      7a: 58                            pop     rax
      7b: 0f 05                         syscall
 
000000000000007d <getauxval>:
      7d: 31 c0                         xor     eax, eax
      7f: c3                            ret

What is getauxval doing here??? This is a freestanding environment so auxiliary values shouldn’t be used at all. Since the function is not referenced by anything, adding the -flto compile option to strip out unused functions and data removes the extra code.

zig build-exe main.zig -target x86_64-freestanding-none -fstrip -OReleaseSmall -T linker.ld -flto
python3 patch.py
wc -c main
#      125 main

main:   file format elf64-x86-64
 
Disassembly of section PT_LOAD#0:
 
0000000000000078 <PT_LOAD#0>:
      78: 6a 3c                         push    60
      7a: 58                            pop     rax
      7b: 0f 05                         syscall

This is the absolute limit that we can reach without using tricks to overlap the ELF metadata to further shrink the binary.

\begin{align*}
\text{ELF Header} & = \text{64 bytes} \\
\text{Program Header} & = \text{56 bytes} \\
\text{Code} & = \text{5 bytes} \\
& = \text{125 bytes}
\end{align*}

There is one last change that needs to be made before the binary can run on all linux systems. Currently the program header maps the binary at address 0x00000078, which would require the linux kernel to map a page at address 0x00000000.

Elf file type is EXEC (Executable file)
Entry point 0x78
There is 1 program header, starting at offset 64
 
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000078 0x0000000000000078 0x0000000000000078
                 0x0000000000000005 0x0000000000000005  RWE    0x1000

Most linux distros set the sysctl value vm.mmap_min_addr to a nonzero address to mitigate kernel exploits that take advantage of kernel NULL dereferences. This means that the binary, as it is right now, will not run on most modern linux distros. To fix this we can update the python patching script to change the ELF file type from EXEC to DYN. This tells the linux kernel to choose a base address for the binary instead of using the program segment addresses directly.

from pwnc.minelf import ELF
 
elf = ELF(open("main", "rb").read())
elf.header.type = elf.Header.Type.DYN
 
offset = 0
for section in elf.sections:
    if section.flags & elf.Section.Flags.ALLOC != 0:
        offset = section.offset + section.size
 
elf.header.section_offset = 0
elf.header.number_of_sections = 0
elf.header.section_name_table_index = 0
elf.raw_elf_bytes = elf.raw_elf_bytes[:offset]
elf.write("main")
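
The type change itself is a two byte patch. In a little-endian ELF file the e_type field sits directly after the 16 byte e_ident array, with EXEC encoded as 2 and DYN as 3. A standalone sketch of just that step, separate from the pwnc script; the helper name mark_as_dyn is made up for this example:

```python
import struct

ET_EXEC, ET_DYN = 2, 3
E_TYPE_OFFSET = 16   # e_type follows the 16-byte e_ident array

def mark_as_dyn(data: bytes) -> bytes:
    """Return a copy of a little-endian ELF image with e_type EXEC -> DYN."""
    (e_type,) = struct.unpack_from("<H", data, E_TYPE_OFFSET)
    if e_type != ET_EXEC:
        raise ValueError(f"expected ET_EXEC (2), found {e_type}")
    patched = bytearray(data)
    struct.pack_into("<H", patched, E_TYPE_OFFSET, ET_DYN)
    return bytes(patched)
```

The check for ET_EXEC is only there so that running the patch twice fails loudly instead of silently doing nothing.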

The final ELF file:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x78
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         1
  Size of section headers:           64 (bytes)
  Number of section headers:         0
  Section header string table index: 0
 
There are no sections in this file.
 
There are no section groups in this file.
 
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000078 0x0000000000000078 0x0000000000000078 0x000005 0x000005 RWE 0x1000
 
There is no dynamic section in this file.
 
There are no relocations in this file.
No processor specific unwind information to decode
 
Dynamic symbol information is not available for displaying symbols.
 
No version information found in this file.

main:   file format elf64-x86-64
 
Disassembly of section PT_LOAD#0:
 
0000000000000078 <PT_LOAD#0>:
      78: 6a 3c                         push    60
      7a: 58                            pop     rax
      7b: 0f 05                         syscall
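
As a sanity check on the 125 byte figure, the same layout can be rebuilt by hand from the readelf output above. This is a sketch that packs the header fields as readelf reports them; I have not compared it byte for byte against the real binary, and it says nothing about whether a given kernel will accept the file:

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")           # Elf64_Phdr, 56 bytes

code = bytes.fromhex("6a3c58" "0f05")       # push 60; pop rax; syscall
load_at = EHDR.size + PHDR.size             # 0x78, right after the headers

ident = bytes.fromhex("7f454c46020101") + bytes(9)   # magic, ELF64, LSB, v1
ehdr = EHDR.pack(
    ident,
    3,         # e_type: DYN
    62,        # e_machine: x86-64
    1,         # e_version
    load_at,   # e_entry
    64,        # e_phoff
    0,         # e_shoff: no section headers
    0,         # e_flags
    64, 56, 1, # e_ehsize, e_phentsize, e_phnum
    64, 0, 0,  # e_shentsize, e_shnum, e_shstrndx
)
phdr = PHDR.pack(1, 7, load_at, load_at, load_at, len(code), len(code), 0x1000)

image = ehdr + phdr + code
print(len(image))
```

The length works out to 64 + 56 + 5 bytes, matching the breakdown earlier in the post.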