Run Functions in Another Stack with Zig

Posted: 11 Jul 2024

This blogpost is the result of an hour of funny mess-around that took place a few days ago just before starting the work day. I am no way an expert in this field, this is just couple of things that I found working when thinking about the concept of “why can’t I change the location of the stack in a program?”. I hope this post can encourage some people to experiment with low level code, not for any performance or intricate reasons, but because messing with “how does this work” is fun!

Every program compiled with a reasonable compiler runs putting some of its data in the so called stack. It’s fairly easy: when you call a function, the function puts its data on top of the stack; when the function returns the data is removed so that the stacks now exposes the data of the caller function. In the x86 architecture stack entries (the so called stack frames) are managed with the rbp and rsp registers. A more detailed introduction can be found here.

Note: all the code in this blogpost is written in Zig. However the same things can be replicated in every language that allows to write inline assembly in it. The C language is a very good example of that.

A First Look to the Assembly

Okay, now let’s get our hands dirty. And GodBolt is a very nice place to cover your hands in the most muddy dirt.

We start with a simple function, it should be the default function when you open the website:

export fn square(num: i32) i32 {
    return num * num;
}

Let’s have a look at the generated assembly now:

square:
    push    rbp
    mov     rbp, rsp
    sub     rsp, 16
    mov     dword ptr [rbp - 4], edi
    imul    edi, edi
    mov     dword ptr [rbp - 8], edi
    seto    al
    jo      .LBB0_1
    jmp     .LBB0_2
.LBB0_1:
    movabs  rdi, offset __anon_1406
    mov     esi, 16
    xor     eax, eax
    mov     edx, eax
    movabs  rcx, offset .L__unnamed_1
    call    example.panic
.LBB0_2:
    mov     eax, dword ptr [rbp - 8]
    add     rsp, 16
    pop     rbp
    ret

I removed some parts because they are not really relevant to what we are going to do.

A Few Ideas on Assembly

If it’s your first time reading some assembly code, here’s a few ideas on how it works:

The unindented lines like “square:” are labels. They do nothing, but are useful to say things like “go to label square”, or “go to label LBB0_1”.
The indented lines are instructions. Some have no arguments (like ret), some do (like imul).
Instructions can act on data that is either constant, in memory or on registers. You can think the registers as some very fast memory slots that have fixed names.
Usually the first argument of the instruction is the target data. For example the instruction sub rsp, 16 means “Take the value that is in the rsp register, subtract 16, and then put the result in the rsp register again”.

So What Does That Mean?

Now let’s dissect the assembly code, starting from the first instructions (1):

push rbp
mov  rbp, rsp
sub  rsp, 16

These lines manipulates the stack pointers. The rbp registers holds a pointer to the start of the current frame, while the rsp register holds a pointer to the its end. Since the stack grows from higher memory locations to lower memory locations, the value of rbp is greater or equal to that of rsp.

The push instruction “decrements the stack pointer and then stores the source operand on the top of the stack” (see this). So the first line saves rbp on the top of the stack.

The mov instruction overwrites the value of its first operand with the value of the second operand. Thus rbp now points to the end of the previous stack frame.

Finally the third lines subtracts 16 from rsp, meaning that the new stack frame is going to have a size of 16 bytes.

Let’s continue with a second batch of lines (2):

mov  dword ptr [rbp - 4], edi
imul edi, edi
mov  dword ptr [rbp - 8], edi

Here the operands of mov are not registers but memory locations. The operand dword ptr [rbp - 4] means “whatever the memory location (rbp - 4) contains, of the size of a dword (which 4 bytes)”. In other words we are moving the content of the edi register in the first 4 bytes of the current stack frame. Usually the edi register contains the first argument of the function, that in this case is the integer value num in the original Zig code. Let’s notice that num is of type i32, which has size exactly 4 bytes, a dword!

The imul instruction works similarly to the sub function, performing a signed integer multiplication between the two operands and storing the result in the first operand.

The third instruction saves the result of the multiplication in a second slot on the stack frame.

We do not really care for the next set of three lines (3):

seto al
jo   .LBB0_1
jmp  .LBB0_2

It is used to handle the overflow in the multiplication. If the overflow happens, the program will continue at the label LBB0_1, otherwise the normal flow will proceed at the LBB0_2 label. More details on the overflow path are left to the reader to figuere them out 😃.

Let’s have a look at the non-overflow path then (4):

mov eax, dword ptr [rbp - 8]
add rsp, 16
pop rbp
ret

The first line moves the result of the multiplication into the eax register, which is commonly used for the return values. The next two lines revert the changes to the stack pointer and base stack pointer. The instruction add rsp, 16 closes the parenthesis opened at sub rsp, 16 and pop rbp does the same thing with push rbp. The stack is thus in the same condition as we started, masking all the computations that happened since the beginning of the square function.

A last instruction ret gives back control to the caller function (see this).

A Brief Detour: Optimized Code

What’s the point of all that stack juggling in such a simple function? Well… mostly none. In fact let’s run an optimizing compiler by passing the -O ReleaseFast argument to the Zig compiler. The generated assembly will be much simple:

square:
    mov     eax, edi
    imul    eax, edi
    ret

With this option we can see we also disabled the overflow check.

Explicit Stack Pointer Manipulation

Having understood the basic mechanisms of the stack pointers we are almost ready to present our final piece of code.

We start with a dummy function, it’s not really important what it does, I just want it to fill a few stack frames.

fn fibonacci(n: u32) u32 {
    if (n <= 1) return n;
    const a = fibonacci(n - 1);
    const b = fibonacci(n - 2);
    return a + b;
}

Then I want a wrapper function to call fibonacci from another stack. Here it is:

fn new_stack(n: u32) u32 {
    const stack = std.heap.page_allocator.alloc(u8, 1024) catch unreachable;
    const stack_ptr = stack.ptr;

    const old_rbp = asm (
        "": [ret] "={rbp}" (-> [*]u8),
        ::
    );
    const old_rsp = asm (
        "": [ret] "={rsp}" (-> [*]u8),
        ::
    );
    const frame_data = old_rsp[0..@intFromPtr(old_rbp ) - @intFromPtr(old_rsp)];

    @memcpy(stack_ptr[0..frame_data.len], frame_data);
    
    const new_rsp = &stack_ptr[frame_data.len];

    asm volatile (
        \\ mov %%rbp, %%rax
        \\ mov %%rsp, %%rdx
        ::
        [stack_ptr] "{rax}" (stack_ptr),
        [new_rsp] "{rdx}" (new_rsp),
        : "rax", "rdx"
    );

    const r = fibonacci(n);

    asm volatile (
        \\ mov %%rbp, %%rax
        \\ mov %%rsp, %%rdx
        ::
        [old_rbp] "{rax}" (old_rbp),
        [old_rsp] "{rdx}" (old_rsp),
        : "rax", "rdx"
    );

    return r;
}

In the firtst two lines we create a new stack of size 1024 bytes. For people unfamiliar with Zig, std.heap.page_allocator.alloc works more or less like a C malloc, but with just a minimal wrap around the mmap syscall. An excellent post on allocators and why you would benefit from something different from malloc can be found here.

The next step is to retrieve the two stack pointers. I do it by injecting some assembly in the code.

Note: At the writing time of this post (2024-07-11) the asm structure is signaled as highly unstable, take the exact syntax with even more grains of salt than the rest of the article.

An assembly block in Zig (similarly to C) is made of four parts, separated by a colon:

A string: this is essentially the assembly code;
The output value: this expresses how values should cross the asm block from inside to the regular code;
The input values: this expresses how variables inside regular code should enter inside the asm block;
Closure: this is a list of registers that the asm block can change, such that the compiler can rely on the fact that registers outside this list are not invalidated during the executon of the asm code.

Thus the code

const old_rbp = asm (
    "": [ret] "={rbp}" (-> [*]u8),
    ::
);

contains only the output value, that moves the rbp register into the result of the asm block (thus the old_rbp variable), giving it a type of an array of bytes.

After the first two asm block, we have saved the values of the two stack pointers in the new_stack function stack frame.

const frame_data = old_rsp[0..@intFromPtr(old_rbp ) - @intFromPtr(old_rsp)];

With this line we are declaring an array of bytes in the locations corresponding to the stack frame of the new_stack function: it’s an array starting from rsp and going up to rbp.

Next we execute a memcpy to copy the current stack frame into the freshly allocated memory. It’s useful because we will want to have a working environment when we will return from the fibonacci function.

We also compute the end of the copied stack frame, and save it in the new_rsp variable.

Then we let magic happen:

asm volatile (
    \\ mov %%rbp, %%rax
    \\ mov %%rsp, %%rdx
    ::
    [stack_ptr] "{rax}" (stack_ptr),
    [new_rsp] "{rdx}" (new_rsp),
    : "rax", "rdx"
);

Lines starting with \\ are the cumbersome way to write multi line strings in Zig. In this block we have two input variables: stack_ptr is moved into register rax, new_rsp is moved into register rdx. We then moves these values into rbp and rsp.

Note: This code can be written much shortly by putting the variable values directly into their final registers. However it’s fun to explore the correct whole syntax, it’s a first time for me with Zig.

We can finally call the child function, fibonacci, and restore the stack to the original value, as nothing had happened. This last bit of code should be familiar by now.

Let’s try it!

pub fn main() !void {
    std.debug.print("fibonacci(12): {}\n", .{new_stack(12)});
}

By compiling this code and running it we get the correct result:

fibonacci(12): 144

Sounds like we got it!

Comments

Comments can be submitted at HackerNews: HERE.

Final Notes

As said in the beginning, this post is an encouragement for people to get on a keyboard and try to test funny things, in a light-hearted quest to understand the world where we live. I don’t expect the knowledge I gained in this little experiment to be useful in a near future. What’s important is the friends we made along the way.

PS. Do not try to compile the final code with ReleaseFast, and if you do, do not try to run the executable.

With love,

– Alessio

Alessio Marchetti