This blogpost is the result of an hour of funny mess-around that took place a few days ago just before starting the work day. I am no way an expert in this field, this is just couple of things that I found working when thinking about the concept of “why can’t I change the location of the stack in a program?”. I hope this post can encourage some people to experiment with low level code, not for any performance or intricate reasons, but because messing with “how does this work” is fun!
Every program compiled with a reasonable compiler runs putting some of its data in the so called
stack. It’s fairly easy: when you call a function, the function puts its data on top of the stack;
when the function returns the data is removed so that the stacks now exposes the data of the caller
function. In the x86 architecture stack entries (the so called stack frames) are managed with the
rbp
and rsp
registers. A more detailed introduction can be found here.
Note: all the code in this blogpost is written in Zig. However the same things can be replicated in every language that allows to write inline assembly in it. The C language is a very good example of that.
Okay, now let’s get our hands dirty. And GodBolt is a very nice place to cover your hands in the most muddy dirt.
We start with a simple function, it should be the default function when you open the website:
export fn square(num: i32) i32 {
return num * num;
}
Let’s have a look at the generated assembly now:
square:
push rbp
mov rbp, rsp
sub rsp, 16
mov dword ptr [rbp - 4], edi
imul edi, edi
mov dword ptr [rbp - 8], edi
seto al
jo .LBB0_1
jmp .LBB0_2
.LBB0_1:
movabs rdi, offset __anon_1406
mov esi, 16
xor eax, eax
mov edx, eax
movabs rcx, offset .L__unnamed_1
call example.panic
.LBB0_2:
mov eax, dword ptr [rbp - 8]
add rsp, 16
pop rbp
ret
I removed some parts because they are not really relevant to what we are going to do.
If it’s your first time reading some assembly code, here’s a few ideas on how it works:
square:
” are labels. They do nothing, but are useful to say things
like “go to label square
”, or “go to label LBB0_1
”.ret
), some do (like imul
).sub rsp, 16
means “Take the value that is in the rsp
register, subtract 16
, and then put the
result in the rsp
register again”.Now let’s dissect the assembly code, starting from the first instructions (1):
push rbp
mov rbp, rsp
sub rsp, 16
These lines manipulates the stack pointers. The rbp
registers holds a pointer to the start of the
current frame, while the rsp
register holds a pointer to the its end. Since the stack grows from
higher memory locations to lower memory locations, the value of rbp
is greater or equal to that of
rsp
.
The push
instruction “decrements the stack pointer and then stores the source operand on the top
of the stack” (see this). So the first line saves rbp
on
the top of the stack.
The mov
instruction overwrites the value of its first operand with the value of the second
operand. Thus rbp
now points to the end of the previous stack frame.
Finally the third lines subtracts 16 from rsp
, meaning that the new stack frame is going to have a
size of 16 bytes.
Let’s continue with a second batch of lines (2):
mov dword ptr [rbp - 4], edi
imul edi, edi
mov dword ptr [rbp - 8], edi
Here the operands of mov
are not registers but memory locations. The operand dword ptr [rbp - 4]
means “whatever the memory location (rbp - 4)
contains, of the size of a dword (which 4 bytes)”.
In other words we are moving the content of the edi
register in the first 4 bytes of the current
stack frame. Usually the edi
register contains the first argument of the function, that in this
case is the integer value num
in the original Zig code. Let’s notice that num
is of type i32
,
which has size exactly 4 bytes, a dword!
The imul
instruction works similarly to the sub
function, performing a signed integer
multiplication between the two operands and storing the result in the first operand.
The third instruction saves the result of the multiplication in a second slot on the stack frame.
We do not really care for the next set of three lines (3):
seto al
jo .LBB0_1
jmp .LBB0_2
It is used to handle the overflow in the multiplication. If the overflow happens, the program will
continue at the label LBB0_1
, otherwise the normal flow will proceed at the LBB0_2
label. More
details on the overflow path are left to the reader to figuere them out 😃.
Let’s have a look at the non-overflow path then (4):
mov eax, dword ptr [rbp - 8]
add rsp, 16
pop rbp
ret
The first line moves the result of the multiplication into the eax
register, which is commonly
used for the return values. The next two lines revert the changes to the stack pointer and base
stack pointer. The instruction add rsp, 16
closes the parenthesis opened at sub rsp, 16
and pop rbp
does the same thing with push rbp
. The stack is thus in the same condition as we started,
masking all the computations that happened since the beginning of the square
function.
A last instruction ret
gives back control to the caller function (see this).
What’s the point of all that stack juggling in such a simple function? Well… mostly none. In fact
let’s run an optimizing compiler by passing the -O ReleaseFast
argument to the Zig compiler.
The generated assembly will be much simple:
square:
mov eax, edi
imul eax, edi
ret
With this option we can see we also disabled the overflow check.
Having understood the basic mechanisms of the stack pointers we are almost ready to present our final piece of code.
We start with a dummy function, it’s not really important what it does, I just want it to fill a few stack frames.
fn fibonacci(n: u32) u32 {
if (n <= 1) return n;
const a = fibonacci(n - 1);
const b = fibonacci(n - 2);
return a + b;
}
Then I want a wrapper function to call fibonacci
from another stack. Here it is:
fn new_stack(n: u32) u32 {
const stack = std.heap.page_allocator.alloc(u8, 1024) catch unreachable;
const stack_ptr = stack.ptr;
const old_rbp = asm (
"": [ret] "={rbp}" (-> [*]u8),
::
);
const old_rsp = asm (
"": [ret] "={rsp}" (-> [*]u8),
::
);
const frame_data = old_rsp[0..@intFromPtr(old_rbp ) - @intFromPtr(old_rsp)];
@memcpy(stack_ptr[0..frame_data.len], frame_data);
const new_rsp = &stack_ptr[frame_data.len];
asm volatile (
\\ mov %%rbp, %%rax
\\ mov %%rsp, %%rdx
::
[stack_ptr] "{rax}" (stack_ptr),
[new_rsp] "{rdx}" (new_rsp),
: "rax", "rdx"
);
const r = fibonacci(n);
asm volatile (
\\ mov %%rbp, %%rax
\\ mov %%rsp, %%rdx
::
[old_rbp] "{rax}" (old_rbp),
[old_rsp] "{rdx}" (old_rsp),
: "rax", "rdx"
);
return r;
}
In the firtst two lines we create a new stack of size 1024 bytes. For people unfamiliar with Zig,
std.heap.page_allocator.alloc
works more or less like a C malloc
, but with just a minimal wrap
around the mmap
syscall. An excellent post on allocators and why you would benefit from something
different from malloc
can be found here.
The next step is to retrieve the two stack pointers. I do it by injecting some assembly in the code.
Note: At the writing time of this post (2024-07-11) the asm structure is signaled as highly unstable, take the exact syntax with even more grains of salt than the rest of the article.
An assembly block in Zig (similarly to C) is made of four parts, separated by a colon:
Thus the code
const old_rbp = asm (
"": [ret] "={rbp}" (-> [*]u8),
::
);
contains only the output value, that moves the rbp
register into the result of the asm block (thus
the old_rbp
variable), giving it a type of an array of bytes.
After the first two asm block, we have saved the values of the two stack pointers in the new_stack
function stack frame.
const frame_data = old_rsp[0..@intFromPtr(old_rbp ) - @intFromPtr(old_rsp)];
With this line we are declaring an array of bytes in the locations corresponding to the stack frame
of the new_stack
function: it’s an array starting from rsp
and going up to rbp
.
Next we execute a memcpy
to copy the current stack frame into the freshly allocated memory. It’s
useful because we will want to have a workign environment when we will return from the fibonacci
function.
We also compute the end of the copied stack frame, and save it in the new_rsp
variable.
Then we let magic happen:
asm volatile (
\\ mov %%rbp, %%rax
\\ mov %%rsp, %%rdx
::
[stack_ptr] "{rax}" (stack_ptr),
[new_rsp] "{rdx}" (new_rsp),
: "rax", "rdx"
);
Lines starting with \\
are the cumbersome way to write multi line strings in Zig. In this block we
have two input variables: stack_ptr
is moved into register rax
, new_rsp
is moved into register
rdx
. We then moves these values into rbp
and rsp
.
Note: This code can be written much shortly by putting the varieable values directly into their final registers. However it’s fun to explore the correct whole syntax, it’s a first time to me with Zig.
We can finally call the child function, fibonacci
, and restore the stack to the original value, as
nothing had happened. This last bit of code should be familiar by now.
Let’s try it!
pub fn main() !void {
std.debug.print("fibonacci(12): {}\n", .{new_stack(12)});
}
By compiling this code and running it we get the correct result:
fibonacci(12): 144
Sounds like we got it!
Comments can be submitted at HackerNews: HERE.
As said in the beginning, this post is an encouragement for people to get on a keyboard and try to test funny things, in a light-hearted quest to understand the world where we live. I don’t expect the knowledge I gained in this little experiment to be useful in a near future. What’s important is the friends we made along the way.
PS. Do not try to compile the final code with ReleaseFast
, and if you do, do not try to run the
executable.
With love,
– Alessio