https://www.recurse.com/blog/7-understanding-c-by-learning-assembly.

gdb

simple.c

int main()
{
    int a = 5;
    int b = a + 6;
    return 0;
}
gcc -g -O0 simple.c -o simple
gdb simple

Inside GDB, we’ll break on main and run until we get to the return statement. We put the number 2 after next to specify that we want to run next twice:

(gdb) break main
(gdb) run
(gdb) next 2

Now let’s use the disassemble command to show the assembly instructions for the current function. You can also pass a function name to disassemble to specify a different function to examine.

(gdb) disassemble
Dump of assembler code for function main:
0x0000000100000f50 <main+0>:    push   %rbp
0x0000000100000f51 <main+1>:    mov    %rsp,%rbp
0x0000000100000f54 <main+4>:    mov    $0x0,%eax
0x0000000100000f59 <main+9>:    movl   $0x0,-0x4(%rbp)
0x0000000100000f60 <main+16>:   movl   $0x5,-0x8(%rbp)
0x0000000100000f67 <main+23>:   mov    -0x8(%rbp),%ecx
0x0000000100000f6a <main+26>:   add    $0x6,%ecx
0x0000000100000f70 <main+32>:   mov    %ecx,-0xc(%rbp)
0x0000000100000f73 <main+35>:   pop    %rbp
0x0000000100000f74 <main+36>:   retq   
End of assembler dump.

In our example above, %eax and %ecx are general purpose registers, while %rbp and %rsp are special purpose registers. %rbp is the base pointer, which points to the base of the current stack frame, and %rsp is the stack pointer, which points to the top of the current stack frame. %rbp always has a higher value than %rsp because the stack starts at a high memory address and grows downwards.

If the pushing consumes all of the space allocated for the call stack, an error called a stack overflow occurs, generally causing the program to crash.

There is usually exactly one call stack associated with a running program (or more accurately, with each task or thread of a process), although additional stacks may be created for signal handling or cooperative multitasking (as with setcontext).

call stack layout.

Instructions in AT&T syntax are of the format mnemonic source, destination.

0x0000000100000f50 <main+0>:    push   %rbp
0x0000000100000f51 <main+1>:    mov    %rsp,%rbp

function prologue or preamble.

First we push the old base pointer onto the stack to save it for later. Then we copy the value of the stack pointer to the base pointer. After this, %rbp points to the base of main’s stack frame.

0x0000000100000f54 <main+4>:    mov    $0x0,%eax

The x86 calling convention dictates that a function’s return value is stored in %eax, so the above instruction sets us up to return 0 at the end of our function.

0x0000000100000f59 <main+9>:    movl   $0x0,-0x4(%rbp)

The parentheses let us know that this is a memory address. Here, %rbp is called the base register, and -0x4 is the displacement.

Because the stack grows downwards, subtracting 4 from the base of the current stack frame moves us into the current frame itself, where local variables are stored. This means that this instruction stores 0 at %rbp - 4

0x0000000100000f60 <main+16>:   movl   $0x5,-0x8(%rbp)
(gdb) x &a
0x7fff5fbff768: 0x00000005
(gdb) x $rbp - 8
0x7fff5fbff768: 0x00000005
0x0000000100000f73 <main+35>:   pop    %rbp
0x0000000100000f74 <main+36>:   retq

function epilogue: We pop the old base pointer off the stack and store it back in %rbp and then retq jumps back to our return address, which is also stored in the stack frame.

  • pushing a value (not necessarily stored in a register) means writing it to the stack.
  • popping means restoring whatever is on top of the stack into a register.

The instruction pointer is a register that stores the memory address of the next instruction. On x86_64, that register is %rip. We can access the instruction pointer using the $rip variable, or alternatively we can use the architecture independent $pc.

(gdb) x/i $pc
0x100000e94 <natural_generator+4>:  movl   $0x1,-0x4(%rbp)

display is like x, except it evaluates its expression every time our program stops:

(gdb) display/i $pc
1: x/i $pc  0x100000e94 <natural_generator+4>:  movl   $0x1,-0x4(%rbp)

Instead of next, which moves to the next source line, we’ll use nexti, which moves to the next assembly instruction.

(gdb) nexti
7           b += 1;
1: x/i $pc  mov   0x177(%rip),%eax # 0x100001018 <natural_generator.b>
(gdb) x $rbp - 0x4
0x7fff5fbff78c: 0x00000001
(gdb) x &a
0x7fff5fbff78c: 0x00000001

Instead, we have an instruction that loads 0x100001018, located somewhere after the instruction pointer, into %eax. GDB gives us a helpful comment with the result of the memory operand calculation and a hint telling us that natural_generator.b is stored at this address.

Even though the disassembly shows %eax as the destination, we print $rax, because GDB only sets up variables for full width registers.

(gdb) nexti
(gdb) p $rax
$3 = 4294967295
(gdb) p/x $rax
$5 = 0xffffffff

While variables have types that specify if they are signed or unsigned, registers don’t, so GDB is printing the value of %rax unsigned. Let’s try again, by casting %rax to a signed int:

(gdb) p (int)$rax
$11 = -1
(gdb) x/d 0x100001018
0x100001018 <natural_generator.b>:  -1
(gdb) x/d &b
0x100001018 <natural_generator.b>:  -1

This is because the value for b is hardcoded in a different section of the sample executable, and it’s loaded into memory along with all the machine code by the operating system’s loader when the process is launched.

static

Here.

  • A static variable inside a function keeps its value between invocations.
  • A static global variable or a function is “seen” only in the file it’s declared in

In the C programming language, static is used with global variables and functions to set their scope to the containing file. In local variables, static is used to store the variable in the statically allocated memory instead of the automatically allocated memory.

In C++, however, static is also used to define class attributes (shared between all objects of the same class) and methods

make

CFLAGS="-g -O0" make static