GDB
https://www.recurse.com/blog/7-understanding-c-by-learning-assembly.
gdb
simple.c
int main()
{
int a = 5;
int b = a + 6;
return 0;
}
gcc -g -O0 simple.c -o simple
gdb simple
Inside GDB, we’ll break
on main
and run
until we get to the return statement. We put the number 2 after next
to specify that we want to run next
twice:
(gdb) break main
(gdb) run
(gdb) next 2
Now let’s use the disassemble
command to show the assembly instructions for the current function. You can also pass a function name to disassemble
to specify a different function to examine.
(gdb) disassemble
Dump of assembler code for function main:
0x0000000100000f50 <main+0>: push %rbp
0x0000000100000f51 <main+1>: mov %rsp,%rbp
0x0000000100000f54 <main+4>: mov $0x0,%eax
0x0000000100000f59 <main+9>: movl $0x0,-0x4(%rbp)
0x0000000100000f60 <main+16>: movl $0x5,-0x8(%rbp)
0x0000000100000f67 <main+23>: mov -0x8(%rbp),%ecx
0x0000000100000f6a <main+26>: add $0x6,%ecx
0x0000000100000f70 <main+32>: mov %ecx,-0xc(%rbp)
0x0000000100000f73 <main+35>: pop %rbp
0x0000000100000f74 <main+36>: retq
End of assembler dump.
In our example above, %eax
and %ecx
are general purpose registers
, while %rbp
and %rsp
are special purpose registers
. %rbp
is the base pointer
, which points to the base of the current stack frame, and %rsp
is the stack pointer
, which points to the top of the current stack frame. %rbp
always has a higher value than %rsp
because the stack starts at a high memory address and grows downwards.
If the pushing consumes all of the space allocated for the call stack, an error called a stack overflow occurs, generally causing the program to crash.
There is usually exactly one call stack associated with a running program (or more accurately, with each task or thread of a process), although additional stacks may be created for signal handling or cooperative multitasking (as with setcontext).
.
Instructions in AT&T syntax are of the format mnemonic source, destination
.
0x0000000100000f50 <main+0>: push %rbp
0x0000000100000f51 <main+1>: mov %rsp,%rbp
function prologue or preamble.
First we push the old base pointer onto the stack to save it for later. Then we copy the value of the stack pointer to the base pointer. After this, %rbp
points to the base of main’s stack frame.
0x0000000100000f54 <main+4>: mov $0x0,%eax
The x86 calling convention dictates that a function’s return value is stored in %eax
, so the above instruction sets us up to return 0 at the end of our function.
0x0000000100000f59 <main+9>: movl $0x0,-0x4(%rbp)
The parentheses let us know that this is a memory address. Here, %rbp
is called the base register, and -0x4
is the displacement.
Because the stack grows downwards, subtracting 4 from the base of the current stack frame moves us into the current frame itself, where local variables are stored. This means that this instruction stores 0 at %rbp - 4
0x0000000100000f60 <main+16>: movl $0x5,-0x8(%rbp)
(gdb) x &a
0x7fff5fbff768: 0x00000005
(gdb) x $rbp - 8
0x7fff5fbff768: 0x00000005
0x0000000100000f73 <main+35>: pop %rbp
0x0000000100000f74 <main+36>: retq
function epilogue: We pop
the old base pointer off the stack and store it back in %rbp
and then retq
jumps back to our return address, which is also stored in the stack frame.
pushing
a value (not necessarily stored in a register) means writing it to the stack.popping
means restoring whatever is on top of the stack into a register.
The instruction pointer is a register that stores the memory address of the next instruction. On x86_64, that register is %rip
. We can access the instruction pointer using the $rip
variable, or alternatively we can use the architecture independent $pc
.
(gdb) x/i $pc
0x100000e94 <natural_generator+4>: movl $0x1,-0x4(%rbp)
display
is like x
, except it evaluates its expression every time our program stops:
(gdb) display/i $pc
1: x/i $pc 0x100000e94 <natural_generator+4>: movl $0x1,-0x4(%rbp)
Instead of next
, which moves to the next source line, we’ll use nexti
, which moves to the next assembly instruction.
(gdb) nexti
7 b += 1;
1: x/i $pc mov 0x177(%rip),%eax # 0x100001018 <natural_generator.b>
(gdb) x $rbp - 0x4
0x7fff5fbff78c: 0x00000001
(gdb) x &a
0x7fff5fbff78c: 0x00000001
Instead, we have an instruction that loads 0x100001018
, located somewhere after the instruction pointer, into %eax
. GDB gives us a helpful comment with the result of the memory operand calculation and a hint telling us that natural_generator.b
is stored at this address.
Even though the disassembly shows %eax
as the destination, we print $rax
, because GDB only sets up variables for full width registers.
(gdb) nexti
(gdb) p $rax
$3 = 4294967295
(gdb) p/x $rax
$5 = 0xffffffff
While variables have types that specify if they are signed or unsigned, registers don’t, so GDB is printing the value of %rax
unsigned. Let’s try again, by casting %rax
to a signed int:
(gdb) p (int)$rax
$11 = -1
(gdb) x/d 0x100001018
0x100001018 <natural_generator.b>: -1
(gdb) x/d &b
0x100001018 <natural_generator.b>: -1
This is because the value for b
is hardcoded in a different section of the sample executable, and it’s loaded into memory along with all the machine code by the operating system’s loader when the process is launched.
static
Here.
- A static variable inside a function keeps its value between invocations.
- A static global variable or a function is “seen” only in the file it’s declared in
In the C programming language, static is used with global variables and functions to set their scope to the containing file. In local variables, static is used to store the variable in the statically allocated memory instead of the automatically allocated memory.
In C++, however, static is also used to define class attributes (shared between all objects of the same class) and methods
make
CFLAGS="-g -O0" make static