How Debuggers Work

In the last couple of days, I've been wondering how debuggers work and how I could implement a small one. I watched some videos and read a few articles to understand them better, then decided to write up some notes on what I found.

Introduction

A debugger is an application used to test and debug other applications. Debuggers share a few common operations, for example:

Breakpoints
Stepping
Viewing the stack trace/backtrace

To learn more about how debuggers work, we will need to learn a little bit about what ELF, DWARF, and ptrace are.

ELF

ELF (Executable and Linkable Format) is the main binary format for Linux object files and executables.

DWARF

DWARF is a debug info format. It uses a data structure called a Debugging Information Entry (DIE) to represent each variable, type, procedure, etc.

DWARF has a Line Number Table, which maps machine-code addresses to source code locations and vice versa. It also marks which instructions belong to function prologues and epilogues.

In assembly language programming, the function prologue is a few lines of code at the beginning of a function, which prepare the stack and registers for use within the function. Similarly, the function epilogue appears at the end of the function and restores the stack and registers to the state they were in before the function was called.

`ptrace`

#include <sys/ptrace.h>
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);

ptrace is short for "process trace". It's a system call on Linux and other Unix-like systems that lets one process control another. A parent process can therefore manipulate the internal state of its child process: for example, it can read and write the child's memory and registers and trace its execution. ptrace is generally used by debuggers and other code analysis tools, mostly as an aid to software development.

Breakpoints

According to Wikipedia, a breakpoint is an intentional stopping or pausing place in a program, put in place for debugging purposes. It is also sometimes simply referred to as a pause in the process's execution.

There are two types of breakpoints, hardware and software breakpoints.

Hardware Breakpoints

Hardware breakpoints are limited in number: on x86 there are only four debug registers (DR0 to DR3) for holding breakpoint addresses. To set a hardware breakpoint, you write the address into one of these registers. You can then use the debug status register (DR6) to see which breakpoint was hit. Finally, you also have a debug control register (DR7) that you can use to configure each breakpoint, so you can break on executing an address, writing to it, or reading or writing it.
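As a rough illustration of what "controlling" a hardware breakpoint means, here is a sketch that computes a value for the x86 debug control register, DR7. The bit layout is the documented x86 one; the function and enum names are my own. On Linux, a debugger would then typically write the result into the debuggee with PTRACE_POKEUSER at the offset of u_debugreg in struct user, which this sketch leaves out.

```cpp
#include <cstdint>

// R/W field of DR7: which kind of access triggers the breakpoint.
enum class BreakOn : std::uint64_t { Execute = 0, Write = 1, ReadWrite = 3 };
// LEN field of DR7: how many bytes are watched (must be One for Execute).
enum class Length : std::uint64_t { One = 0, Two = 1, Eight = 2, Four = 3 };

// Returns `dr7` with breakpoint slot `slot` (0..3) enabled and configured.
std::uint64_t enable_slot(std::uint64_t dr7, unsigned slot, BreakOn cond, Length len) {
    const unsigned enable_bit = slot * 2;         // local-enable bits sit at 0, 2, 4, 6
    const unsigned config_shift = 16 + slot * 4;  // four config bits per slot, from bit 16
    dr7 &= ~(std::uint64_t{0xF} << config_shift); // clear the slot's old R/W and LEN
    dr7 |= std::uint64_t{1} << enable_bit;
    dr7 |= static_cast<std::uint64_t>(cond) << config_shift;
    dr7 |= static_cast<std::uint64_t>(len) << (config_shift + 2);
    return dr7;
}
```

For example, enable_slot(0, 0, BreakOn::Write, Length::Four) yields 0xD0001: slot 0 enabled, triggering on 4-byte writes to whatever address is stored in the first debug address register.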

Software Breakpoints

Software breakpoints, on the other hand, modify the running code in memory to introduce breaking instructions. They are effectively unlimited, so you can create as many as you want, but they can only trigger when an instruction is executed.

Address Level Breakpoints

To learn more about how they work, let's take a look at this x86-64 assembly:

55                     push %rbp
48 89 e5               mov %rsp, %rbp
48 83 ec 10            sub $0x10, %rsp

Say you want to set a breakpoint at the mov instruction on the second line. Since we already know the address of this instruction, we explicitly tell the debugger to set a breakpoint right there.

The debugger then takes the first byte of the instruction, 0x48, and sets it aside to remember it for later. It replaces that byte with 0xcc, which is the int3 instruction and triggers a software interrupt. So, it'll look like this:

cc 89 e5            int3

Finally, when execution reaches this instruction, it triggers an interrupt and the program stops.
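The byte swap can be sketched in a few lines. ptrace reads and writes whole words rather than single bytes, so a debugger would typically read the word at the breakpoint address with PTRACE_PEEKTEXT, patch its lowest byte, and write it back with PTRACE_POKETEXT. The helpers below show only the patching step; their names are my own, and they assume a little-endian 64-bit target such as x86-64.

```cpp
#include <cstdint>

constexpr std::uint64_t kInt3 = 0xCC;  // opcode of the int3 instruction

// On a little-endian machine, the byte at the breakpoint address is the
// lowest byte of the word read from that address.
std::uint8_t saved_byte(std::uint64_t word) {
    return static_cast<std::uint8_t>(word & 0xFF);
}

// Returns the word with its first byte replaced by int3.
std::uint64_t with_int3(std::uint64_t word) {
    return (word & ~std::uint64_t{0xFF}) | kInt3;
}

// Undoes with_int3 by putting the remembered byte back.
std::uint64_t with_restored_byte(std::uint64_t word, std::uint8_t original) {
    return (word & ~std::uint64_t{0xFF}) | original;
}
```

With the bytes from the listing above (48 89 e5 followed by 48 83 ec 10), the word read at the mov instruction is 0x0010ec8348e58948, and with_int3 turns it into 0x0010ec8348e589cc.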

`int3` Instruction

Let's take a look at what the int3 instruction does exactly, in traps.c in the Linux kernel:

static void do_int3_user(struct pt_regs *regs)
{
    if (do_int3(regs))
        return;

    cond_local_irq_enable(regs);
    do_trap(X86_TRAP_BP, SIGTRAP, "int3", regs, 0, 0, NULL);
    cond_local_irq_disable(regs);
}

We can see that this raises SIGTRAP, which is the signal for trace and breakpoint traps. That means when this line gets executed, a UNIX signal is sent to the debuggee. Because the debuggee is being traced, it stops, and our debugger is notified and knows that we hit a breakpoint.
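On the debugger's side, this shows up as a waitpid status. Here is a small sketch of the two checks a debugger might do at that point; the function names are my own. Note that SIGTRAP is also reported for other events (single-stepping, for instance), so a real debugger would additionally check the address against its list of breakpoints.

```cpp
#include <csignal>
#include <cstdint>
#include <sys/wait.h>

// True when a waitpid() status says the debuggee was stopped by SIGTRAP.
bool stopped_by_sigtrap(int wait_status) {
    return WIFSTOPPED(wait_status) && WSTOPSIG(wait_status) == SIGTRAP;
}

// int3 is one byte long and the trap is reported after it has executed, so
// the saved program counter points one byte past the breakpoint address.
std::uint64_t breakpoint_address_from_pc(std::uint64_t pc_after_trap) {
    return pc_after_trap - 1;
}
```

After this, the debugger would put the original byte back, move the program counter back to the breakpoint address (for example with PTRACE_SETREGS), and let the real instruction run when the user continues.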

Source Level Breakpoints

We just learned how address level breakpoints work. In practice, however, you are more likely to want to break on a certain function, so let's see how that works.

For example, let's say that we want to break on main. We need the address of this function, so we use the DWARF information that we talked about earlier to look up the function and the end of its prologue. Then we can use that address to set an address level breakpoint just as before.

If we want to add a breakpoint to an overloaded function, we use its mangled name to look it up in the DWARF information.
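To see why the mangled name matters, here is a small sketch that goes the other way and demangles names with abi::__cxa_demangle, which GCC and Clang provide in <cxxabi.h>. Two overloads of a hypothetical function foo get distinct mangled names, and that is what lets the debugger tell them apart. The demangle wrapper itself is my own helper.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the human-readable form of an Itanium-ABI mangled name, or the
// input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);  // __cxa_demangle allocates the result with malloc
    return result;
}
```

Here demangle("_Z3fooi") gives foo(int) and demangle("_Z3food") gives foo(double), so a request to break on one specific overload maps to exactly one symbol.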

Line Level Breakpoints

Suppose we want to add a breakpoint at a certain line, for example line 4 of program.cpp (written program.cpp:4). In this case, we use the Line Number Table information.

In some cases, we might have multiple entries for the same line, for example when one line contains more than one statement. Take the following example:

int foo = 4; foo += 5; std::cout << foo;

In this case, the breakpoint will be set at the beginning of the statement.

Finally, we get the address corresponding to the line from the Line Number Table and proceed as discussed in Address Level Breakpoints.
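The lookup itself might look something like the sketch below. LineEntry is a simplified, hypothetical stand-in for a decoded row of the Line Number Table (the real table is a compressed byte-code program that a DWARF library decodes for you); the is_stmt flag mirrors the table's marker for the beginning of a statement.

```cpp
#include <cstdint>
#include <vector>

// Simplified, hypothetical row of a decoded line number table.
struct LineEntry {
    std::uint64_t address;  // machine-code address
    unsigned line;          // source line number
    bool is_stmt;           // true if this address begins a statement
};

// Returns the lowest-addressed statement entry for `line`, or nullptr if the
// line has no code attached to it.
const LineEntry* first_statement_for_line(const std::vector<LineEntry>& table,
                                          unsigned line) {
    const LineEntry* best = nullptr;
    for (const LineEntry& entry : table) {
        if (entry.line != line || !entry.is_stmt) {
            continue;
        }
        if (best == nullptr || entry.address < best->address) {
            best = &entry;
        }
    }
    return best;
}
```

Picking the lowest address is just one reasonable policy for this sketch; the address it returns is then used for an ordinary address level breakpoint.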

Stepping

There are four types of stepping: stepping over a single assembly instruction, over a function call, into a function, or out of a function.

Instruction Step

To do an instruction step on a UNIX system you can use ptrace. For example:

ptrace(PTRACE_SINGLESTEP, debuggee_pid, nullptr, nullptr);

Step In & Step Over

For stepping into a function we need to set two breakpoints:

The first one is at the return address. That way, if you end up not stepping into a function but returning out of the current one, you don't just continue and never gain control over your debuggee again.
The second one is at the next instruction in the callee.

The question here is how the debugger will know what the next instruction in the callee is.
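One possible answer, for the simplest case, is to decode the instruction at the current address. The sketch below is my own assumption about how this could be done, not something from the talk: it recognizes a direct x86-64 call (opcode 0xE8 followed by a 32-bit displacement relative to the next instruction) and computes both the callee's address and the return address. Indirect calls through a register or memory would need more work, and a simpler alternative is to single-step once with PTRACE_SINGLESTEP and see where you land.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct CallInfo {
    bool is_direct_call;
    std::uint64_t return_address;  // instruction right after the call
    std::uint64_t callee_address;  // first instruction of the called function
};

// `code` holds the instruction bytes found at `address` in the debuggee.
// Assumes the debugger runs on a little-endian host, like the x86-64 target.
CallInfo decode_direct_call(const std::uint8_t* code, std::size_t size,
                            std::uint64_t address) {
    CallInfo info{false, 0, 0};
    if (size < 5 || code[0] != 0xE8) {
        return info;  // not a `call rel32` instruction
    }
    std::int32_t displacement = 0;
    std::memcpy(&displacement, code + 1, sizeof displacement);
    info.is_direct_call = true;
    info.return_address = address + 5;  // opcode byte + 4 displacement bytes
    info.callee_address =
        info.return_address + static_cast<std::int64_t>(displacement);
    return info;
}
```

Stepping in would then set a breakpoint at callee_address, while stepping over would only need the one at return_address.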

Step Out

For stepping out of a function we just need to set a breakpoint at the return address.

Read & Write Memory

Sometimes developers need the debugger to do a memory dump. We can also use ptrace for that. For example:

auto data = ptrace(PTRACE_PEEKDATA, debuggee_pid, address, nullptr);

The problem here is that ptrace only reads one word at a time, so doing this for a large amount of data is inefficient: for every single word we make a syscall, with a context switch down into kernel mode and back up again.

The solution here is to use process_vm_readv() and process_vm_writev(), which can read and write many words in a single call.

Stack Unwinding

Sometimes we might want to get the entire call stack for the current function. We can use the frame pointer to get the previous stack frame, and use the function's return address to look up the DWARF information. Since DWARF records which function an address belongs to, we can then find all of the information about the caller's stack frame. We keep doing that until the frame pointer reaches 0.

Conclusion

In this post, we learned some basic concepts of how debuggers work. Of course, there's a lot more to learn; however, I think this is enough for now to have a basic understanding of how they work.

If you caught any incorrect or incomplete information in this post, please reach out to me on Twitter.

Resources

CppCon 2018: Simon Brand “How C++ Debuggers Work”: I couldn't have ever written this without the help of this amazing video.
DWARF - Wikipedia
ptrace - Wikipedia
ptrace - Linux man page
Debugger flow control: Hardware vs software breakpoints