The 4 steps of compilation in C

Paul Manot
Sep 16, 2020
The C programming language has been around for a long time…

But what is compilation exactly?

In a nutshell, compilation is the act of taking code written in one language and translating it into another language. In the case of C, we are transforming human-readable (also called high-level) source code into machine code (also known as binary code) that the computer's processor can execute directly.

To compile a program, you run a compiler such as gcc on your C source file, like so:

>$ gcc main.c -o main

gcc will parse the main.c file and turn it into an executable (a file you can run directly from the command line by invoking it). The -o option sets the name of the output file, here main. Without it, the output file gets the not-so-meaningful default name a.out.

Once compiled, you run the program by typing its path. If you compiled without -o, that is:

>$ ./a.out

Take the traditional example of a hello_world.c file containing the following code:

#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}

After compiling it with:

>$ gcc hello_world.c -o hello

The resulting file, hello, can be executed:

>$ ./hello

Thanks to the printf function, it prints Hello World to the screen, followed by a newline.

But before the program can run and print its message, gcc takes the source code through a four-part process: preprocessing, compilation, assembly and linking. Let’s discuss each step in more detail.

Preprocessing

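In this stage, the preprocessor handles the lines starting with #. Each #include directive is replaced by the contents of the header it names. In our example, #include <stdio.h> is replaced by the contents of stdio.h, which declares printf. Macros defined with #define are expanded, all comments are stripped, and lines ending with a backslash (\) are joined with the line that follows.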

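To stop after this first stage, pass the -E option to gcc. The preprocessed code is printed to the terminal unless you name an output file with -o. The conventional extension for that file is .i:

>$ gcc -E hello_world.c -o hello_world.i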

Compilation

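At this stage, the compiler translates the preprocessed C code into assembly code, a human-readable form of the target processor's instructions. The -S option stops gcc after this step and writes the result to a .s file (here hello_world.s). Other options control the output. For example, on x86 the -masm=intel option produces Intel syntax instead of the default AT&T syntax: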

>$ gcc -S -masm=intel hello_world.c

Assembly

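At this stage, the assembler converts the assembly code into object code. This is machine code (binary) that the processor can execute, stored in an object file (hello_world.o). Passing -c to gcc stops after this stage. An object file is not yet a runnable program, though. Calls to functions defined elsewhere, such as those in the C standard library, are left as unresolved references for the next step.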

[Image: stylized representation of object code]

Linking

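The linker combines the object code with any other object files and libraries the program needs. It resolves each reference to a function or variable defined elsewhere and produces the final executable. In the hello_world.c example, gcc usually replaces the printf call with a call to puts, because the string is constant and ends with a newline. That is why it is puts that the linker looks for in the C standard library. With the default dynamic linking, the executable only records that it needs the shared C library, which is loaded when the program runs. With -static, the binary code for puts is copied into the executable itself.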

And that’s it: your C program is ready to be executed!
