The 4 steps of compilation in C

But what is compilation exactly?
In a nutshell, compilation is the act of taking code written in one language and translating it into another language. In the case of C, we take human-readable source code (also called high-level code) and turn it into machine code (also known as binary code) that the computer can execute.
To compile a program, you run a compiler such as gcc on your human-readable C file like so:
>$ gcc main.c -o main
The gcc compiler will parse the main.c file and turn it into an executable (a file you can run directly from the command line by invoking it). The -o option names the output file, here main. Without it, the executable gets the default and not very meaningful name a.out.
You can run your C program once compiled like so:
>$ ./main
Imagine the traditional example of a hello_world.c file with the following code in it:
#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}
After compilation:
>$ gcc hello_world.c -o hello
The file can be executed:
>$ ./hello
Thanks to the printf function, it prints Hello World to the screen, followed by a new line.
But before the message can be printed, gcc goes through a 4-part process consisting of preprocessing, compilation, assembly and linking. Let’s discuss each step in more detail.
Preprocessing
In this stage the preprocessor handles the directives that start with #. In the case of our example, the #include <stdio.h> line is replaced by the contents of the stdio.h header, so its declarations become part of the program. All comments are stripped, and lines ending with a \ are joined with the line that follows.
To stop after this first stage, you can pass the -E option to gcc like so:
>$ gcc -E hello_world.c -o hello_world.i
Here the preprocessed source is written to hello_world.i. Without -o, gcc prints it to standard output.
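To see what the preprocessor does, it can help to run gcc -E on a small file that contains a comment and a line ending with a \. The following is a minimal sketch: the file name preprocess_demo.c, the macros SQUARE and SUM_OF_SQUARES, and the function sum_of_squares_3_4 are all made up for illustration. It also relies on macro expansion, which happens during this same stage.

```c
/* preprocess_demo.c: hypothetical file for inspecting `gcc -E` output. */

/* Function-like macro: the preprocessor replaces each use with its body. */
#define SQUARE(x) ((x) * (x))

/* A backslash at the end of a line joins it with the next one,
   so this macro definition spans two physical lines. */
#define SUM_OF_SQUARES(a, b) \
    (SQUARE(a) + SQUARE(b))

int sum_of_squares_3_4(void)
{
    /* After preprocessing, this comment is gone and the line below
       reads roughly: return (((3) * (3)) + ((4) * (4))); */
    return SUM_OF_SQUARES(3, 4);
}
```

Running gcc -E preprocess_demo.c should print the function with the comments removed and the macros expanded. The file has no main function, so it can be preprocessed but not built into an executable on its own.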
Compilation
At this stage the compiler translates the preprocessed source into assembly code, according to options that you can set. For example, on an x86 machine this command generates assembly in Intel syntax and writes it to hello_world.s:
>$ gcc -S -masm=intel hello_world.c
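A file with a single small function gives shorter assembly output than hello_world.c, which makes the result of -S easier to read. The sketch below assumes a hypothetical file add_ints.c and a function named add_ints; neither comes from the original example.

```c
/* add_ints.c: hypothetical file used only to inspect the assembly output. */

/* Adds two integers. In the generated .s file this function
   should appear under a label derived from its name. */
int add_ints(int a, int b)
{
    return a + b;
}
```

Running gcc -S add_ints.c should write the assembly to add_ints.s, which you can open in any text editor. The exact instructions depend on the target machine and the optimization options, so no sample output is shown here.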
Assembly
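At this stage the assembler converts the assembly code into object code, also known as machine code or binary. This is the code that the computer is able to execute, although it still has to be linked before it can run on its own. Passing the -c option to gcc stops after this stage and leaves an object file such as hello_world.o.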
Linking
The linking phase combines the object code into a single executable and resolves references to any libraries or functions that are still missing at this stage. In the case of the hello_world.c example, gcc typically turns this simple printf call into a call to puts, so the linker has to connect that call to the puts code in the C standard library.
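The role of the linker becomes clearer if you write the declaration of a library function by hand instead of including its header. In this sketch, link_demo.c and greet are hypothetical names; puts is the real standard library function, declared here with the prototype the C standard gives it.

```c
/* link_demo.c: hypothetical file name. */

/* Declaration only: this is what <stdio.h> would normally provide.
   It tells the compiler the function's type but contains no code. */
int puts(const char *s);

/* The call below compiles to a reference to the symbol `puts`.
   The linker resolves that reference against the C standard library. */
int greet(void)
{
    /* puts appends the new line itself and returns a
       non-negative value on success. */
    return puts("Hello World");
}
```

Running gcc -c link_demo.c succeeds because the compiler only needs the declaration. The code of puts itself is supplied at link time, when this object file is linked against the C standard library together with an object file that defines main.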
And that’s it, your C program is ready to be executed!