What happens when you type a URL in your browser and press Enter?
When you type a URL such as https://www.holbertonschool.com in your browser, you “end up”, as expected, on Holberton School’s website. More precisely, you request the home page of Holberton School, which is eventually (a few milliseconds later) rendered on your screen. But between pressing the Enter key and seeing the page in your web browser, quite a few things happen…
Let’s dive in!
Theory First: the TCP/IP model
When sending data between two computers over a network, you need to go through certain steps. For the communication to be consistent, these steps need to be standardized, which is why we use common protocols. This way, all connected devices can talk to each other using the same language. At each step of the process the data is transformed, as information is added in headers, before being physically sent over the network as binary signals by means of cables (electricity), optical fiber (light) or Wi-Fi (electromagnetic waves).
The TCP/IP model is a conceptual model (named after its two main protocols) which describes the steps and protocols needed to transfer data over a network, from its logical representation to its physical implementation. In its most common form the model is composed of 4 layers: Application->Transport->Network->Network Access. The data is passed down through this chain in a process called encapsulation (mentioned in the introduction of this section): each layer adds its own information until the data is actually sent over the physical network. Let’s look at what happens in more detail.
▹ The Application layer (layer 4). NB: This is the top level of the model, where most of what is described in the rest of the article happens, including the DNS lookup, the TLS handshake and the interaction with the application server. Firewalls and load balancers can be configured either at the application layer or at the transport layer, but let’s leave this aside for now. The important thing to understand here is that the data (in our case, our request for a web page) is formatted according to an application protocol, here HTTP. The resulting unit of data is called a Protocol Data Unit (PDU); each layer has its own kind of PDU, and at this layer it is simply called data (or a message). It’s not ready to be sent yet, so the PDU is passed down to:
▹ The Transport layer (layer 3). At this level the PDU is divided into segments to facilitate transport. The protocol used here is the Transmission Control Protocol: that’s the TCP in TCP/IP! It ensures that all segments arrive at the receiving end, in the right order, for the message to be fully received; if one or more segments are missing, they are retransmitted. This protocol is used whenever data integrity must be guaranteed (which is most of the time). For video streams or audio calls, the User Datagram Protocol (UDP) is often preferred, as it is faster but does not guarantee that every datagram arrives. In practice, lost data translates into a loss of quality in the stream. On top of that, each segment is given some additional information, including a source port number and a destination port number. In our case the application sending the request (the source) is a web browser, which is assigned a temporary port number chosen by the operating system, such as 54321, and the destination port is 443, because the receiving server expects an HTTPS request and 443 is the well-known port for HTTPS. At this stage the segments are passed down to:
▹ The Network layer (layer 2). At this level the segments are wrapped into packets using the Internet Protocol (IP). This is where the sender’s and receiver’s IP addresses are added. The destination IP address was obtained earlier, at the application layer, through the DNS lookup (more on this below), but it is only written into a header at this stage to create a packet. Then the packets are passed down to:
▹ The Network Access layer (layer 1). At this layer a header and a trailer are added to each packet to form a frame. This is the final step of the encapsulation process. Information about the physical network is added, such as the MAC address of the sending machine and the MAC address of the next hop in the chain (most likely a router). The frames are then sent over the network as a stream of bits, in the form of electrical signals, light pulses in the case of optical fiber, or radio waves in the case of Wi-Fi.
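To make the four layers above more concrete, here is a minimal Python sketch of encapsulation and de-encapsulation. It is a toy, not a network stack: the IP and MAC addresses, the ephemeral port 54321 and the 16-byte chunk size are made-up values, the "FCS" trailer is only a placeholder, and each segment carries a simple chunk index where real TCP numbers individual bytes.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical addresses and ports, for illustration only.
CLIENT_IP, SERVER_IP = "192.0.2.10", "35.174.46.174"
CLIENT_MAC, ROUTER_MAC = "aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"
CLIENT_PORT, SERVER_PORT = 54321, 443


@dataclass
class Segment:  # Transport layer: ports + a chunk of the data
    src_port: int
    dst_port: int
    index: int  # chunk position (real TCP numbers bytes, not chunks)
    payload: bytes


@dataclass
class Packet:  # Network layer: IP addresses wrapped around a segment
    src_ip: str
    dst_ip: str
    segment: Segment


@dataclass
class Frame:  # Network Access layer: MAC header + trailer around a packet
    src_mac: str
    dst_mac: str
    packet: Packet
    trailer: str = "FCS"  # placeholder for the frame check sequence


def encapsulate(data: bytes, chunk_size: int = 16) -> List[Frame]:
    """Split application data into segments, then wrap each in a packet and a frame."""
    frames = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        segment = Segment(CLIENT_PORT, SERVER_PORT, index, data[start:start + chunk_size])
        packet = Packet(CLIENT_IP, SERVER_IP, segment)
        frames.append(Frame(CLIENT_MAC, ROUTER_MAC, packet))
    return frames


def decapsulate(frames: List[Frame]) -> bytes:
    """Peel off the frame and packet layers, then reassemble segments in order."""
    segments = sorted((f.packet.segment for f in frames), key=lambda s: s.index)
    return b"".join(s.payload for s in segments)


request = b"GET / HTTP/1.1\r\nHost: www.holbertonschool.com\r\n\r\n"
frames = encapsulate(request)
# Frames may arrive out of order; the receiver still rebuilds the original data.
assert decapsulate(list(reversed(frames))) == request
```

The receiver only ever sees frames; it strips the layers in reverse order, which is exactly the de-encapsulation described next.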
At the other end, the receiver de-encapsulates the data by going through the layers in reverse order, as illustrated by the diagram below:
Anatomy of a URL
URL stands for Uniform Resource Locator. It is a unique identifier that lets you retrieve specific content hosted on a server (a file or a document). In its simplest form, you can break it down into 3 parts: a protocol (for instance http or https), a domain name, and a path (where on the server the resource is located).
So when you type that URL and press Enter, you are basically saying: “create a connection with the Holberton School web server using the HTTPS protocol, and get me the file/page located at this path on the server”. But how does the web browser know where the Holberton School server is on the internet?
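Python’s standard library can split a URL into these parts, which makes the decomposition easy to see. This is a small sketch; the `describe` helper and its key names are illustrative choices, and the default-port table only covers http and https. Note that our example URL has an empty path, which browsers treat as the home page `/`.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def describe(url: str) -> dict:
    """Break a URL into protocol, domain, port and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        # No explicit port in the URL: fall back to the protocol's default.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",  # an empty path means the home page "/"
    }


print(describe("https://www.holbertonschool.com"))
# {'protocol': 'https', 'domain': 'www.holbertonschool.com', 'port': 443, 'path': '/'}
```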
Let’s take a closer look at DNS.
DNS lookup
Web browsers cannot find websites from their names alone, or even from a fully qualified URL such as the one in our example: they need an actual address, the address of the web server where the website is hosted. We call this address an IP (Internet Protocol) address, and it looks something like this: 35.174.46.174. At the time of publication, this was the address of the Holberton School website. It is likely to change, so entering it directly in your browser will probably lead you nowhere today, but in principle it would take you straight to the website. You would get a warning saying the connection is not secure (the site’s certificate is issued for its domain name, not for its IP address; more on certificates later), but if you ignored it you would get Holberton School’s home page in your browser.
So if you have the IP address of a website, you can go straight to it. DNS lookup is for all the other times, when you don’t know the IP address of the server (99.99% of the time!).
So when you press Enter, the first thing that happens, even before the actual DNS lookup, is a cache lookup. If you have visited Holberton School recently, your web browser (or your operating system) has stored the address in its cache and doesn’t need to do a DNS lookup before moving on to the next step. Let’s assume you haven’t been to this website yet and that neither the browser nor the operating system knows www.holbertonschool.com. In that case, your computer asks a DNS (Domain Name System) resolver, usually run by your Internet Service Provider (ISP), for the IP address associated with this domain name. In a nutshell, DNS works like a phone book. If the resolver already has the IP address in its own cache, it sends it straight back. If it doesn’t, it looks the name up on the client’s behalf by walking down the DNS hierarchy. It first asks a root name server (there are 13 root server identities worldwide, each served by many machines). The root server doesn’t know the answer, but it refers the resolver to the Top Level Domain (TLD) name servers for .com. The .com TLD server in turn refers the resolver to the authoritative name server for holbertonschool.com, which has the answer. The resolver sends the IP address back to the client and caches the record, to avoid repeating the whole process the next time it is asked for this website’s address.
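The lookup chain can be modeled as a short walk through three lookup tables: cache first, then root, TLD and authoritative servers. This is a toy sketch, not a real DNS client: the server names are hypothetical labels, the tables stand in for network queries, there are no TTLs, and the IP address is simply the one quoted earlier in this article, which may be out of date.

```python
# Toy DNS hierarchy. Server names are hypothetical labels; the IP address is the
# one quoted in this article and may be out of date.
ROOT_SERVER = {"com.": "tld-com-server"}  # root: which servers handle each TLD
TLD_SERVERS = {"tld-com-server": {"holbertonschool.com.": "ns.holbertonschool.example"}}
AUTHORITATIVE_SERVERS = {
    "ns.holbertonschool.example": {"www.holbertonschool.com.": "35.174.46.174"},
}


class ToyResolver:
    """Resolver that answers from its cache, or walks root -> TLD -> authoritative."""

    def __init__(self):
        self.cache = {}
        self.servers_contacted = []

    def resolve(self, name: str) -> str:
        fqdn = name.rstrip(".") + "."  # fully qualified names end with a dot
        if fqdn in self.cache:  # cache hit: no queries needed
            return self.cache[fqdn]
        labels = fqdn.rstrip(".").split(".")
        tld = labels[-1] + "."  # "com."
        zone = ".".join(labels[-2:]) + "."  # "holbertonschool.com."

        self.servers_contacted.append("root")
        tld_server = ROOT_SERVER[tld]  # referral to the .com servers
        self.servers_contacted.append(tld_server)
        auth_server = TLD_SERVERS[tld_server][zone]  # referral to the zone's server
        self.servers_contacted.append(auth_server)
        ip = AUTHORITATIVE_SERVERS[auth_server][fqdn]  # the actual answer

        self.cache[fqdn] = ip  # real resolvers keep it only for the record's TTL
        return ip


resolver = ToyResolver()
print(resolver.resolve("www.holbertonschool.com"))  # 35.174.46.174
print(resolver.servers_contacted)  # ['root', 'tld-com-server', 'ns.holbertonschool.example']
resolver.resolve("www.holbertonschool.com")  # answered from the cache
print(len(resolver.servers_contacted))  # still 3
```

Unknown names simply raise a `KeyError` here, where a real resolver would return an NXDOMAIN answer.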
Now that we know where to go we need to figure out the best way to get there safely…
Firewall(s)
Once we have a destination, we need to connect to the destination server before any file can be requested and data can flow between the client and the server. This process is called the TCP 3-way handshake, and it is immediately followed by another handshake called the TLS handshake; we’ll describe both in the next two sections. But for the handshakes to happen, the server must be listening on the right port (in our case 443, the port used for HTTPS, which is HTTP secured with TLS/SSL). And if the server has a firewall (as it should), the firewall must not block this port. In other words, with a typical default-deny policy, the firewall needs explicit rules stating that traffic is allowed on a port for that traffic to flow.
For example, adding such a rule with the popular Uncomplicated Firewall (ufw) software would look something like this:
sudo ufw allow 443/tcp
Note that firewall rules can get a lot more complicated than the above example, for instance to prevent some machines from connecting by specifying their IP addresses individually or by IP range.
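The logic behind such rules can be sketched in a few lines. This is a toy model, not how ufw is implemented: the `Rule` class, the `is_allowed` helper and the IP ranges are hypothetical, and it assumes first-match-wins ordering with a default-deny policy for incoming traffic.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    action: str  # "allow" or "deny"
    port: Optional[int] = None  # None matches any port
    proto: str = "tcp"
    source: str = "0.0.0.0/0"  # CIDR range the rule applies to (default: everyone)


def is_allowed(rules: List[Rule], src_ip: str, port: int, proto: str = "tcp",
               default: str = "deny") -> bool:
    """First matching rule wins; if none matches, apply the default policy."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if rule.proto != proto:
            continue
        if rule.port is not None and rule.port != port:
            continue
        if addr not in ipaddress.ip_network(rule.source):
            continue
        return rule.action == "allow"
    return default == "allow"


rules = [
    Rule("deny", source="203.0.113.0/24"),  # block a whole (hypothetical) range
    Rule("allow", port=443),  # like `ufw allow 443/tcp`
]
print(is_allowed(rules, "198.51.100.7", 443))  # True
print(is_allowed(rules, "203.0.113.42", 443))  # False: denied range
print(is_allowed(rules, "198.51.100.7", 22))   # False: no rule, default deny
```

Putting the deny rule first matters: with first-match semantics, the broad allow rule would otherwise let the blocked range through.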
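We mentioned earlier that a firewall can work at the application layer (layer 4) or at the transport layer (layer 3). A transport-layer firewall, like the ufw rule above, filters traffic based on IP addresses, ports and protocols, while an application-layer firewall can also inspect the content of the traffic itself (for example, the HTTP requests). Independently of that, a firewall can be software installed on the server itself or a dedicated physical device placed in front of it. The distinction doesn’t matter much for our purpose, as they serve the same conceptual role.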
We’ll talk about load balancing later, but basically it is a way to distribute HTTP(S) requests across several back-end servers so that the site can handle more traffic. The only reason I mention it here is to illustrate that you can add further security, and further firewalls, between the load balancer and the back-end servers.
TCP 3-Way Handshake
The handshake takes place in the Transport layer (layer 3). Its sole purpose is to establish the connection; at this stage the connection is not secured yet. Because this is an HTTPS request, the connection is opened on port 443, as opposed to port 80 for a plain HTTP request. It works as follows. The client sends the server a segment with the SYN (synchronize) flag set in its header, along with a randomly chosen initial sequence number. The server responds with a segment carrying a SYN flag of its own (with its own initial sequence number) and an ACK (acknowledgement) flag that acknowledges the client’s SYN. The client answers with an ACK of its own, acknowledging the server’s SYN, which completes the synchronization between the 2 machines. The machines are now ready to talk, but none of the actual content of the web page has been sent yet.
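The exchange of flags and sequence numbers can be simulated to show how each side acknowledges the other. This is a sketch of the header fields only (no sockets, no timers, no retransmission); the `TcpHeader` class and `three_way_handshake` function are illustrative names. The key detail is that a SYN consumes one sequence number, so each acknowledgement is the peer’s initial sequence number plus one.

```python
import random
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass(frozen=True)
class TcpHeader:
    flags: frozenset  # which control bits are set, e.g. {"SYN"} or {"SYN", "ACK"}
    seq: int          # sequence number of this segment
    ack: int = 0      # next sequence number expected from the peer (if ACK is set)


def three_way_handshake(rng: random.Random):
    client_isn = rng.randrange(SEQ_SPACE)  # each side picks a random initial sequence number
    server_isn = rng.randrange(SEQ_SPACE)

    syn = TcpHeader(frozenset({"SYN"}), seq=client_isn)
    # The SYN consumes one sequence number, so the server acknowledges client_isn + 1.
    syn_ack = TcpHeader(frozenset({"SYN", "ACK"}), seq=server_isn,
                        ack=(syn.seq + 1) % SEQ_SPACE)
    # The client's next sequence number is client_isn + 1; it acknowledges server_isn + 1.
    ack = TcpHeader(frozenset({"ACK"}), seq=syn_ack.ack,
                    ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return syn, syn_ack, ack


for segment in three_way_handshake(random.Random(0)):
    print(sorted(segment.flags), segment.seq, segment.ack)
```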
TLS Handshake
Immediately after the TCP handshake, another handshake takes place over the same connection on port 443. This one ensures that data can be exchanged safely between the client and the server. That’s the ‘s’ in https, for secure: all traffic will be encrypted.
▹ The client (the browser) sends a ‘client hello’ message to the server, with the list of cipher suites it supports (the combinations of encryption methods the client can use). The server then selects a cipher suite among those it also supports.
▹ The server sends a ‘server hello’ to the client confirming the cipher suite that will be used during the exchange. It also sends its certificate, to prove that the server is who it says it is and to prevent attacks such as ‘man-in-the-middle’ attacks. The certificate contains quite a lot of information, but the most important parts are the server’s public key and the signature of the certificate authority (CA) that issued it. As soon as the browser receives the certificate, it checks that it is valid for the domain, has not expired, and is signed by a certificate authority it trusts (browsers and operating systems ship with a list of trusted CAs), and aborts the connection if it’s not. At this stage the client knows it is talking to the right server, but no encryption keys have been agreed on yet.
▹ We are now back on the client side. The client has the server’s public key, so in principle the whole exchange could be encrypted asymmetrically. But asymmetric encryption is slow, so it is much better to use the same key for both encryption and decryption. This is called symmetric encryption. The simplest way to achieve this would be for the client to generate a symmetric key, encrypt it with the server’s public key and send it to the server, which would decrypt it with its private key. In practice, the symmetric key itself is never sent. Instead, with the classic RSA key exchange of TLS 1.2, the client generates a random value called the pre-master secret, encrypts it with the server’s public key and sends it. Only the server, which holds the matching private key, can decrypt it. Both sides then derive the same session keys from the pre-master secret and the random values exchanged in the two hello messages, using the algorithms agreed on in the cipher suite. (Modern TLS versions typically use a Diffie-Hellman key exchange instead, so that the shared secret itself never travels over the network, even encrypted.) Along with the pre-master secret, the client sends a ‘change cipher spec’ message, stating that it is switching to symmetric encryption, followed by a ‘finished’ message.
▹ Back on the server side, the server decrypts the pre-master secret with its private key and derives the same symmetric key. It then sends back its own ‘change cipher spec’ message and a ‘finished’ message.
▹ Finally the bulk of the exchange can start in a back and forth between the client and the server. All subsequent exchanges will be encrypted with the symmetric key…
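Why do both sides end up with the same symmetric key without ever sending it? Because they run the same deterministic function on the same inputs. Below is a sketch of the TLS 1.2 pseudorandom function (P_SHA256, as defined in RFC 5246) deriving the master secret and key material from the pre-master secret of the step above. Assumptions: the randoms and the pre-master secret are just random bytes here (a real RSA pre-master secret starts with a protocol version), the public-key encryption step is not shown, and the 64-byte key block length is arbitrary, since the real length depends on the cipher suite.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 P_SHA256 data expansion (RFC 5246)."""
    output = b""
    a = seed  # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    return p_sha256(secret, label + seed, length)


def derive_keys(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    master = prf(pre_master, b"master secret", client_random + server_random, 48)
    # Key material for the symmetric cipher; note the randoms are swapped here.
    return prf(master, b"key expansion", server_random + client_random, 64)


# Values exchanged during the handshake (random here, for illustration).
client_random, server_random = os.urandom(32), os.urandom(32)  # sent in the hellos
pre_master = os.urandom(48)  # sent encrypted with the server's public key (not shown)

client_keys = derive_keys(pre_master, client_random, server_random)
server_keys = derive_keys(pre_master, client_random, server_random)  # after decrypting
assert client_keys == server_keys  # both sides end up with the same symmetric keys
```

An eavesdropper sees both randoms in clear text, but without the pre-master secret (which only the server can decrypt) the derived keys stay out of reach.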
The Actual request/response cycle
Now that we have a fast and secure connection, the actual request can happen.
Typing our URL https://www.holbertonschool.com results in a simple GET request for the path /, meaning we just want to retrieve the content of the home page.
So first the request is encrypted with the symmetric key and sent to the server.
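Before encryption, the request is just a few lines of text. The sketch below builds a minimal HTTP/1.1 GET request; the `build_get_request` helper is a hypothetical name, and real browsers send many more headers (and often use HTTP/2, which encodes requests in binary frames instead).

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request (the plaintext before TLS encryption)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # mandatory in HTTP/1.1: which site we want
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # these two empty items produce the blank line
        "",                      # (CRLF CRLF) that marks the end of the headers
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_get_request("www.holbertonschool.com"))
# b'GET / HTTP/1.1\r\nHost: www.holbertonschool.com\r\nConnection: close\r\n\r\n'
```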
If there is a load balancer, the request hits this front server first. The load balancer then forwards the request to one of the back-end servers using a distribution algorithm such as Round Robin. In our case this is pretty straightforward because we are making only one request, but imagine 1,000 requests arriving within a couple of milliseconds: the load balancer will split the load and distribute it across the different servers behind it…
On the back-end server you generally have two pieces of software installed, namely a web server and an application server. The web server is responsible for static content and the application server for dynamic content. For example, if you request a page that doesn’t exist, the web server sends the 404 page straight back. On the other hand, if the page contains dynamically generated content, the web server passes the request on to the application server, which queries the database first and then generates the content.
Once the database has been queried and the content retrieved, the application server generates the HTML, which is encrypted and sent back to the client. The browser then requests the supporting files referenced in the HTML, such as CSS, JavaScript and image files, which are sent back in the same way. It’s important to note, as described in the TCP/IP model section, that there is more than one back and forth here: everything is broken into multiple pieces and exchanged over a very short period of time. The browser can even start processing the response before it has fully arrived…
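Round Robin is the simplest distribution algorithm: each new request goes to the next server in the list, looping back to the start. A minimal sketch, with hypothetical server names (real load balancers also track server health and may weight servers differently):

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hand each new request to the next back-end server, looping over the list."""

    def __init__(self, servers):
        self._next_server = cycle(servers)

    def pick(self) -> str:
        return next(self._next_server)


# Hypothetical back-end server names.
balancer = RoundRobinBalancer(["web-01", "web-02", "web-03"])
print([balancer.pick() for _ in range(4)])  # ['web-01', 'web-02', 'web-03', 'web-01']

# 1,000 requests end up spread almost evenly: 334 / 333 / 333.
balancer = RoundRobinBalancer(["web-01", "web-02", "web-03"])
print(Counter(balancer.pick() for _ in range(1000)))
```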
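The split of responsibilities between the web server and the application server can be sketched as a simple dispatch. Everything here is hypothetical: the static file, the dynamic route table and the in-memory "database" stand in for a real web server configuration, application code and database.

```python
from typing import Tuple

# Hypothetical site content, for illustration only.
STATIC_FILES = {"/style.css": "body { margin: 0; }"}
DYNAMIC_ROUTES = {"/"}  # paths the web server forwards to the application server
DATABASE = {"articles": ["First article", "Second article"]}


def app_server(path: str) -> Tuple[int, str]:
    """Query the (toy) database and generate the HTML for a dynamic page."""
    rows = DATABASE["articles"]  # only one dynamic page in this toy, so path is unused
    items = "".join(f"<li>{title}</li>" for title in rows)
    return 200, f"<html><body><ul>{items}</ul></body></html>"


def web_server(path: str) -> Tuple[int, str]:
    """Serve static files directly, forward dynamic routes, answer 404 otherwise."""
    if path in STATIC_FILES:
        return 200, STATIC_FILES[path]
    if path in DYNAMIC_ROUTES:
        return app_server(path)
    return 404, "<h1>404 Not Found</h1>"


print(web_server("/style.css")[0])  # 200, served by the web server alone
print(web_server("/")[0])           # 200, generated by the application server
print(web_server("/missing")[0])    # 404, sent straight back
```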
Back into the Browser
Everything is ready now. We have the HTML and its content, plus the JavaScript and CSS file(s): the browser can now render the page.
To display the page on your screen, the browser goes through a sequence of steps called the Critical Rendering Path, outlined below:
▹ First it builds the DOM (Document Object Model) from the HTML file, creating a node for each element, and starts downloading the images, if any.
▹ Very shortly after that it builds the CSSOM (CSS Object Model) from the CSS file(s).
▹ Then the JavaScript is loaded and executed.
▹ Then the browser combines the DOM with the CSSOM rules to create each node of the Render Tree.
▹ The layout is computed: the size and position of each node on the page.
▹ And finally the pixels are painted on the screen.
▹ These last 2 steps are repeated whenever the page changes, for example when JavaScript updates the DOM or the styles in response to a user interaction, triggering a new layout and repaint.
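The first steps of this pipeline, building a DOM and filtering it into a render tree, can be illustrated with Python’s built-in HTML parser. This is a heavily simplified sketch: text nodes are ignored, the "CSSOM" is a hypothetical dictionary of rules keyed by tag name, real browsers follow a much more tolerant parsing algorithm, and layout and paint are not modeled at all. It does show one real property of the render tree: elements with `display: none` (such as `head`) and their descendants are left out.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class DomBuilder(HTMLParser):
    """Build a tiny DOM (elements only, text ignored) from an HTML string."""

    VOID_TAGS = {"img", "br", "meta", "link", "input", "hr"}  # elements with no children

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID_TAGS:
            self.current = node  # following elements become its children

    def handle_endtag(self, tag):
        # Climb back to the matching open element, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def render_tree(node, cssom, depth=0, out=None):
    """List the nodes that will be rendered, skipping display:none subtrees."""
    out = [] if out is None else out
    for child in node.children:
        if cssom.get(child.tag, {}).get("display") == "none":
            continue  # neither this node nor its descendants are rendered
        out.append("  " * depth + child.tag)
        render_tree(child, cssom, depth + 1, out)
    return out


html = ("<html><head><title>Holberton</title></head>"
        "<body><h1>Welcome</h1><p>Hello <img src='logo.png'></p></body></html>")
cssom = {"head": {"display": "none"}, "p": {"color": "red"}}  # toy CSSOM keyed by tag

builder = DomBuilder()
builder.feed(html)
builder.close()
print("\n".join(render_tree(builder.root, cssom)))
```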
Conclusion
The above description might seem quite detailed but is only the tip of the iceberg!