
you've typed a url thousands of times. hit enter, page loads, move on with your life. but between that keystroke and the rendered webpage, there's an entire symphony of protocols, lookups, and transformations happening in milliseconds.
this is the journey of a url from the moment you press enter to the moment pixels appear on your screen. we'll skip the rabbit holes and focus on what actually matters for understanding the flow.
before we follow the journey, let's understand what we're working with. url stands for "uniform resource locator," and that word "uniform" is doing heavy lifting.
uniform means every url follows the same formal specification (RFC 3986 if you're into that sort of thing):
scheme://authority/path?query#fragment
here's a real example:
https://api.ninefive.com:443/users/1?active=true#profile
let's break it down:
- https - the protocol (scheme) we're using to fetch the resource
- api.ninefive.com:443 - where we're connecting (authority = domain + optional port)
- /users/1 - the path to the specific resource on that server
- ?active=true - optional query parameters
- #profile - optional fragment (which section of the page to jump to)

the authority part (api.ninefive.com:443) can actually omit the port if you're using the default. https://api.ninefive.com:443 is identical to https://api.ninefive.com because 443 is https's default port.
urls are a specific type of uri (uniform resource identifier). the "locator" part means a url answers two critical questions:
- where does the resource live? (the authority and path)
- how do you retrieve it? (the scheme)
compare that to a urn (uniform resource name) like urn:isbn:0471694703. this uniquely identifies a book, but it doesn't tell you where to get it or how to retrieve it. that's the difference between identification (uri/urn) and location (url).
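to see that structure in code, here's a quick sketch using python's urllib.parse (just an illustration, not part of any browser's internals):

```python
from urllib.parse import urlsplit

# split a url into the components defined by RFC 3986
parts = urlsplit("https://api.ninefive.com:443/users/1?active=true#profile")

print(parts.scheme)    # https
print(parts.netloc)    # api.ninefive.com:443  (the authority)
print(parts.hostname)  # api.ninefive.com
print(parts.port)      # 443
print(parts.path)      # /users/1
print(parts.query)     # active=true
print(parts.fragment)  # profile
```

note that urlsplit gives you the authority both raw (netloc) and already separated into hostname and port.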
here's where things get interesting. when you type a url with a space - say, searching for "hello world" - your browser needs to send this to a server:
GET /search?q=hello world HTTP/1.1
Host: google.com
but http has a problem. it was designed in 1991 to be human-readable using ascii characters. that's great for debugging, but ascii only supports 128 characters. no arabic, no japanese, no emojis. and crucially, spaces are ambiguous.
look at that request again. where does the url end? is it /search?q=hello or /search?q=hello world? how does the server know where HTTP/1.1 begins?
the solution is percent encoding. your browser automatically converts problematic characters:
// what you type
"https://google.com/search?q=hello world"
// what actually gets sent
"https://google.com/search?q=hello%20world"
the space becomes %20. any character that's not ascii-safe or might be reserved gets encoded. this is why you see those weird % symbols in urls sometimes - it's not broken, it's working exactly as designed.
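you can reproduce exactly what the browser does with python's urllib.parse (any language's standard library has an equivalent):

```python
from urllib.parse import quote, unquote

# reserved and unsafe characters get percent-encoded
encoded = quote("hello world")
print(encoded)           # hello%20world

# the server decodes it back before processing
print(unquote(encoded))  # hello world

# non-ascii characters become multi-byte utf-8 sequences
print(quote("café"))     # caf%C3%A9
```

that last line is why a single accented character can turn into six characters in a url: it's the utf-8 bytes, each encoded as %XX.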
now you have a properly formatted url: https://api.ninefive.com/users/1. but there's a problem. your browser needs an ip address to establish a tcp connection. domain names like api.ninefive.com don't exist at the network layer - they're a human convenience.
this is where dns (domain name system) comes in. it's basically a distributed phone book that translates domain names to ip addresses.
but before asking the entire internet, your system checks a few local caches:
your browser keeps a cache of recent dns lookups. if you visited api.ninefive.com five minutes ago and the ttl (time to live) hasn't expired, it uses that cached ip address. instant lookup, zero network requests.
if the browser doesn't have it, it asks the os. your operating system maintains its own dns cache. still no network request needed.
before going to the network, the os checks /etc/hosts (on unix-like systems). this is a local file where you can manually map domain names to ip addresses:
# /etc/hosts
127.0.0.1 localhost
192.168.1.100 mydevserver.local
developers use this for local testing. point myapp.local to 127.0.0.1 and you can test your app locally with a real-looking domain name.
if none of the caches have it, your os finally makes a network request to a recursive dns resolver (usually provided by your isp or router).
this resolver doesn't know the answer either. so it walks through the global dns hierarchy:
- it asks a root nameserver, which points it to the nameservers for the .com tld
- it asks a .com tld nameserver, which points it to the authoritative nameserver for ninefive.com
- it asks that authoritative nameserver, which finally returns the ip address for api.ninefive.com
the resolver caches the result, returns it to your os, which caches it, which returns it to your browser, which caches it. next time you visit, all those steps get skipped until the ttl expires.
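the ttl logic behind all those cache layers is simple enough to sketch in a few lines. this is a toy model, not any real resolver's code:

```python
import time

class DnsCache:
    """toy ttl-based cache, like the ones in your browser and os."""

    def __init__(self):
        self._entries = {}  # domain -> (ip, expires_at)

    def put(self, domain, ip, ttl_seconds):
        # remember the answer, but only until the ttl runs out
        self._entries[domain] = (ip, time.monotonic() + ttl_seconds)

    def get(self, domain):
        entry = self._entries.get(domain)
        if entry is None:
            return None  # cache miss: ask the next layer
        ip, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._entries[domain]  # ttl expired: treat as a miss
            return None
        return ip

cache = DnsCache()
cache.put("api.ninefive.com", "104.21.82.120", ttl_seconds=300)
print(cache.get("api.ninefive.com"))  # 104.21.82.120 (hit, zero network requests)
print(cache.get("unknown.example"))   # None (miss: fall through to the resolver)
```

every layer in the chain (browser, os, recursive resolver) runs some version of this same check before doing any real work.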
now your browser has an ip address: 104.21.82.120. time to connect.
before any http traffic flows, your browser needs to establish a reliable connection using tcp:
- your browser sends a syn packet ("i'd like to connect")
- the server replies with syn-ack ("received, i'm ready")
- your browser answers with an ack ("confirmed, let's talk")
this three-way handshake ensures both sides are ready to communicate. it adds latency (roughly one round-trip time), but it guarantees reliability.
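from application code you never see the three packets: the kernel performs the handshake when connect() is called. a minimal local sketch, using a throwaway server on localhost:

```python
import socket

# a throwaway server listening on an ephemeral localhost port
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# connect() blocks until the syn / syn-ack / ack exchange completes
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
conn, addr = server.accept()

print("connected from", addr)  # by this point the handshake is done

client.close()
conn.close()
server.close()
```

a browser does the same thing with the ip address it got from dns and port 443, then layers tls and http on top of the raw byte stream.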
since we're using https://, there's an additional step. before sending any actual data, the browser and server need to establish an encrypted connection using tls (transport layer security):
- the browser sends a "client hello" listing the tls versions and cipher suites it supports
- the server responds with its chosen parameters and its certificate, which proves it really is api.ninefive.com
- both sides derive shared session keys, and everything from here on is encrypted
this is why https is slower than http on the first request - there's an extra round trip for the tls handshake. but subsequent requests can reuse the same encrypted connection.
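python's ssl module shows the knobs involved. creating a context doesn't touch the network, so this sketch just inspects the defaults a browser-like client would use:

```python
import ssl

# a client-side context with sane defaults: certificate verification on,
# hostname checking on, obsolete protocol versions disabled
ctx = ssl.create_default_context()

print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True: the server must present a valid cert
print(ctx.check_hostname)                    # True: the cert must match the domain

# wrapping a tcp socket with this context is what triggers the tls handshake:
# tls_sock = ctx.wrap_socket(tcp_sock, server_hostname="api.ninefive.com")
```

the server_hostname argument matters: it's both how the certificate gets validated and how a server hosting many domains on one ip knows which certificate to present (sni).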
finally, we can send the actual http request. your browser constructs something like:
GET /users/1?active=true HTTP/1.1
Host: api.ninefive.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)
Accept: text/html,application/json
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
this travels over the encrypted tcp connection to the server.
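under the hood an http/1.1 request is just text with crlf line endings. a sketch of assembling a similar (abbreviated) request by hand:

```python
# http/1.1 is plain text: a request line, headers, then a blank line
request = (
    "GET /users/1?active=true HTTP/1.1\r\n"
    "Host: api.ninefive.com\r\n"
    "Accept: application/json\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"  # the blank line marks the end of the headers
)

# over https, these bytes would be written to the tls-wrapped socket:
# tls_sock.sendall(request.encode("ascii"))
print(request)
```

that trailing blank line is load-bearing: it's how the server knows the headers are finished and (for a GET with no body) the request is complete.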
on the other side, the server:
- parses the http request
- routes /users/1 to the appropriate handler
- builds the response and sends it back:

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 157
Cache-Control: max-age=300
{
"id": 1,
"name": "marouane",
"active": true,
"email": "boufaroujmarouan@gmail.com"
}
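the content-type header tells the client how to interpret those bytes. json like the body above parses in one call:

```python
import json

# the response body, exactly as it arrived over the wire
body = """{
  "id": 1,
  "name": "marouane",
  "active": true,
  "email": "boufaroujmarouan@gmail.com"
}"""

user = json.loads(body)
print(user["name"])    # marouane
print(user["active"])  # True (json's true becomes python's True)
```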
the browser receives this response and:
- checks the status code (200 means success)
- reads the headers (content-type says the body is json; cache-control says it can be reused for 300 seconds)
- parses the body and renders the result
the #profile fragment from our original url? that only matters now, at the very end. the browser scrolls to the element with id="profile" after rendering.
all of this - dns lookup, tcp handshake, tls negotiation, http request, server processing, http response, parsing, rendering - happens in milliseconds.
next time you type a url and hit enter, you'll know there's an entire choreographed dance happening behind that loading spinner. and maybe you'll appreciate why "the internet is slow" when you're on a bad connection - every one of those round trips adds latency.
this is a simplified version of a complex process. i intentionally skipped details about http/2 multiplexing, dns over https, cdn routing, load balancing, and about a dozen other topics that would turn this into a book. the goal was understanding the main flow, not documenting every possible edge case.