HTTP: The Foundation of the Web

Hypertext Transfer Protocol

HTTP is the protocol that powers the World Wide Web, enabling clients to retrieve hypertext documents and other resources from servers. It defines how messages are formatted and transmitted, and how web servers and browsers should respond to those messages.


HTTP is a request-response protocol where clients (like browsers) send requests to web servers, which then return responses.

HTTP Basics

Stateless Protocol

HTTP is stateless, meaning each request from a client to a server is treated as an independent transaction. The server doesn't retain session information about the client between requests. This simplicity contributes to HTTP's scalability but requires additional mechanisms (like cookies) to maintain state when needed.
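To make the cookie mechanism concrete, here is a minimal sketch using Python's standard `http.cookies` module. The server attaches a value with a `Set-Cookie` response header, and the browser returns it in a `Cookie` header on later requests. The cookie name `session_id` and both helper functions are hypothetical. A real application would also generate session IDs securely and store session data on the server.

```python
from http.cookies import SimpleCookie
from typing import Optional


def make_set_cookie_header(session_id: str) -> str:
    """Server side: build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id          # hypothetical cookie name
    cookie["session_id"]["path"] = "/"         # send it back for every path on the site
    cookie["session_id"]["httponly"] = True    # hide it from page JavaScript
    # header="" drops the "Set-Cookie:" prefix and leaves just the value
    return cookie.output(header="").strip()


def read_session_id(cookie_header: str) -> Optional[str]:
    """Server side, later request: recover the session ID from the Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return morsel.value if morsel is not None else None


# Response 1 carries: Set-Cookie: <value printed below>
print(make_set_cookie_header("abc123"))
# Request 2 arrives with: Cookie: session_id=abc123
print(read_session_id("session_id=abc123"))  # abc123
```

Each request is still independent at the protocol level. The "state" lives in the cookie value the client echoes back, which the server uses to look up what it remembers.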

Request-Response Model

HTTP follows a request-response model where:

  1. A client sends a request to a server
  2. The server processes the request
  3. The server returns a response

Each HTTP transaction consists of a request message and a response message, each with a specific format.
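The sketch below illustrates both message formats in HTTP/1.1. A request consists of a request line, headers, a blank line, and an optional body. A response has the same shape, except that it begins with a status line. The two functions are hypothetical helpers written for illustration only. They skip details such as chunked bodies and repeated headers, which a real client library like `http.client` handles.

```python
from typing import Dict, Optional, Tuple

CRLF = "\r\n"


def build_request(method: str, path: str, host: str,
                  headers: Optional[Dict[str, str]] = None, body: bytes = b"") -> bytes:
    """Serialize a minimal HTTP/1.1 request message."""
    lines = [f"{method} {path} HTTP/1.1",      # request line: method, target, version
             f"Host: {host}"]                  # Host is mandatory in HTTP/1.1
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # An empty line separates the header section from the (optional) body
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii") + body


def parse_response(raw: bytes) -> Tuple[int, str, Dict[str, str], bytes]:
    """Split a complete response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split(CRLF)
    parts = status_line.split(" ", 2)          # e.g. "HTTP/1.1", "404", "Not Found"
    code, reason = int(parts[1]), (parts[2] if len(parts) > 2 else "")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return code, reason, headers, body


print(build_request("GET", "/index.html", "www.example.com"))
# b'GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n'
raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
print(parse_response(raw))
# (200, 'OK', {'content-type': 'text/html', 'content-length': '5'}, b'hello')
```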

URL Structure

HTTP requests are directed to resources identified by URLs (Uniform Resource Locators). A typical URL has this structure:

http://www.example.com:80/path/to/resource?query=value#fragment
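  • http:// - Scheme (protocol)
  • www.example.com - Host (domain name)
  • :80 - Port (optional; defaults to 80 for HTTP and 443 for HTTPS)
  • /path/to/resource - Path to the resource
  • ?query=value - Query string (parameters as name=value pairs)
  • #fragment - Fragment identifier (used by the client, for example to scroll to a section; not sent to the server)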
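Python's standard `urllib.parse` module splits a URL into these same components, so it is a convenient way to check how a given URL breaks down. The final lines build the request target that a client would put in its request line. That target contains the path and query, but never the fragment.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://www.example.com:80/path/to/resource?query=value#fragment"
parts = urlsplit(url)

print(parts.scheme)            # http
print(parts.hostname)          # www.example.com
print(parts.port)              # 80
print(parts.path)              # /path/to/resource
print(parts.query)             # query=value
print(parts.fragment)          # fragment
print(parse_qs(parts.query))   # {'query': ['value']}

# The request line sent to the server carries the path and query, not the fragment
target = parts.path + ("?" + parts.query if parts.query else "")
print(target)                  # /path/to/resource?query=value
```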

HTTP Methods

HTTP defines several methods (sometimes called "verbs") that indicate the desired action to be performed on a resource. The most common methods are:

Method | Description | Use Case
--- | --- | ---
GET | Requests a representation of the specified resource | Retrieving webpage content
POST | Submits data to the specified resource for processing | Form submissions, file uploads
PUT | Replaces all current representations of the target resource | Updating an entire resource
DELETE | Removes the specified resource | Deleting resources
PATCH | Applies partial modifications to a resource | Updating part of a resource
HEAD | Same as GET but returns only headers, no body | Checking if a resource exists
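The method is simply one field of the request a client builds. The following small, offline illustration uses Python's standard `urllib.request.Request` class with placeholder URLs. A request that carries a body defaults to POST unless another method is given explicitly. Nothing is sent over the network until a request is passed to `urllib.request.urlopen`.

```python
from urllib.request import Request

BASE = "http://www.example.com/articles"      # placeholder URL

reqs = [
    Request(f"{BASE}/42"),                                  # no body, no method: GET
    Request(BASE, data=b"title=Hello"),                     # body present: POST by default
    Request(f"{BASE}/42", data=b"title=Hi", method="PUT"),  # an explicit method wins
    Request(f"{BASE}/42", method="HEAD"),
    Request(f"{BASE}/42", method="DELETE"),
]

for req in reqs:
    print(req.get_method(), req.full_url)
```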

Idempotent Methods

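Some HTTP methods, including GET, HEAD, PUT, and DELETE, are idempotent: sending the same request several times should have the same effect on the server as sending it once. POST and PATCH are not guaranteed to be idempotent. This property matters for reliable systems. If a client loses its connection before receiving a response, it can safely resend an idempotent request.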
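To see the difference, consider a toy in-memory resource store. The `ArticleStore` class is hypothetical and stands in for a server's data. The example repeats each request three times, as a client might when retrying after timeouts. Repeating a PUT to a known ID leaves one resource, while repeating a POST creates a new resource each time.

```python
import itertools


class ArticleStore:
    """Hypothetical in-memory resource store used to compare method semantics."""

    def __init__(self):
        self.articles = {}
        self._ids = itertools.count(1)

    def put(self, article_id, body):
        # PUT replaces the resource at a known ID: repeating it changes nothing further
        self.articles[article_id] = body

    def delete(self, article_id):
        # DELETE: once the resource is gone, repeating it leaves the state unchanged
        self.articles.pop(article_id, None)

    def post(self, body):
        # POST asks the server to create a new resource: each call adds one
        new_id = next(self._ids)
        self.articles[new_id] = body
        return new_id


store = ArticleStore()
for _ in range(3):               # e.g. a client retrying after timeouts
    store.put(7, "Hello")
print(len(store.articles))       # 1

for _ in range(3):
    store.post("Hello")
print(len(store.articles))       # 4
```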

Safe Methods

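GET and HEAD are considered "safe" methods because they are intended only to retrieve information and should not change the state of the server. Incidental effects such as logging do not count. Every safe method is also idempotent, but not every idempotent method is safe: PUT and DELETE are idempotent yet modify the resource.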
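Clients and proxies often use these two properties to decide what they may do without asking the user. For example, they may resend a request automatically only when it is idempotent. The sketch below is a hypothetical retry-policy helper. Its method sets follow the classifications in RFC 9110, which also lists OPTIONS and TRACE as safe.

```python
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})
# Every safe method is idempotent; PUT and DELETE are idempotent without being safe
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def is_safe(method: str) -> bool:
    """True if the method only retrieves information."""
    return method.upper() in SAFE_METHODS


def may_retry_automatically(method: str) -> bool:
    """A request with an unknown outcome (e.g. the connection dropped) can be
    re-sent without risking a duplicated effect only if it is idempotent."""
    return method.upper() in IDEMPOTENT_METHODS


for m in ("GET", "PUT", "POST", "PATCH"):
    print(m, is_safe(m), may_retry_automatically(m))
```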

HTTP Status Codes

HTTP status codes indicate the result of an HTTP request. They are three-digit numbers grouped into five classes by their first digit:

1xx - Informational

These status codes indicate a provisional response. The client should be prepared to receive one or more 1xx responses before receiving a regular response.

  • 100 Continue: The server has received the request headers and the client should proceed to send the request body.
  • 101 Switching Protocols: The server is switching protocols as requested by the client.

2xx - Success

These status codes indicate that the client's request was successfully received, understood, and accepted.

  • 200 OK: The request succeeded.
  • 201 Created: The request succeeded and a new resource was created.
  • 204 No Content: The request succeeded but there's no content to send in the response.

3xx - Redirection

These status codes indicate that further action needs to be taken by the client to complete the request.

  • 301 Moved Permanently: The requested resource has been permanently moved to a new URL.
  • 302 Found: The requested resource is temporarily located at a different URL.
  • 304 Not Modified: The client's cached version of the resource is still valid.
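A 304 is the answer to a conditional request. The server labels a response with a validator such as an `ETag` header, and the client later sends that value back in `If-None-Match`. The sketch below shows the server-side decision. The ETag scheme (a truncated SHA-256 hash) and the `handle_get` function are hypothetical, and the sketch handles only a single ETag rather than the lists or `*` the header also allows.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Hypothetical ETag scheme: a quoted, truncated SHA-256 of the content."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET that may carry an If-None-Match header (single ETag only)."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The client's cached copy is still current: no body is sent
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body


page = b"<h1>Hello</h1>"
status, headers, _ = handle_get(page, None)          # first visit: nothing cached
print(status)                                         # 200
status, _, body = handle_get(page, headers["ETag"])  # revalidation with the saved ETag
print(status, body)                                   # 304 b''
```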

4xx - Client Error

These status codes indicate that the client seems to have made an error.

  • 400 Bad Request: The server cannot process the request due to a client error.
  • 401 Unauthorized: Authentication is required and has failed or not been provided.
  • 403 Forbidden: The client does not have access rights to the content.
  • 404 Not Found: The server cannot find the requested resource.

5xx - Server Error

These status codes indicate that the server failed to fulfill a valid request.

  • 500 Internal Server Error: The server encountered an unexpected condition.
  • 502 Bad Gateway: The server received an invalid response from an upstream server.
  • 503 Service Unavailable: The server is currently unavailable (overloaded or down for maintenance).
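Because the class is encoded in the first digit, a client can interpret a code it has never seen before. RFC 9110 says an unrecognized code should be treated like the x00 code of its class. The sketch below derives the class arithmetically and looks up the reason phrase in Python's standard `http.HTTPStatus` enum. The `describe` helper is illustrative only.

```python
from http import HTTPStatus

CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client Error", 5: "Server Error"}


def describe(code: int) -> str:
    """Label a status code with its reason phrase and its class."""
    status_class = CLASSES.get(code // 100, "Unknown")   # first digit picks the class
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:                                    # not a code Python knows
        phrase = "(unregistered)"
    return f"{code} {phrase} ({status_class})"


for code in (100, 201, 304, 404, 503):
    print(describe(code))
# 100 Continue (Informational)
# 201 Created (Success)
# 304 Not Modified (Redirection)
# 404 Not Found (Client Error)
# 503 Service Unavailable (Server Error)
```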

HTTP Versions & Evolution

HTTP/0.9 (1991)

The original version was extremely simple, with only one method (GET) and no headers or status codes. It could only transfer HTML files.

HTTP/1.0 (1996)

This version introduced headers, status codes, and content types, enabling the transfer of different media types. However, it opened a new connection for each request, which was inefficient.

HTTP/1.1 (1997)

The most widely used version for many years, HTTP/1.1 introduced:

  • Persistent connections (keep-alive) to reuse a single connection for multiple requests
  • Pipelining to send multiple requests without waiting for responses
  • Host header to support virtual hosting (multiple domains on one IP)
  • Content negotiation, allowing clients to request specific formats
  • Chunked transfer encoding for streaming data

Despite these improvements, HTTP/1.1 still suffered from head-of-line blocking. Responses on a connection must be returned in the order the requests were sent, so one slow response delays every response queued behind it.
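Chunked transfer encoding, one of the HTTP/1.1 features listed above, lets a server start sending a body before it knows the total length. Each chunk is preceded by its size in hexadecimal, and a zero-size chunk marks the end. The following minimal encoder and decoder illustrate the format. They ignore trailer fields and chunk extensions beyond skipping them.

```python
def encode_chunked(chunks) -> bytes:
    """Frame an iterable of byte strings using HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for chunk in chunks:
        if chunk:                              # an empty chunk would end the body early
            out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"                        # last chunk plus an empty trailer section
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from a complete chunked message body."""
    body, pos = bytearray(), 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)  # skip any chunk extensions
        if size == 0:
            return bytes(body)
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2                 # skip the CRLF after the chunk data


wire = encode_chunked([b"Hello, ", b"world!"])
print(wire)                  # b'7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n'
print(decode_chunked(wire))  # b'Hello, world!'
```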

HTTP/2 (2015)

A major revision focused on performance, HTTP/2 introduced:

  • Multiplexing: Multiple requests and responses in flight at once over a single connection, interleaved as independent streams
  • Header compression (HPACK): Reduces the overhead of repetitive headers
  • Server push: Servers can proactively send resources to clients (later removed from major browsers such as Chrome)
  • Binary framing: More efficient parsing compared to text-based HTTP/1.1
  • Stream prioritization: Clients can indicate which resources are more important
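Binary framing is what makes multiplexing possible. Every HTTP/2 frame starts with a 9-byte header: a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream ID (RFC 9113). Because each frame names its stream, frames from different responses can be interleaved and sorted back out by the receiver. The sketch below is simplified. It only uses DATA frames and omits the connection preface, SETTINGS, and HPACK-encoded HEADERS frames that a real connection requires.

```python
import struct

DATA = 0x0          # frame type for message body data
END_STREAM = 0x1    # flag: the sender has finished this stream
# 24-bit length split as 8 + 16 bits, then type, flags, and the 32-bit stream ID field
FRAME_HEADER = struct.Struct("!BHBBI")


def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with the 9-byte HTTP/2 frame header."""
    length = len(payload)
    # The top bit of the stream ID field is reserved and must be zero
    return FRAME_HEADER.pack(length >> 16, length & 0xFFFF, frame_type, flags,
                             stream_id & 0x7FFFFFFF) + payload


def unpack_frames(data: bytes):
    """Yield (type, flags, stream_id, payload) for each frame in a buffer."""
    pos = 0
    while pos < len(data):
        hi, lo, frame_type, flags, stream_id = FRAME_HEADER.unpack_from(data, pos)
        start = pos + FRAME_HEADER.size
        end = start + ((hi << 16) | lo)
        yield frame_type, flags, stream_id & 0x7FFFFFFF, data[start:end]
        pos = end


# Two responses interleaved on one connection (client-initiated streams use odd IDs)
wire = (pack_frame(DATA, 0, 1, b"<html>")
        + pack_frame(DATA, 0, 3, b"body{")
        + pack_frame(DATA, END_STREAM, 1, b"</html>")
        + pack_frame(DATA, END_STREAM, 3, b"}"))

streams, finished = {}, set()
for frame_type, flags, stream_id, payload in unpack_frames(wire):
    if frame_type == DATA:
        streams[stream_id] = streams.get(stream_id, b"") + payload
    if flags & END_STREAM:
        finished.add(stream_id)

print(streams)   # {1: b'<html></html>', 3: b'body{}'}
print(finished)  # {1, 3}
```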

HTTP/3 (2022)

The latest version builds on HTTP/2's design but runs over QUIC, a UDP-based transport protocol (originally named "Quick UDP Internet Connections"), instead of TCP. Key benefits include:

  • Improved performance on unreliable networks
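  • Reduced connection establishment time, because the transport and TLS handshakes are combined
  • Elimination of head-of-line blocking at the transport layer: a lost packet stalls only the stream it belongs to, not the whole connection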
  • Better mobile performance with connection migration across networks
  • Built-in TLS encryption

Security: HTTPS

What is HTTPS?

HTTPS (HTTP Secure) is HTTP running over TLS (the successor to the now-deprecated SSL), providing encrypted communication and secure identification of a web server. It protects against eavesdropping, tampering, and message forgery.

How HTTPS Works

  1. TLS Handshake: Client and server establish a secure connection through a process that includes:
    • Exchanging cryptographic information
    • Verifying the server's identity using certificates
    • Generating symmetric session keys for encryption
  2. Certificate Validation: The client verifies that the server's TLS certificate is:
    • Issued by a trusted Certificate Authority (CA)
    • Valid (not expired or revoked)
    • Issued for the correct domain
  3. Encrypted Data Transfer: Once the secure connection is established, all HTTP messages are encrypted using the negotiated session keys.
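The steps above can be sketched with Python's standard `ssl` module. `ssl.create_default_context()` loads the system's trusted CA certificates and turns on both certificate verification and hostname matching, so `wrap_socket` aborts the handshake if validation fails. The helper `open_tls_connection` is hypothetical, and this default context does not perform revocation checking on its own. The module-level lines only inspect the context and do not open a network connection.

```python
import socket
import ssl


def open_tls_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Connect to host and complete a TLS handshake with certificate validation."""
    context = ssl.create_default_context()
    raw_sock = socket.create_connection((host, port), timeout=10)
    try:
        # server_hostname is sent via SNI and is matched against the certificate
        return context.wrap_socket(raw_sock, server_hostname=host)
    except Exception:
        raw_sock.close()           # handshake or validation failed: release the socket
        raise


context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True: untrusted certificates are rejected
print(context.check_hostname)                     # True: the certificate must match the domain
```

With network access, `with open_tls_connection("www.example.com") as tls: print(tls.version())` would print the negotiated TLS version. Any HTTP bytes written with `tls.sendall(...)` afterward travel encrypted.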

Benefits of HTTPS

  • Confidentiality: Data is encrypted, preventing eavesdropping
  • Integrity: Data cannot be modified in transit without detection
  • Authentication: Verifies the identity of the website you're connected to
  • SEO Advantage: Search engines such as Google use HTTPS as a ranking signal
  • Access to Modern Features: Some browser features (like geolocation) are available only on pages served over HTTPS

HTTP Strict Transport Security (HSTS)

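HSTS is a security policy mechanism that helps protect websites against protocol downgrade attacks and cookie hijacking. A server enables it by sending a Strict-Transport-Security response header over HTTPS, for example "Strict-Transport-Security: max-age=31536000; includeSubDomains". For the number of seconds given by max-age, the browser then connects to that site only over HTTPS, even if the user tries to use HTTP.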
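The browser-side behavior can be sketched in two parts. The first function parses the header's directives (max-age in seconds and the optional includeSubDomains flag, as defined in RFC 6797). The second rewrites `http://` URLs for hosts whose policy the browser has stored. Both functions are hypothetical. The rewrite is simplified: it does not apply includeSubDomains, does not expire policies, and replaces an explicit port 80 with the HTTPS default.

```python
from typing import Dict, Set, Union
from urllib.parse import urlsplit, urlunsplit


def parse_hsts(value: str) -> Dict[str, Union[int, bool, None]]:
    """Parse a Strict-Transport-Security header value into its directives."""
    policy = {"max_age": None, "include_subdomains": False}
    for directive in value.split(";"):
        name, _, arg = directive.strip().partition("=")
        name = name.strip().lower()                 # directive names are case-insensitive
        if name == "max-age":
            policy["max_age"] = int(arg.strip().strip('"'))  # seconds to remember the policy
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
    return policy


def upgrade_to_https(url: str, hsts_hosts: Set[str]) -> str:
    """Rewrite an http:// URL for a host with a stored HSTS policy (simplified)."""
    parts = urlsplit(url)
    if parts.scheme != "http" or parts.hostname not in hsts_hosts:
        return url
    netloc = parts.netloc
    if parts.port == 80:
        netloc = parts.hostname   # explicit port 80 becomes the HTTPS default, 443
    return urlunsplit(parts._replace(scheme="https", netloc=netloc))


print(parse_hsts("max-age=31536000; includeSubDomains"))
# {'max_age': 31536000, 'include_subdomains': True}
print(upgrade_to_https("http://www.example.com/login", {"www.example.com"}))
# https://www.example.com/login
```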

Further Reading