How the Web works, in Amazing Detail!

31 Jan - by Alan - In Web & Internet
That's Amazing!

Visiting a web site seems so simple – click on a link in your web browser, and soon – within seconds we hope – an interesting web page reveals itself to us. What could be easier?

In fact, what happens ‘under the hood (or bonnet)’ is amazingly complex. It’s taken decades of computer and software development by many thousands of people to make following that hyperlink (to give it its original technical name) to the information it represents seem so effortless. If we were to fully explain everything that happens from the moment you click that link, we’d fill several books! Perhaps even an encyclopedia or two. You’d need to know about: the difference between the Internet and the Web; how computers work; computer science; web design; programming; networks; electronics; physics; mathematics… (I’ve probably left something out!). This essay can only scratch the surface of all that, but I’ll give you plenty of links to more detail so that you can find out as much as you want about this wonderful technology that we take so much for granted!

A Quick Overview of How the Web Works

So that you can see the forest before you get lost in the trees, I’m going to give you a dramatically over-simplified overview that has several mysterious acronyms and probably new concepts. Don’t worry if it doesn’t make a whole lot of sense just yet – it will if you hang in there! Let’s start by supposing that you want to see this wonderful cat picture (it will open in a new window). What happens is:

  1. You clicked on a link: http://tuxar.uk/long-path/cat-picture.jpg (it’s actually longer than that but I want to keep it simple!)
  2. Your browser split the link into three pieces: the protocol (HTTP), the domain name (tuxar.uk) and the path (/long-path/cat-picture.jpg).
  3. Your browser used the DNS to convert the server’s user-friendly domain name (tuxar.uk) into my server’s IP address (104.28.25.51 today, but it might change).
  4. Your browser sent a connection request to my server’s IP address.
  5. Your browser sent my server an HTTP request asking for a copy of the image stored at /long-path/cat-picture.jpg
  6. My server found the requested image and returned it to your browser via an HTTP response.
  7. Your browser received and displayed the picture.
  8. Your browser dropped the connection to my server, terminating the session.
  9. You are admiring my cat, Nova.

HTTP = HyperText Transfer Protocol; DNS = Domain Name System. Next, we’ll break those steps down into much more detail!

Computer

To be reading this web page, you are using a web browser on some form of computer (desktop, notebook, tablet, smartphone, …). The browser is a software application.

Wikipedia:

A web page (webpage or Web page) is a document that is suitable for the World Wide Web and web browsers. A web browser displays a web page on a monitor or mobile device. The web page is what displays, but the term also refers to a computer file, usually written in HTML or comparable markup language. Web browsers coordinate the various web resource elements for the written web page, such as style sheets, scripts, and images, to present the web page.

Typical web pages provide hypertext that includes a navigation bar or a sidebar menu to other web pages via hyperlinks, often referred to as links.

On a network, a web browser can retrieve a web page from a remote web server. On a higher level, the web server may restrict access to only a private network such as a corporate intranet or it provides access to the World Wide Web. On a lower level, the web browser uses the Hypertext Transfer Protocol (HTTP) to make such requests.

A static web page is delivered exactly as stored, as web content in the web server's file system, while a dynamic web page is generated by a web application that is driven by server-side software or client-side scripting. Dynamic website pages help the browser (the client) to enhance the web page through user input to the server.

Wikipedia:

A web browser (commonly referred to as a browser) is a software application for retrieving, presenting and traversing information resources on the World Wide Web. An information resource is identified by a Uniform Resource Identifier (URI/URL) that may be a web page, image, video or other piece of content. Hyperlinks present in resources enable users easily to navigate their browsers to related resources.

Although browsers are primarily intended to use the World Wide Web, they can also be used to access information provided by web servers in private networks or files in file systems.

The most popular web browsers are Google Chrome, Microsoft Edge (preceded by Internet Explorer), Safari, Opera and Firefox.

The Internet vs. The World Wide Web

The computer needs a connection to the Internet, either wired or wireless. But note that the Internet and the World Wide Web are not the same thing! The Internet is like the road or highway system: it provides the ‘routes’ along which traffic can flow. The web is one particular kind of traffic on that highway system. Other kinds include email, file transfers, and VoIP.

How does the Internet really work? This clip lets you ride shotgun with a packet of data—one of trillions involved in the trillions of Internet interactions that happen every second. Look deep beneath the surface of the most basic Internet transaction, and follow the packet as it flows from your fingertips, through circuits, wires, and cables, to a host server, and then back again, all in less than a second.

Client-server model

The World Wide Web (aka ‘web’) is based on a client-server model. This simply means that a ‘client’ such as your web browser sends a request to a ‘server’ (specifically, a web server) somewhere out there on the Internet for some information, such as a web page. If the server finds the page you asked for, it sends it back over the Internet. But just like sending letters or packages, you need an address to send your request to: a web address.
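To make the client-server idea concrete, here is a minimal sketch in Python (my choice of language for illustration; nothing about the web requires it). It starts a tiny web server on your own machine and then plays the client by asking it for a page. The names HelloHandler and fetch_from_local_server are made up for this example, and the address 127.0.0.1 simply means ‘this computer’, so nothing leaves your machine.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answer every GET request with one tiny web page."""

    def do_GET(self):
        body = b"<h1>Hello from the server</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet instead of logging each request


def fetch_from_local_server():
    """Start a throwaway server, act as its client, return (status, body)."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client side: connect, send a request, read the response.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/")
        response = connection.getresponse()
        result = (response.status, response.read())
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_from_local_server() should hand back the status code and the bytes of the little page. A real browser and a real web server do the same dance, just with far more care and over a much longer distance.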

Wikipedia:

The World Wide Web (abbreviated WWW or the Web) is an information space where documents and other web resources are identified by Uniform Resource Locators (URLs), interlinked by hypertext links, and can be accessed via the Internet. English scientist Tim Berners-Lee invented the World Wide Web in 1989. He wrote the first web browser computer program in 1990 while employed at CERN in Switzerland. The Web browser was released outside of CERN in 1991, first to other research institutions starting in January 1991 and to the general public on the Internet in August 1991.

The World Wide Web has been central to the development of the Information Age and is the primary tool billions of people use to interact on the Internet. Web pages are primarily text documents formatted and annotated with Hypertext Markup Language (HTML). In addition to formatted text, web pages may contain images, video, audio, and software components that are rendered in the user's web browser as coherent pages of multimedia content. Embedded hyperlinks permit users to navigate between web pages. Multiple web pages with a common theme, a common domain name, or both, make up a website. Website content can largely be provided by the publisher, or interactive where users contribute content or the content depends upon the user or their actions. Websites may be mostly informative, primarily for entertainment, or largely for commercial, governmental, or non-governmental organisational purposes. In the 2006 Great British Design Quest organised by the BBC and the Design Museum, the World Wide Web was voted among the top 10 British design icons.

Web Addresses (URLs) vs. Internet Protocol (IP) Addresses

What’s a web address? You’ve probably seen it looking like this: Google.com or Microsoft.com or Linux.com or Tuxar.uk/linux. However, these are the user-friendly short forms. You might have seen web addresses looking like this: http://example.com/ (your web browser sticks that ‘http://’ on the beginning to make it known further down the line that this is a web address).

You’ve probably also seen some stuff tacked on to the end, after the domain name, e.g. http://example.com:80/more/stuff. The number after the colon is the port number, and the part after that is called the path. The web server (the software on the computer that has the web page you want) needs the path to know what to send back to you. This completed web address is called a Uniform Resource Locator (URL) or Uniform Resource Identifier (URI). The components are: protocol, domain name, port number, and path or query string, as shown in this table:

Protocol | Domain Name | Port Number | Path/Query String
http://  | example.com | :80         | /more/stuff
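
If you’d like to see that split done by a program, here is a small sketch using Python’s standard urllib.parse module. The function name split_url and the dictionary keys are my own labels, chosen to match the table; a real browser’s URL parser is considerably more elaborate.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Break a URL into the pieces named in the table above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. "http"
        "domain": parts.hostname,   # e.g. "example.com"
        "port": parts.port,         # an int, or None if the URL gives no port
        "path": parts.path,         # e.g. "/more/stuff"
        "query": parts.query,       # whatever follows a "?", if anything
    }
```

For example, split_url("http://example.com:80/more/stuff") gives the protocol http, the domain example.com, the port 80 and the path /more/stuff.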
IP address

However, computers prefer numbers. And just as you can telephone someone (do people say that now?) if you know their phone number, so you can ‘call up’ a computer if your computer knows the other computer’s internet number, better known as an IP address (for Internet Protocol). Because the Internet is a global network of computers, each computer connected to the Internet must have a unique address. The most familiar (IPv4) addresses are in the form nnn.nnn.nnn.nnn, where each nnn is a number from 0 to 255. This is the human-readable form; the actual IP address is stored in binary.
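
To show what ‘stored in binary’ means, here is a short sketch that turns the dotted form into the single 32-bit number a computer actually works with. The function names are my own; Python’s standard ipaddress module can do the same job, and I only spell the steps out by hand to make them visible.

```python
def dotted_to_int(address):
    """Convert an IPv4 address such as '104.28.25.51' to a 32-bit integer."""
    parts = [int(piece) for piece in address.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for part in parts:
        # Each of the four numbers fills the next 8 bits.
        value = (value << 8) | part
    return value


def to_binary(address):
    """Show the same address as a string of 32 ones and zeros."""
    return format(dotted_to_int(address), "032b")
```

Four numbers of 8 bits each make 32 bits in total, which is why each number can only run from 0 to 255.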

Wikipedia:

An IP address (abbreviation of Internet Protocol address) is an identifier assigned to each computer and other device (e.g., printer, router, mobile device, etc.) connected to a TCP/IP network that is used to locate and identify the node in communications with other nodes on the network. IP addresses are usually written and displayed in human-readable notations, such as 172.16.254.1 in IPv4, and 2001:db8:0:1234:0:567:8:1 in IPv6.

Version 4 of the Internet Protocol (IPv4) defines an IP address as a 32-bit number. However, because of the growth of the Internet and the depletion of available IPv4 addresses, a new version of IP (IPv6), using 128 bits for the IP address, was developed in 1995, and standardized as RFC 2460 in 1998. Its deployment commenced in the mid-2000s and is ongoing.

The IP address space is managed globally by the Internet Assigned Numbers Authority (IANA), and by five regional Internet registries (RIR) responsible in their designated territories for assignment to end users and local Internet registries, such as Internet service providers. Addresses have been distributed by IANA to the RIRs in blocks of approximately 16.8 million addresses each. Each ISP or private network administrator assigns an IP address to each device connected to its network. Such assignments may be on a static (fixed or permanent) or dynamic basis, depending on its software and practices.

Domain Name System

The Internet has the equivalent of a phone directory, called the Domain Name System (DNS). So when you click a link, the computer has to find the IP address of the computer the link is referring to (Google, Microsoft, Linux, …). For now, we’ll call this the ‘DNS lookup’. It’s similar, at the highest level, to looking up a phone number.

Wikipedia:

The Domain Name System (DNS) is a hierarchical decentralized naming system for computers, services, or other resources connected to the Internet or a private network. It associates various information with domain names assigned to each of the participating entities. Most prominently, it translates more readily memorized domain names to the numerical IP addresses needed for locating and identifying computer services and devices with the underlying network protocols. By providing a worldwide, distributed directory service, the Domain Name System is an essential component of the functionality of the Internet, that has been in use since 1985.

The Domain Name System delegates the responsibility of assigning domain names and mapping those names to Internet resources by designating authoritative name servers for each domain. Network administrators may delegate authority over sub-domains of their allocated name space to other name servers. This mechanism provides distributed and fault tolerant service and was designed to avoid a single large central database.

The Domain Name System also specifies the technical functionality of the database service that is at its core. It defines the DNS protocol, a detailed specification of the data structures and data communication exchanges used in the DNS, as part of the Internet Protocol Suite. Historically, other directory services preceding DNS were not scalable to large or global directories as they were originally based on text files, prominently the HOSTS.TXT resolver.

The Internet maintains two principal namespaces, the domain name hierarchy and the Internet Protocol (IP) address spaces. The Domain Name System maintains the domain name hierarchy and provides translation services between it and the address spaces. Internet name servers and a communication protocol implement the Domain Name System. A DNS name server is a server that stores the DNS records for a domain; a DNS name server responds with answers to queries against its database.

The most common types of records stored in the DNS database are for Start of Authority (SOA), IP addresses (A and AAAA), SMTP mail exchangers (MX), name servers (NS), pointers for reverse DNS lookups (PTR), and domain name aliases (CNAME). Although not intended to be a general purpose database, DNS can store records for other types of data for either automatic lookups, such as DNSSEC records, or for human queries such as responsible person (RP) records. As a general purpose database, the DNS has also been used in combating unsolicited email (spam) by storing a real-time blackhole list. The DNS database is traditionally stored in a structured zone file.

Here’s a practical exercise to familiarise you with DNS lookup. You can run a program on your computer to look up a domain name: nslookup google.com (do this on your command line).
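
A program can ask the same question. The sketch below uses Python’s standard socket.getaddrinfo call, which hands the name to your operating system’s resolver; the function name lookup is my own. The answers you get will depend on where you are and when you ask, so I haven’t shown any expected output.

```python
import socket


def lookup(hostname):
    """Return the sorted, de-duplicated IP addresses reported for a name."""
    results = socket.getaddrinfo(hostname, None)
    # Each result ends with a socket address whose first item is the IP.
    return sorted({info[4][0] for info in results})
```

Try lookup("google.com") and compare the result with what nslookup told you. If you pass in something that is already an IP address, you simply get it back.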

Protocols, Ports, & Packets

Now we need to know about protocols. A protocol is essentially a set of rules for sending data in or between computers. So if one computer is going to send a file to another, it might use the File Transfer Protocol (FTP). For web pages, we need the HyperText Transfer Protocol (HTTP). For email, there’s POP3 and SMTP.

Wikipedia:

In telecommunications, a communication protocol is a system of rules that allow two or more entities of a communications system to transmit information via any kind of variation of a physical quantity. These are the rules or standard that defines the syntax, semantics and synchronization of communication and possible error recovery methods. Protocols may be implemented by hardware, software, or a combination of both.

Communicating systems use well-defined formats (protocol) for exchanging various messages. Each message has an exact meaning intended to elicit a response from a range of possible responses pre-determined for that particular situation. The specified behavior is typically independent of how it is to be implemented. Communications protocols have to be agreed upon by the parties involved. To reach agreement, a protocol may be developed into a technical standard. A programming language describes the same for computations, so there is a close analogy between protocols and programming languages: protocols are to communications what programming languages are to computations.

Multiple protocols often describe different aspects of a single communication. A group of protocols designed to work together are known as a protocol suite; when implemented in software they are a protocol stack.

Most recent protocols are assigned by the IETF for Internet communications, and the IEEE, or the ISO organizations for other types. The ITU-T handles telecommunications protocols and formats for the PSTN. As the PSTN and Internet converge, the two sets of standards are also being driven towards convergence.

Port Numbers: These are like telephone extensions. Programs that listen for messages from other computers are given ‘well-known’ port numbers, so that the receiving computer knows which program a message is for. Web servers usually listen on port 80. If it’s a secure connection (protocol https) then the port number is usually 443. The port number goes after the domain name, like so: http://bing.com:80/. Searches usually have a path with a query string, like this: http://www.bing.com/search?q=keyword (notice the ‘?’).
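
Here is a small sketch of how a program might work out which port to use, and how it might pick the search words out of a query string. The table WELL_KNOWN_PORTS and both function names are mine, and I’ve only included the two ports mentioned above.

```python
from urllib.parse import parse_qs, urlsplit

# Only the two defaults mentioned in the text; real software knows many more.
WELL_KNOWN_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Use the port written in the URL, else the default for its protocol."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return WELL_KNOWN_PORTS[parts.scheme]


def search_terms(url):
    """Return the name=value pairs that follow the '?' in a URL."""
    return parse_qs(urlsplit(url).query)
```

So effective_port("https://example.com/") is 443 even though no port is written, and search_terms("http://www.bing.com/search?q=keyword") finds one item named q whose value is keyword.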

Wikipedia:

In the internet protocol suite, a port is an endpoint of communication in an operating system. While the term is also used for female connectors on hardware devices (see computer port), in software it is a logical construct that identifies a specific process or a type of network service.

A port is always associated with an IP address of a host and the protocol type of the communication, and thus completes the destination or origination network address of a communication session. A port is identified for each address and protocol by a 16-bit number, commonly known as the port number. For example, an address may be "protocol: TCP, IP address: 1.2.3.4, port number: 80", which may be written 1.2.3.4:80 when the protocol is known from context.

Specific port numbers are often used to identify specific services. Of the thousands of enumerated ports, 1024 well-known port numbers are reserved by convention to identify specific service types on a host. In the client–server model of application architecture, the ports that network clients connect to for service initiation provide a multiplexing service. After initial communication binds to the well-known port number, this port is freed by switching each instance of service requests to a dedicated, connection-specific port number, so that additional clients can be serviced. The protocols that primarily use ports are the transport layer protocols, such as the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP).

Ports were unnecessary on direct point-to-point links when the computers at each end could only run one program at a time. Ports became necessary after computers became capable of executing more than one program at a time and were connected to modern packet-switched networks.

Your request now has to go down into several layers of software to reach the Internet. These are the layers of the TCP/IP Protocol stack. They look like this:

Protocol Layer | Description
Application Protocols | Protocols specific to applications such as HTTP (web), SMTP (e-mail), FTP (file transfer), etc.
Transmission Control Protocol | TCP directs packets to a specific application on a computer using a port number.
Internet Protocol | IP directs packets to a specific computer using an IP address.
Hardware | Converts binary packet data to network signals and back. (E.g. ethernet network card, modem for phone lines, etc.)

IP Packets

HTTP functions as a request-response protocol in the client-server computing model. A web browser, for example, may be the client and an application running on a computer hosting a web site may be the server. The client submits an HTTP request message to the server. The server, which provides resources such as HTML files and other content, or performs other functions on behalf of the client, returns a response message to the client. The response contains completion status information about the request and may also contain requested content in its message body.
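
HTTP requests and responses are, at heart, just lines of text. The sketch below builds the request from the cat picture example and picks apart a response, without touching the network. Both function names are my own, the sample response is invented for illustration, and a real client has to cope with many cases this ignores.

```python
def build_get_request(host, path):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, version
        f"Host: {host}",         # which site we want on that server
        "Connection: close",     # ask the server to hang up afterwards
        "",                      # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into (status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, reason = header_lines[0].split(" ", 2)
    headers = {}
    for line in header_lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body
```

The status number in the first line of the response is the ‘completion status information’: 200 means all is well, and 404 is the famous ‘Not Found’.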

Wikipedia:

Technical overview

A web browser is an example of a user agent (UA). Other types of user agent include the indexing software used by search providers (web crawlers), voice browsers, mobile apps, and other software that accesses, consumes, or displays web content.

HTTP is designed to permit intermediate network elements to improve or enable communications between clients and servers. High-traffic websites often benefit from web cache servers that deliver content on behalf of upstream servers to improve response time. Web browsers cache previously accessed web resources and reuse them when possible to reduce network traffic. HTTP proxy servers at private network boundaries can facilitate communication for clients without a globally routable address, by relaying messages with external servers.

HTTP is an application layer protocol designed within the framework of the Internet protocol suite. Its definition presumes an underlying and reliable transport layer protocol,[1] and Transmission Control Protocol (TCP) is commonly used. However HTTP can be adapted to use unreliable protocols such as the User Datagram Protocol (UDP), for example in HTTPU and Simple Service Discovery Protocol (SSDP).

HTTP resources are identified and located on the network by Uniform Resource Locators (URLs), using the Uniform Resource Identifiers (URI's) schemes http and https. URIs and hyperlinks in HTML documents form inter-linked hypertext documents.

HTTP/1.1 is a revision of the original HTTP (HTTP/1.0). In HTTP/1.0 a separate connection to the same server is made for every resource request. HTTP/1.1 can reuse a connection multiple times to download images, scripts, stylesheets, etc after the page has been delivered. HTTP/1.1 communications therefore experience less latency as the establishment of TCP connections presents considerable overhead.

  1. "Overall Operation". RFC 2616, sec. 1.4, p. 12. https://tools.ietf.org/html/rfc2616#section-1.4

Wikipedia:

The Internet protocol suite is the conceptual model and set of communications protocols used on the Internet and similar computer networks. It is commonly known as TCP/IP because the original protocols in the suite are the Transmission Control Protocol (TCP) and the Internet Protocol (IP). It is occasionally known as the Department of Defense (DoD) model, because the development of the networking model was funded by DARPA, an agency of the United States Department of Defense.

The Internet protocol suite provides end-to-end data communication specifying how data should be packetized, addressed, transmitted, routed and received. This functionality is organized into four abstraction layers which are used to sort all related protocols according to the scope of networking involved. From lowest to highest, the layers are the link layer, containing communication methods for data that remains within a single network segment (link); the internet layer, connecting independent networks, thus providing internetworking; the transport layer handling host-to-host communication; and the application layer, which provides process-to-process data exchange for applications.

Technical standards specifying the Internet protocol suite and many of its constituent protocols are maintained by the Internet Engineering Task Force (IETF). The Internet protocol suite model is a simpler model developed prior to the OSI model.

Your browser is the application in the top layer. The request will be sent in one or more packets, depending on its size. The packets may even travel via different routes through the Internet, but each packet contains a sequence number so that the data can be re-assembled in the right order once the packets have all arrived at the destination. There, they go back up the layers into the web server.

Wikipedia:

A network packet is a formatted unit of data carried by a packet-switched network. Computer communications links that do not support packets, such as traditional point-to-point telecommunications links, simply transmit data as a bit stream. When data is formatted into packets, packet switching is possible and the bandwidth of the communication medium can be better shared among users than with circuit switching.

A packet consists of control information and user data, which is also known as the payload. Control information provides data for delivering the payload, for example: source and destination network addresses, error detection codes, and sequencing information. Typically, control information is found in packet headers and trailers.

The TCP layer adds things like the source & destination port numbers; sequence number; etc. The IP layer adds things like source & destination IP addresses, etc. The final packet looks like this: [IP header][TCP header][application data]
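
Here is a toy model of that wrapping-up, and of how sequence numbers let the far end put the pieces back in order. The class and function names are mine, the headers hold only the fields mentioned above (real ones carry a good deal more), and I’ve used the byte offset of each chunk as its sequence number to keep things simple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TCPHeader:
    src_port: int
    dst_port: int
    seq: int  # here: byte offset of this chunk within the whole message


@dataclass(frozen=True)
class IPHeader:
    src_ip: str
    dst_ip: str


@dataclass(frozen=True)
class Packet:
    ip: IPHeader     # [IP header]
    tcp: TCPHeader   # [TCP header]
    data: bytes      # [application data]


def packetize(message, chunk_size, src, dst):
    """Cut a message into chunks and wrap each one in TCP and IP headers."""
    (src_ip, src_port), (dst_ip, dst_port) = src, dst
    return [
        Packet(
            IPHeader(src_ip, dst_ip),
            TCPHeader(src_port, dst_port, offset),
            message[offset:offset + chunk_size],
        )
        for offset in range(0, len(message), chunk_size)
    ]


def reassemble(packets):
    """Put the chunks back in order, whatever order they arrived in."""
    ordered = sorted(packets, key=lambda packet: packet.tcp.seq)
    return b"".join(packet.data for packet in ordered)
```

Even if the packets arrive back to front, reassemble returns the original message, because it sorts by sequence number before joining the data together.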

Internet layering

Into The Internet!

So your request gets sent out from your computer, into the big scary Internet! Basically, the Internet is a lot of globally interconnected networks and their routers. The primary function of a router is to forward a packet toward its destination network, which is the destination IP address of the packet. To do this, a router needs to search the routing information stored in its routing table.

A routing table is a data file that is used to store route information about directly connected and remote networks. The routing table contains network/next hop associations. These associations tell a router that a particular destination can be optimally reached by sending the packet to a specific router that represents the “next hop” on the way to the final destination. The next hop association can also be the outgoing or exit interface to the final destination. The network/exit-interface association can also represent the destination network address of the IP packet. This association occurs on the router’s directly connected networks.
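
A routing table lookup can be sketched in a few lines using Python’s standard ipaddress module. Everything in the ROUTES table below is invented for illustration (the addresses come from ranges reserved for documentation, and eth1 is a made-up interface name). One detail not spelled out above: when several entries match, routers pick the most specific one, a rule usually called longest-prefix match.

```python
import ipaddress

# (destination network, where to send matching packets); all values invented.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),          # default route
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.2"),    # remote network
    (ipaddress.ip_network("198.51.100.128/25"), "192.0.2.3"),  # more specific
    (ipaddress.ip_network("203.0.113.0/24"), "eth1"),          # directly connected
]


def next_hop(destination, routes=ROUTES):
    """Return the next hop (or exit interface) for a destination address."""
    address = ipaddress.ip_address(destination)
    matches = [(network, hop) for network, hop in routes if address in network]
    if not matches:
        raise LookupError(f"no route to {destination}")
    # Longest-prefix match: the entry with the most network bits wins.
    _network, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop
```

With this table, a packet for 198.51.100.200 goes to 192.0.2.3 (the more specific entry), one for 198.51.100.5 goes to 192.0.2.2, and anything the table doesn’t otherwise know about falls through to the default route.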

Wikipedia:

Basics

A routing table uses the same idea that one does when using a map in package delivery. Whenever a node needs to send data to another node on a network, it must first know where to send it. If the node cannot directly connect to the destination node, it has to send it via other nodes along a proper route to the destination node. Most nodes do not try to figure out which route(s) might work; instead, a node will send an IP packet to a gateway in the LAN, which then decides how to route the "package" of data to the correct destination. Each gateway will need to keep track of which way to deliver various packages of data, and for this it uses a Routing Table. A routing table is a database which keeps track of paths, like a map, and allows the gateway to provide this information to the node requesting the information.

With hop-by-hop routing, each routing table lists, for all reachable destinations, the address of the next device along the path to that destination: the next hop. Assuming that the routing tables are consistent, the simple algorithm of relaying packets to their destination's next hop thus suffices to deliver data anywhere in a network. Hop-by-hop is the fundamental characteristic of the IP Internetwork Layer[1] and the OSI Network Layer.

A routing table is a data file in RAM that is used to store route information about directly connected and remote networks. The routing table contains network/next hop associations. These associations tell a router that a particular destination can be optimally reached by sending the packet to a specific router that represents the "next hop" on the way to the final destination. The next hop association can also be the outgoing or exit interface to the final destination.

The network/exit-interface association can also represent the destination network address of the IP packet. This association occurs on the router's directly connected networks.

A directly connected network is a network that is directly attached to one of the router interfaces. When a router interface is configured with an IP address and subnet mask, the interface becomes a host on that attached network. The network address and subnet mask of the interface, along with the interface type and number, are entered into the routing table as a directly connected network. When a router forwards a packet to a host, such as a web server, that host is on the same network as a router's directly connected network.

A remote network is a network that is not directly connected to the router. In other words, a remote network is a network that can only be reached by sending the packet to another router. Remote networks are added to the routing table using either a dynamic routing protocol or by configuring static routes. Dynamic routes are routes to remote networks that were learned automatically by the router, using a dynamic routing protocol. Static routes are routes to networks that a network administrator manually configured.

  1. Requirements for IPv4 Routers, F. Baker, RFC 1812, June 1995

Illuminating my Web Server with a LAMP

Eventually your request reaches my web host (Pair) in the USA (I’m in the UK but I lived in the USA when I chose Pair). There’s a collection of programs there called a LAMP stack, for Linux, Apache, MySQL, and PHP. Linux is the computer operating system, Apache is the web server, MySQL is a database, and PHP is the programming language of the application that has overall responsibility for processing your request.

Wikipedia:

LAMP is an archetypal model of web service stacks, named as an acronym of the names of its original four open-source components: the Linux operating system, the Apache HTTP Server, the MySQL relational database management system (RDBMS), and the PHP programming language. The LAMP components are largely interchangeable and not limited to the original selection. As a solution stack, LAMP is suitable for building dynamic web sites and web applications.

Since its creation, the LAMP model has been adapted to other componentry, though typically consisting of free and open-source software. For example, an equivalent installation on the Microsoft Windows family of operating systems is known as WAMP.

My web server is Apache, one of the oldest and most widely used pieces of web server software there is. A web server stores, processes and delivers web pages to clients. The communication between client and server takes place using the Hypertext Transfer Protocol (HTTP). Pages delivered are most frequently HTML documents, which may include images, style sheets and scripts in addition to text content.

LAMPP Architecture
Wikipedia:

A web server is a computer system that processes requests via HTTP, the basic network protocol used to distribute information on the World Wide Web. The term can refer to the entire system, or specifically to the software that accepts and supervises the HTTP requests.

Apache passes the request on to a more specialised piece of software called WordPress (which runs on the PHP part of LAMP), which helps me to create, manage, and nicely present all my web pages. WordPress is a Content Management System (CMS), and it powers more than a fifth of all websites, more than any other CMS.

Wikipedia:

A content management system (CMS) is a computer application that supports the creation and modification of digital content. It is often used to support multiple users working in a collaborative environment.

CMS features vary widely. Most CMSs include Web-based publishing, format management, history editing and version control, indexing, search, and retrieval. By their nature, content management systems support the separation of content and presentation.

A web content management system (WCM or WCMS) is a CMS designed to support the management of the content of Web pages. Most popular CMSs are also WCMSs. Web content includes text and embedded graphics, photos, video, audio, maps, and program code (e.g., for applications) that displays content or interacts with the user.

Such a content management system (CMS) typically has two major components:

  • A content management application (CMA) is the front-end user interface that allows a user, even with limited expertise, to add, modify, and remove content from a website without the intervention of a webmaster.
  • A content delivery application (CDA) compiles that information and updates the website.

Digital asset management systems are another type of CMS. They manage things such as documents, movies, pictures, phone numbers, and scientific data. CMSs can also be used for storing, controlling, revising, and publishing documentation.

Based on market share statistics, the most popular content management system is WordPress, used by over 27% of all websites on the internet, and by 59% of all websites using a known content management system. Other popular content management systems include Joomla and Drupal.

WordPress looks into another piece of software called a database (specifically MySQL) to find everything needed to build the text of that web page (this text, plus links to the pictures, style sheets and JavaScript). It hands the finished page back to the web server, which then sends it to your computer.
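Here is a rough sketch of that "look it up in the database, then build the page" step. To keep it self-contained I use SQLite (which ships with Python) as a stand-in for MySQL, and the posts table, its columns and the function names are all invented for illustration. WordPress's real database layout is quite different.

```python
import sqlite3
from html import escape


def make_demo_database() -> sqlite3.Connection:
    """Create a throwaway in-memory database holding one page (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (slug TEXT PRIMARY KEY, title TEXT, body TEXT, image_url TEXT)"
    )
    conn.execute(
        "INSERT INTO posts VALUES (?, ?, ?, ?)",
        ("cats", "Cats & Co", "This is a picture", "/images/picture.jpg"),
    )
    return conn


def render_page(conn: sqlite3.Connection, slug: str) -> str:
    """Look a page up by its slug and build the HTML text for it."""
    row = conn.execute(
        "SELECT title, body, image_url FROM posts WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return "<html><body><p>Page not found</p></body></html>"

    title, body, image_url = row
    # escape() stops characters like & and < in the content from breaking the HTML.
    # Note that only a link to the image goes into the page, not the image itself.
    return (
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"<p>{escape(body)}</p>\n"
        f'<img src="{escape(image_url)}" />\n'
        "</body>\n"
        "</html>"
    )


if __name__ == "__main__":
    print(render_page(make_demo_database(), "cats"))
```

The content lives in the database and the layout lives in the code, which is the separation of content and presentation that the Wikipedia extract above mentions.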


Now to be clear, the server doesn’t send everything needed for the web page in one go. Instead, the page contains links to all the bits and pieces it needs. The images, JavaScript and so on are collected afterwards, possibly from other computers, or not fetched at all if your computer already has them.

Your Computer Gets my Web Page

HyperText Markup Language

The beginnings of your web page arrive in your computer’s web browser. It might look like this:

<!DOCTYPE html>
<html>
<head>
<title>Just a Simple Web Page</title>
</head>
<body>
<p>This is a picture</p>
<img src="/images/picture.jpg" />
<p>This is a <a href="http://Tuxar.uk/">link</a>.</p>
</body>
</html>

The words that appear in angle brackets are tags, which mark out HTML elements. The content of the title tag is what will appear in your browser’s title bar (at the top). The content of the p (paragraph) tag is text that will appear in your web page. The content of the img tag's src attribute is a link to a picture, which your browser now has to fetch. In this case it’s on the web site you got the page from, but if you’ve been to that web site recently, your browser may already have it in a cache (a store of things it recently fetched from the web). Once it has the picture it can display your web page fully (it may even have started before it had everything needed). The a (anchor) tag's href attribute holds a link to another website, which your browser only follows if you click on it.
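If you're curious how a browser might work out what it still has to fetch, here is a small sketch using Python's built-in HTML parser. Real browsers are vastly more sophisticated. The ResourceFinder class, the fetch_with_cache function and the page address http://tuxar.uk/some-page/ are my own illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect the things a page refers to, resolved against the page's address."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.to_fetch = []  # needed straight away to display the page (images)
        self.links = []     # only followed if the reader clicks them

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            # "/images/picture.jpg" is relative, so combine it with the page address.
            self.to_fetch.append(urljoin(self.base_url, attributes["src"]))
        elif tag == "a" and attributes.get("href"):
            self.links.append(urljoin(self.base_url, attributes["href"]))


def fetch_with_cache(url, cache, download):
    """Return the cached copy of url if there is one, otherwise download and keep it."""
    if url not in cache:
        cache[url] = download(url)
    return cache[url]


if __name__ == "__main__":
    page = (
        "<p>This is a picture</p>"
        '<img src="/images/picture.jpg" />'
        '<p>This is a <a href="http://Tuxar.uk/">link</a>.</p>'
    )
    finder = ResourceFinder("http://tuxar.uk/some-page/")  # hypothetical page address
    finder.feed(page)
    print("Fetch now:", finder.to_fetch)
    print("Links:", finder.links)
```

The download argument is whatever function actually goes out to the network. Passing it in keeps the sketch runnable without an internet connection, and shows that a second request for the same address never leaves your computer.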

Wikipedia:

Hypertext Markup Language (HTML) is the standard markup language for creating web pages and web applications. With Cascading Style Sheets (CSS) and JavaScript it forms a triad of cornerstone technologies for the World Wide Web. Web browsers receive HTML documents from a webserver or from local storage and render them into multimedia web pages. HTML describes the structure of a web page semantically and originally included cues for the appearance of the document.

HTML elements are the building blocks of HTML pages. With HTML constructs, images and other objects, such as interactive forms, may be embedded into the rendered page. It provides a means to create structured documents by denoting structural semantics for text such as headings, paragraphs, lists, links, quotes and other items. HTML elements are delineated by tags, written using angle brackets. Tags such as <img /> and <input /> introduce content into the page directly. Others such as <p>...</p> surround and provide information about document text and may include other tags as sub-elements. Browsers do not display the HTML tags, but use them to interpret the content of the page.

HTML can embed programs written in a scripting language such as JavaScript which affect the behavior and content of web pages. Inclusion of CSS defines the look and layout of content. The World Wide Web Consortium (W3C), maintainer of both the HTML and the CSS standards, has encouraged the use of CSS over explicit presentational HTML since 1997.

The HTML displayed above is trivial compared to the HTML in most web pages. For an idea of what it more typically looks like, right-click on this page and select View page source. In the early days of the web, we wrote HTML by hand! But our web pages were very simple then, and modern web pages are much more complex.

The picture on the left shows how long it takes to load elements for the Tuxar.uk homepage. We use several techniques to make it as fast as we can, such as using a Content Distribution Network (CDN). This works by placing copies of our content on several web servers around the world, so that it can be served from the web server in the CDN that is closest to you. The content has a shorter distance to travel and fewer network hops to reach you, so it takes much less time: a couple of seconds instead of several if you’re on the other side of the world!
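As a toy illustration of the "closest web server" idea, the sketch below picks the nearest of three servers by straight-line distance over the globe. The server names and locations are invented, and this is a simplification: real CDNs usually steer you using DNS or network routing rather than measuring distance on a map.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical CDN servers: name -> (latitude, longitude) in degrees.
EDGE_SERVERS = {
    "london": (51.5, -0.1),
    "new-york": (40.7, -74.0),
    "sydney": (-33.9, 151.2),
}

EARTH_RADIUS_KM = 6371


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points, via the haversine formula."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def closest_server(visitor):
    """Name of the server with the shortest distance to the visitor's location."""
    return min(EDGE_SERVERS, key=lambda name: distance_km(visitor, EDGE_SERVERS[name]))


if __name__ == "__main__":
    melbourne = (-37.8, 145.0)
    print(closest_server(melbourne))
```

A visitor in Melbourne would be sent to the Sydney copy rather than all the way to London, which is where the time saving comes from.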

And that’s how the web works!

Cat Nova Richmond

Resources
