Chapter 4 of 15

HTML & Semantic Markup

The foundation of everything on the web

Let’s move on to the first actual piece of web development technology: HTML.

But first, meet this guy:

[Photo: Tim Berners-Lee]

This is Tim Berners-Lee, my personal hero. There are very few technologies where you can point at one person and say “he invented it,” but with the web, it’s actually true. Everyone involved agrees that Tim invented the web. He co-wrote the first web browser, the first web server, the first version of HTTP, and the first version of HTML, the language in which all web pages are written.

But it’s worth noting that what he had in mind when he invented that stuff isn’t quite what we ended up with. What he really had in mind was something more like Wikipedia — a two-way system, both display and editor, a human-readable database of knowledge where researchers could link their work together. Luckily for us, the web as implemented can host Wikipedia, but also billions of other websites that work quite differently.

The web relies on a single, foundational innovation that it didn’t invent: the hyperlink, which is much older. A hyperlink says: “this piece of text on this page is related to the information on this other page”. The entire web, in all its grand complexity, relies on that ability to jump from place to place within it using links. It is an astonishing thing, and truly wonderful to think about.

How do agents know what’s important?

To see how astonishingly powerful the link is, think about how agents know things. It turns out they overwhelmingly figure things out by reading the textual content of the web. And the way they sort out what’s true and what’s important on the web, just like search engines before them, is links.

If you tell a search engine or an LLM “tell me about Apple”, it will almost certainly tell you about Apple, the giant tech corporation. But there are hundreds of other companies named after apples, and there are also actual apples, the fruit. How does it know what you meant, bereft of any other context?

The answer is: the web. The web is full of links, and most of the time when somebody turned the text “Apple”, or text near the word “Apple”, into a link, they were linking to that one specific company’s website. Each of those pages gets a vote, and across billions of web pages, that’s a lot of votes. It gets more complicated, of course: some websites themselves have a lot of people linking to them, so their votes count for more. But in essence, that’s the deal.
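In HTML terms, each of those votes is just a link. As a rough sketch, here is how two unrelated pages might each cast one. Both fragments are made up for illustration; only the destination, Apple's homepage, is real.

```html
<!-- Fragment from a hypothetical tech blog: the clickable text is "Apple" -->
<p>
  The new laptops from <a href="https://www.apple.com/">Apple</a> go on sale next week.
</p>

<!-- Fragment from a hypothetical forum post: the link sits near the word "Apple" -->
<p>
  I switched to Apple last year. <a href="https://www.apple.com/">Their site</a> has the full specs.
</p>
```

The href attribute names the destination, and the text between the opening and closing tags is what the reader sees and clicks. A crawler that reads both fragments finds the word “Apple” pointing at, or sitting next to, the same address twice.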

Agents know things because we encoded that knowledge into the web, and they read the web.

Semantic markup matters

But there is more to the web than just links. The A tag, which creates the link, adds a single piece of meaning to your document: “this relates to that”. But there are around a hundred other HTML tags, and they also add meaning to your documents. Few of them are quite as mind-blowingly powerful as the A tag, but that doesn’t mean you should ignore them.

And the good news is you don’t have to. When you’re building a website, you are probably getting your agent to lay down the HTML structure for you. The agent knows exactly what’s going on inside of each tag, because it’s building the website. All you have to do is tell it “use semantic HTML”. That key phrase will usually get it to recode a mess of DIV and SPAN tags into useful, semantic tags that tell other agents what they are reading.
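To make that concrete, here is a rough before-and-after sketch. The class names, headings, and text are invented for illustration; what matters is that the second version uses tags that say what each part of the page is.

```html
<!-- Before: everything is a div or span, so the structure lives only in class names -->
<div class="header">
  <div class="title">My Recipe Blog</div>
  <div class="nav">
    <span><a href="/recipes">Recipes</a></span>
    <span><a href="/about">About</a></span>
  </div>
</div>
<div class="post">
  <div class="post-title">Apple Pie</div>
  <div class="text">Peel and slice six apples.</div>
</div>

<!-- After: the same content, marked up with semantic tags -->
<header>
  <h1>My Recipe Blog</h1>
  <nav>
    <ul>
      <li><a href="/recipes">Recipes</a></li>
      <li><a href="/about">About</a></li>
    </ul>
  </nav>
</header>
<main>
  <article>
    <h2>Apple Pie</h2>
    <p>Peel and slice six apples.</p>
  </article>
</main>
```

Both versions can be styled to look identical in a browser. The difference is that the second one tells a screen reader or a crawler which part is navigation, which part is the main content, and which text is a heading.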

Accessibility and SEO

When you use semantic HTML, you also get a ton of accessibility for free. We’ll talk more about that later. You also get SEO for free, which is to say search engines and agents understand your content better, and that tends to help it rank higher.
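Here is one small sketch of what accessibility for free looks like. The save() function is a hypothetical placeholder, not something defined in this chapter.

```html
<!-- A div styled to look like a button: by default it cannot be reached with the Tab key,
     and a screen reader has no way to know it is meant to be a button -->
<div class="button" onclick="save()">Save</div>

<!-- A real button: keyboard-focusable, activated with Enter or Space,
     and announced as a button by screen readers, with no extra work -->
<button type="button" onclick="save()">Save</button>
```

The two can be styled to look the same, but only the second works out of the box for someone navigating by keyboard or screen reader.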

It’s cheap to add, it’s valuable to have, it’s a no-brainer.

Now let’s talk about JavaScript.