<base /> ruins everything
Vivien,
The <base /> tag indicates how relative URLs
should be resolved in a whole document. It has global
semantics, so there are complications with its use.
Fortunately, documents should not need it, because we have the
xml:base attribute, which is both simpler and more
versatile. The non-XML serializations of HTML cannot use
xml:base, though, so <base /> is not deprecated yet. RDFa
mandates that the semantics of <base /> must be respected, so
we have a problem.
This document is best viewed with your browser’s Reader View.
I’m all for interoperable social media. The fediverse is cool: it builds on ActivityPub to let people meme together while limiting the risks developers take when they interoperate with the big players. Unfortunately, the theory is not fully realized yet, and we are still investigating the reasons. My two cents: requiring everyone to compact (and thus, expand) JSON-LD is a burden, because the algorithms and their complexity are not clear but, most importantly, because processing the objects requires fetching things across the web. It’s not a surprise that most developers trust the lie on the ActivityPub standard’s page:
JSON-LD documents and ActivityStreams can be understood as plain old simple JSON.
Appealing to JSON developers is not necessary. We need correct algorithms for processing the data, which means that the semantics must not depend on third-party servers that may lie to me, to you, to people in some regions of the world, now, in the future, or in the past. Expanded JSON-LD is ugly, but you don’t have to fetch an extra resource to understand it. By ugly, I mean that it would be generated by a machine, and not authored by a human.
Interoperable social media can be understood by both machines and humans. Which one do you put first? The ugly-but-correct version of JSON-LD for sure puts machines first. Or it would, if it were implemented. I have found someone else who says it better than I could.
The indieweb is a separate movement from the fediverse. It does not really care about the actor model of ActivityPub, so the infrastructure basically looks like this: you write things on your site, and when you reply to someone, you send a webmention. If the page you reply to accepts it, your reply appears there. Neat.
This is a human-first approach. And it shows: you can have any HTML markup in your post or your reply (I guess it is sanitized when the original post includes your reply). This is great! We humans need markup languages. Look at this page: it has paragraphs, pointers, quotes, and code blocks.
The machine comes last. With the indieweb, you sprinkle your prose with microformats2, and then the machine understands what you are replying to, who you are, and generates the comment section.
Still, this cannot realize the promise of interoperability, because the markup is non-extensible. If we all abandoned the fediverse and went to the indieweb, let me risk a guess: we would get the same frustrations, because what this implementation calls a poll is nothing like what this other implementation understands by it, and so on. RDF would help here.
I want a human-first approach but with extensible markup and RDF for interoperability. Where do I start?
I have liked RDFa a lot ever since it was recommended to me: it is as easy as microformats2, fixes the JSON-LD leak of semantics, is human-first, and is unknown to neither the indieweb nor the fediverse. I want that! Let’s write a parser!
HTML is a good human-first markup language, and there is an extension to RDFa working with HTML. Let’s read it:
[Section 3.1 Additional RDFa Processing Rules:] The base can be
set using the base element.
I like the xml:base approach, but why not. Let’s
look at what HTML5
says about the <base /> element.
You have to set it in the HTML head element. It is global, so it works for the whole document. Using it looks like this:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<base href="https://example.org" />
</head>
<body>
<img src="toto.png" />
</body>
</html>
<base /> tag
Then, even if the page is hosted at some other URL, the image
will always be loaded from
https://example.org/toto.png.
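To make the resolution rule concrete, here is a minimal sketch in Python using the standard library's urljoin. The hosting URL is a made-up assumption; only the base href and the image name come from the example above.

```python
from urllib.parse import urljoin

# Hypothetical URL where the page is actually hosted (an assumption).
document_url = "https://blog.example.net/posts/base.xhtml"
# The href of the <base /> element from the example above.
base_href = "https://example.org"

# The <base /> href is itself resolved against the document URL, and the
# result then replaces it as the base for every relative URL in the page.
effective_base = urljoin(document_url, base_href)
image_url = urljoin(effective_base, "toto.png")

# For comparison: where the image would be fetched from without <base />.
without_base = urljoin(document_url, "toto.png")

print(image_url)
print(without_base)
```

Whatever value document_url takes, image_url stays the same, which is exactly the global effect described above.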
It is more complex than the xml:base
attribute, though. You really should be doing this instead:
<html xmlns="http://www.w3.org/1999/xhtml"
xml:base="https://example.org">
<head />
<body>
<img src="toto.png" />
</body>
</html>
xml:base attribute in the root
However, you cannot serialize the latter HTML tree with the
non-XML syntax. This is why you would use the <base
/> element.
Global semantics do not compose very well with including a reply to your post on your own page. What if the indieweb microformats parser decides that this is my reply?
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<base href="https://example.org" />
</head>
<body>
<img src="toto.png" />
</body>
</html>
<base /> element.
It takes it, and appends it to the original post’s page.
<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en">
<head>
<title>The Post with a Reply</title>
</head>
<body>
<p>This is the post! The reply is below.</p>
<html>
<head>
<base href="https://example.org" />
</head>
<body>
<img src="toto.png" />
</body>
</html>
</body>
</html>
What kind of beast is that? It is a well-formed XML file. It is invalid by HTML5’s standards, but it’s not clear exactly what that means: web browsers will happily display it. The RDFa extension says:
[Section 2.1 Document Conformance:] All HTML5 elements and attributes SHOULD be used in a way that conforms to html5.
This does not conform to HTML5, so RDFa and its all-caps “SHOULD” send me their strongest glare, but I don’t care. Now, what’s the RDFa subject of this page?
Let’s see. There’s only one HTML base tag, no
xml:base attribute, so the default subject of
triples is the href of this base, right?
Web browsers have a strong incentive to display whatever invalid file you throw at them. A few decades ago, we were already told that showing an error message meant that both the users and the website would complain to the browser developer. So, browsers guess and invent things. Now that there is a quasi-monopoly on web browsers, though, the situation is a little different: Google can decide whatever it wants the web to look like by guessing when the standard is not specific enough, and if either of the two other browser developers guesses something different from Google, they get complaints from everyone again. So, web standards get thinner and thinner (not better, just less specific) as the monopoly grows. Google invents more and more things, and the others are compelled to follow at great expense while losing market share. Authors give up on being precise about what they mean.
How about I close that cringe digression and get back to the main matter?
I expanded a bit on the simple example; see the explicitly invalid page. It will
tell you what the location.href is, and how links
are resolved on the page (before and after the <base
/> element). In my browser, for this case,
location.href ignores the <base
/> element, but all links are resolved relative to it.
That’s not exactly what the HTML standard says, but it confirms
that a <base /> element anywhere on the page
changes link resolution. Technically, it means that, should I
copy this behavior, I can’t parse RDFa in one pass, since a document-level
property may be set anywhere in the document.
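Here is a rough sketch of what copying that behavior would cost, assuming the page is well-formed XML so that Python's xml.etree.ElementTree can read it. The functions find_base and resolve_links, the hosting URL, and the extra link placed before the reply are all made up for illustration; this mimics what I observed in my browser, not what the HTML standard specifies.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"

# The invalid-but-well-formed page from above: a reply, with its own
# <base />, pasted into the body of the original post. The <a> link is an
# addition, placed before the reply to show the effect on earlier content.
PAGE = """\
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>The Post with a Reply</title></head>
  <body>
    <p>This is the post! <a href="about">About</a></p>
    <html>
      <head><base href="https://example.org" /></head>
      <body><img src="toto.png" /></body>
    </html>
  </body>
</html>
"""


def find_base(root, document_url):
    """First pass: use the first <base href> found anywhere in the tree."""
    for element in root.iter(XHTML + "base"):
        href = element.get("href")
        if href is not None:
            return urljoin(document_url, href)
    return document_url


def resolve_links(root, base):
    """Second pass: resolve every href and src against the chosen base."""
    resolved = []
    for element in root.iter():
        if element.tag == XHTML + "base":
            continue  # the <base /> element itself is not a link
        for attribute in ("href", "src"):
            value = element.get(attribute)
            if value is not None:
                resolved.append(urljoin(base, value))
    return resolved


root = ET.fromstring(PAGE)
base = find_base(root, "https://blog.example.net/post")  # hypothetical URL
links = resolve_links(root, base)
print(links)
```

The link in the original post comes before the <base /> element in document order, yet it is still resolved against it. That is why the first pass over the whole tree cannot be skipped.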
Parsing microformats or RDFa is unusual: suddenly, I become the developer of a web user agent. Since the HTML5 living standard is insufficient, what do I do? Should I rebel and do something shocking or provocative? Do I accept the status quo?
Nah.
We take HTML5 too seriously. It’s just an excuse for a company
to brutalize everyone with its monopoly. In the upcoming XHTML2,
there’s no <base /> anyway, so I’ll just
pass: if I find a document with a <base />,
I’ll just ignore this element, and use xml:base
only.
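As a minimal sketch of that policy, again assuming well-formed XML input and Python's xml.etree.ElementTree: the function resolve_links, the sample page, and every URL in it are hypothetical. It is not a complete RDFa processor, only the base-handling part.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"
# The xml: prefix is always bound to this namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# A made-up page: one <base /> to ignore, one scoped xml:base to honor.
PAGE = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><base href="https://ignored.example" /></head>
  <body>
    <img src="post.png" />
    <div xml:base="https://example.org">
      <img src="toto.png" />
    </div>
  </body>
</html>
"""


def resolve_links(element, base, out):
    """Resolve href and src in a single pass, scoping xml:base."""
    # xml:base applies to the element carrying it and to its descendants.
    # urljoin returns the base unchanged when the attribute is absent.
    base = urljoin(base, element.get(XML_BASE, ""))
    if element.tag != XHTML + "base":  # <base /> is deliberately ignored
        for attribute in ("href", "src"):
            value = element.get(attribute)
            if value is not None:
                out.append(urljoin(base, value))
    for child in element:
        resolve_links(child, base, out)
    return out


root = ET.fromstring(PAGE)
links = resolve_links(root, "https://blog.example.net/post", [])
print(links)
```

Because an xml:base attribute is always seen before the content it governs, one pass is enough, and a reply pasted into a page keeps its own base without touching the rest of the document.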