URI’s
This document is about naming things on the web. Maybe it should be called something else.
1 naming things
‘Tis but thy name that is my enemy.
Thou art thyself, though not a Montague.
What’s Montague? it is nor hand, nor foot,
Nor arm, nor face, nor any other part
Belonging to a man. O, be some other name!
What’s in a name? that which we call a rose
By any other name would smell as sweet.
So Romeo would, were he not Romeo called,
Retain that dear perfection which he owes
Without that title. Romeo, doff thy name,
And for that name which is no part of thee
Take all myself.
Romeo & Juliet
I’m collecting some thoughts here.
A group of attributes is what each substance here is known-as, they form its sole cash-value for our actual experience. The substance is in every case revealed through THEM; if we were cut off from THEM we should never suspect its existence; and if God should keep sending them to us in an unchanged order, miraculously annihilating at a certain moment the substance that supported them, we never could detect the moment, for our experiences themselves would be unaltered. Nominalists accordingly adopt the opinion that substance is a spurious idea due to our inveterate human trick of turning names into things. Phenomena come in groups—the chalk-group, the wood-group, etc.—and each group gets its name. The name we then treat as in a way supporting the group of phenomena. The low thermometer to-day, for instance, is supposed to come from something called the ‘climate.’ Climate is really only the name for a certain group of days, but it is treated as if it lay BEHIND the day, and in general we place the name, as if it were a being, behind the facts it is the name of. But the phenomenal properties of things, nominalists say, surely do not really inhere in names, and if not in names then they do not inhere in anything.
William James, Pragmatism, Lecture 3
http://www.gutenberg.org/ebooks/5116
And finally, there’s a bit in a Rich Hickey talk where he describes the psychology of perception. As I recall, he was arguing (after Alfred North Whitehead) that we don’t perceive mutable things as a flow, so much as a series of states. This served as a background for a discussion about names and references, in which identity is something we assign to a series of values, not something in which that thing “inheres” (to use James’ word).
All this ties rather mundanely to the way that URIs allow us to name resources, and the way that HTTP tries to deal with changes to those things.
2 use a naked domain
This rule canonicalizes the www-free form of all addresses. This is also known
as a “naked domain.” Is this wise? While there are of course arguments for and
against [1], the only consensus is that you must be consistent.
That’s what this rule does.
It’s true as of this writing that just about any “big” web site you can think
of uses www. The only respectable counterexample is archive.org. Since the
only thing I care about is that the addresses are archival, that’s good enough
for me. As for the other arguments: willshake will never use cookies.
Using the RedirectMatch directive from Apache’s “alias” module [2], you
permanently redirect from any www address to the equivalent non-www address.
I do this by creating a separate VirtualHost for the www form, so it’s
important that the non-www form not specify the www form as a ServerAlias,
as you often see.
<VirtualHost *:80>
    ServerName www.willshake.net
    # Permanently (301) redirect every www path to the same path on the naked domain.
    RedirectMatch permanent (.*) https://willshake.net$1
</VirtualHost>
Obviously, this is only for production.
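For contrast, here is a sketch of what the matching non-www host might look like. The point is what it leaves out: there is no ServerAlias for the www form. The DocumentRoot path is a made-up placeholder, and the real host presumably carries more configuration than this (TLS, for one, given that the redirect above targets https).

```apache
<VirtualHost *:80>
    ServerName willshake.net
    # Deliberately no "ServerAlias www.willshake.net" here. With the alias,
    # and depending on the order in which the hosts are defined, this host
    # could answer www requests itself, and the redirect would never fire.

    # Hypothetical path, for illustration only.
    DocumentRoot "/var/www/willshake"
</VirtualHost>
```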
3 canonicalize no trailing slash
Consider the two paths:
/plays/Ham
/plays/Ham/
They look pretty similar, right? Well, left to its own devices, willshake will serve exactly the same content for both. And that doesn’t bother me per se.
But with pre-rendering, the content for the slash-free path actually lives in
/plays/Ham/index.html, so mod_dir redirects a request for /plays/Ham
to /plays/Ham/.
Supposedly this is good practice, because it canonicalizes the “right” form for
directories. But willshake doesn’t work like that. There’s no difference
between a “directory” and a “file” in willshake; there are only locations. It
makes no sense that adding child locations should change the canonical URL of
the parent (from no slash to slash), but that’s exactly how things work right
now with the site pre-rendered.
Bottom line, I hate those redirects. They put me in conflict with myself, because I don’t (and won’t) write the URLs with the slash, but since the server redirects them (for paths that do in fact have children, anyway), I’m telling Google the opposite.
So I noticed this because of pre-rendering the site, but it’s not really specific to that. It’s better practice to observe a canonical form, and I’m sure that if I must do so, then the slash-free form is it.
Finally, I’d like to fix this without resorting to mod_rewrite, since I’d prefer
to avoid the use of that module altogether.
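As a starting point only, and not necessarily how willshake ends up doing it, these are the two stock directives that bear on the problem without touching mod_rewrite: DirectorySlash from mod_dir and RedirectMatch from mod_alias. The pattern and the target host are assumptions chosen to match the examples above.

```apache
# mod_dir: stop redirecting /plays/Ham to /plays/Ham/.
DirectorySlash Off

# mod_alias: permanently redirect any path ending in a slash (other than
# the root itself) to the same path without the slash.
RedirectMatch permanent ^/(.+)/$ https://willshake.net/$1
```

This is at best half of a fix. The mod_dir documentation warns that once DirectorySlash is off, a slash-free request for a directory is no longer answered with its index.html, so something else would still have to map /plays/Ham to /plays/Ham/index.html.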
Footnotes:
[1] See http://www.yes-www.org/ versus http://no-www.org/
[2] “Apache mod_alias”, Apache HTTP Server Documentation, Version 2.4