deploying the web site

In which I ship willshake.net. That’s right, no royal “we” here. You don’t get to do this.

1 motivation

Shipping is a feature. A really important feature. Your product must have it.1

Deployment should be a painless, one-step process. You should be so confident in that process that you would press the button and walk away, without even testing the result.

Yeah, never do that. But the goal of this document is to be so insanely scrupulous about the shipping process that you never think about it (until you decide to change it).

The deployment breaks down into two phases:

one-time setup
  in which I initially configure the production server. If you’re reading this, it’s already done.
one-step update
  in which I push the latest files to said server. If you’re reading this, it’s happened at least once, and it will happen again every time I decide that willshake isn’t quite finished yet.

2 one-time setup

This document is about deployment to a remote server, but most of this one-time setup is needed even on a local computer if you want to actually test this setup first. An alternative to this setup is the development server.

The production site is deployed on a Linode named Belarius running Debian 8.1. I set it up by following their guides, which are quite good. I went through

  1. Getting Started with Linode
  2. Securing Your Server, and
  3. Hosting a Website

pretty much verbatim, except that I skipped the parts about IPv6, PHP, and MySQL, which willshake doesn’t use. I also part ways with them about the basic web site setup, which follows.

2.1 the place

What’s the site we’re creating again?

domain="${1:-willshake.net}"

Oh yeah.

When you install Apache, it creates a place for your web sites called /var/www.

Usually, the files for a site will go there, in a directory named after the hostname.

web_home=/var/www
the_place="$web_home/$domain"

That directory is owned by root, so you can only write to it as a superuser.

The standard setup includes a place for the published files, of course, as well as Apache’s logs.

sudo mkdir -p "$the_place/public"
sudo mkdir -p "$the_place/log"

Will the site really run, though? Maybe when you were doing express as root, but for vanilla Apache (as I’m using now), you need to do the www-data dance, right?

And that’s all you really need to do. You can now dump the site files into public, and the site will run!—I mean, I can dump the files there.

2.2 permissions

Right, but how am I going to dump the files there? That directory is owned by root, which makes all of my options more difficult.

Why? For one thing, I turned off root access on the server, as the security guide recommended. And having sudo on the remote machine doesn’t do you much good either unless you’re willing to poke other holes, which are arguably worse than root access.2

The Unix style of permissions offers basically three levels of access (owner, group, and world) and three types of access (read, write, and execute). This scheme may seem impossibly simple to someone used to the more modern “Access Control Lists” (or ACLs); to others, it may just seem impossible. These days, with ACLs being widely available, people are recommending them more and more often for any purpose that strains the capabilities of the older system. For our purposes here, the Unix model will do.3
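As a rough illustration of that three-by-three grid, here is a small sketch that spells out an octal mode such as 750 as the familiar rwx string. The function name mode_to_string is made up for this example; it is not part of the setup script.

```shell
#!/bin/sh
# Hypothetical helper: render a three-digit octal mode (e.g. 750) as rwx text.
mode_to_string() {
    mode=$1
    out=''
    # Walk the digits left to right: owner, group, world.
    for digit in $(printf '%s' "$mode" | sed 's/./& /g'); do
        # Each digit is a sum of read (4), write (2), and execute (1).
        [ $((digit & 4)) -ne 0 ] && out="${out}r" || out="${out}-"
        [ $((digit & 2)) -ne 0 ] && out="${out}w" || out="${out}-"
        [ $((digit & 1)) -ne 0 ] && out="${out}x" || out="${out}-"
    done
    printf '%s\n' "$out"
}

mode_to_string 750   # owner: rwx, group: r-x, world: ---
mode_to_string 640   # owner: rw-, group: r--, world: ---
```

Each octal digit covers one level of access, and each bit within it covers one type of access, which is all there is to the model.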

2.2.1 own the place

I’ll keep mine own, despite of all the world.
The Taming of the Shrew

Luckily, /var/www is

one of the rare directories where you have the privilege of deciding for yourself what to put in it and what permissions everything in it should have.4

Of course, when it comes to things that I know virtually nothing about, I simply love having the privilege of making critical decisions about them. And I agree with thomasrutter that

Most files should be writable by whichever user or group is going to be writing to them most.

Hey, that’s me!

You can set them to be owned by your user account.

Great, let’s do it! I mean, I own the place, right?

owner=$(whoami)
sudo chown -R "$owner" "$the_place"

All right, now I can easily copy files to the site.

2.2.2 lock everyone else out

Falstaff: Banish plump Jack, and banish all the world!
Prince Hal: I do, I will
Henry the Fourth, Part 1

But when I do (copy files to the site), they will be created as new files in the system, and they will have the default permissions, which are way looser than necessary. That’s convenient in some ways, but it’s more secure to explicitly set exactly the permissions that you need, rather than relying on overly permissive policies.5

How can you control the permissions of files that haven’t been created yet? The “looser than necessary” defaults come from a bit mask that is applied to the permissions of all newly-created files and directories. You can set it with the umask command.

umask 0027
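To see what the mask does without touching the real site, here is a minimal sketch that creates a file and a directory in a scratch location and reports their modes. The function name modes_under_umask is hypothetical, and the stat -c option assumes GNU stat, as found on Debian.

```shell
#!/bin/sh
# Hypothetical helper: report the modes that new files and directories get
# under a given umask. Assumes GNU stat (as on Debian) for the -c option.
modes_under_umask() (
    # The body is a subshell, so the umask change does not leak to the caller.
    umask "$1"
    scratch=$(mktemp -d) || exit 1
    touch "$scratch/file"
    mkdir "$scratch/dir"
    # Print "<mode> <name>" with the scratch path trimmed off.
    stat -c '%a %n' "$scratch/file" "$scratch/dir" | sed "s|$scratch/||"
    rm -rf "$scratch"
)

modes_under_umask 0027   # the strict mask used here
modes_under_umask 0022   # a common, looser default, for comparison
```

Under 0027 the file should come out as 640 and the directory as 750, so “world” gets nothing at all.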

But umask only applies to the current process. If you want this to be in effect for later sessions (like, y’know, deployments), you’d have to configure the server to set it whenever a session is created. That’s outside the scope of this script, but generally you can just add the umask command to the ~/.profile of the user who’d be making any updates.

grep -q '^umask 0027$' ~/.profile || echo 'umask 0027' >> ~/.profile
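The line above follows a general “append only if missing” pattern, which keeps the setup safe to run more than once. A small sketch of that pattern as a reusable function follows; the name ensure_line is made up for illustration, and it is exercised on a scratch file rather than a real ~/.profile.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line is absent.
ensure_line() {
    line=$1
    file=$2
    # -x matches whole lines, -F treats the text literally, -q stays silent.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Try it on a scratch file rather than a real ~/.profile.
profile=$(mktemp)
ensure_line 'umask 0027' "$profile"
ensure_line 'umask 0027' "$profile"   # second call changes nothing
grep -c 'umask' "$profile"            # counts the matching lines
rm -f "$profile"
```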

To apply the same policy to the site directories that were just created (or any files that previously existed there for whatever reason), you must apply the inverse of the mask.

chmod -R 750 "$the_place"
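The 750 is not arbitrary: it is what remains of 777 once the bits in the mask are cleared. A minimal sketch of that arithmetic in shell, where the function name apply_mask is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: clear the bits of a umask from a base mode.
# Both arguments are octal; the leading 0 makes the shell read them that way.
apply_mask() {
    printf '%03o\n' $(( 0$1 & ~0$2 ))
}

apply_mask 777 027   # directories start from 777
apply_mask 666 027   # regular files start from 666
```

Note that files created under the mask start without execute bits, so they end up at 640 rather than 750.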

Again, that’s a pretty harsh policy.6 It means that newly-created files (including all of the site files, which I haven’t deployed yet) will have no “world” permissions at all. You’d have to be in a select group to get any kind of access.7

2.2.3 allow Apache

Why, didn’t that leave group read enabled? Or is it because of the directories? I did definitely get a 403 at this point in the setup.

In fact, even Apache itself can’t read the files in such a state. If I were to create an index.html in these conditions, it would return a 403 Forbidden to the public.

Apache runs as www-data (on Debian-based systems, anyway), and assigning your site directory to that group is the typical way to give it access.

sudo chgrp -R www-data "$the_place"

That’s good for the files that are already there, but what about files that get added later? For that, we need to set the “setgid” bit on the directory, which will cause new files and directories created inside it to inherit its group ownership.8

sudo chmod -R g+s "$the_place"

Note that sudo is needed in both of the above cases even though I’m now the owner of those directories, because an ordinary user can only assign files to a group that they are a member of, even if they own the files.
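To watch the inheritance happen without touching the real site, here is a small sketch that sets the bit on a scratch directory and checks a directory created inside it. It assumes Linux behavior, and the function name setgid_is_inherited is hypothetical. It needs no sudo because the scratch directory keeps my own group.

```shell
#!/bin/sh
# Sketch: show that a directory made inside a setgid directory carries the
# setgid bit too (Linux behavior is assumed here).
setgid_is_inherited() (
    parent=$(mktemp -d) || exit 1
    chmod g+s "$parent"
    mkdir "$parent/child"
    # test -g is true when the setgid bit is set.
    if [ -g "$parent/child" ]; then result=yes; else result=no; fi
    rm -rf "$parent"
    printf '%s\n' "$result"
)

setgid_is_inherited
```

Because subdirectories pick up the bit themselves, the group keeps propagating however deep the deployed tree goes.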

Finally, although the web site doesn’t collect any data, Apache does need write permission to write the site’s logs.

chmod g+w "$the_place"/log

2.3 configuration

Does this go here? In the web server, I (must) mention that at least one bit of configuration is needed, that must be put out of band. You have to do that setup to deploy, also.

2.4 Python and mod_wsgi

I’m not using express for production anymore. So you need to add the WSGI configuration here.

At this point, the web site works, if you enter through the “home page.” That is, if you go to https://willshake.net, you’ll be able to use the site normally. But if you go directly to any inner pages (that is, without getting there by way of links starting with the home page) they’ll be 404 Not Found. That’s because the getflow server isn’t running.

There are several ways to install mod_wsgi. This is what I did:

su
apt-get update
apt-get install -y apache2-mpm-worker
apt-get install -y apache2-threaded-dev
apt-get install -y python-pip
apt-get install -y python-dev
pip install mod_wsgi

With that, mod_wsgi-express should be able to run. But getflow also needs python3 and lxml.

sudo apt-get install -y python3
sudo apt-get install -y python-lxml

That’s it. That’s all that’s required for willshake to run… except all of its files.

3 one-step update

And now, without further ado, the moment I’ve all been waiting for. Shipping.

Willshake consists entirely of files. Shipping it is just a matter of copying the files to the site.

default_remote='willshake.net:/var/www/willshake.net'
remote="${1:-$default_remote}"

options='
	--recursive
	--copy-links
	--delete-after
	--verbose
	--times
	--compress'

#rsync $options site/ $remote/public
# For big files, ignore date and just go by size.  Trying this to make sure
# unnecessary transfers of large media are avoided.
rsync $options --max-size 10m site/ $remote/public
rsync $options --min-size 10m --size-only site/ $remote/public

rsync $options \
	--exclude='*.pyc' \
	--exclude='httpd.conf.d' \
	server/ \
	$remote/server

The trailing slashes on site/ and server/ are important. Without them, rsync would create another site or server directory under their respective targets.
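The difference is easy to check locally. This sketch copies a scratch site directory twice, once with the trailing slash and once without, and lists what lands where. It assumes rsync is installed; the directory names with_slash and without_slash are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: compare rsync with and without a trailing slash on the source.
# Runs locally in a scratch directory; nothing is sent over the network.
trailing_slash_demo() (
    scratch=$(mktemp -d) || exit 1
    cd "$scratch" || exit 1
    mkdir site
    echo 'hello' > site/index.html

    rsync --recursive site/ with_slash     # copies the contents of site
    rsync --recursive site  without_slash  # copies the directory itself

    # List the resulting files, one starting point after the other.
    find with_slash without_slash -type f
    cd / && rm -rf "$scratch"
)

trailing_slash_demo
```

The first copy should contain index.html directly, while the second should contain it under an extra site directory.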

The server/ folder has the Apache configuration and getflow.

The script ships the copy of the site where it resides. It could be a half-baked mess, or it could be a good build from a clean checkout. It’s not this script’s job to make sure you’re shipping something good, just that you’re shipping. I mean, that I’m shipping.

Of course, this will only work if you’re authorized to ssh into willshake.net. As a user who owns the web site directory. I hope you’re not.

4 maintenance

In principle, the only maintenance should be to restart the server when configuration changes.

So yeah, this isn’t as “insanely scrupulous” as it could be, since you have to remember when you’re shipping configuration changes, and things could go wrong. In practice, the configuration doesn’t change very often.
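One way to take the remembering out of it would be to fingerprint the configuration files and compare against the last recorded value. What follows is only a sketch of that idea, not something the ship script does: the function name config_changed, the stamp file, and the example.conf contents are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the files under a directory differ
# from the last recorded fingerprint, and record the new fingerprint.
config_changed() {
    dir=$1
    stamp=$2
    # Fingerprint: checksum of every file's checksum, in a stable order.
    new=$(find "$dir" -type f -exec cksum {} + | LC_ALL=C sort | cksum)
    old=$(cat "$stamp" 2>/dev/null)
    printf '%s\n' "$new" > "$stamp"
    [ "$new" != "$old" ]
}

# Example on scratch data; the restart itself is left as a message.
conf=$(mktemp -d)
stamp=$(mktemp)
echo 'Listen 80' > "$conf/example.conf"
config_changed "$conf" "$stamp" && echo 'configuration changed: restart needed'
config_changed "$conf" "$stamp" || echo 'no change: nothing to do'
rm -rf "$conf" "$stamp"
```

The first call sees no previous fingerprint and reports a change; the second sees the same files and reports none.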

Footnotes:

1

Joel Spolsky, “The Duct Tape Programmer”, Joel on Software, September 23, 2009

2

“rsync all files of remote machine over SSH without root user?” Unix & Linux StackExchange (2013) http://unix.stackexchange.com/a/92397

3

A decent summary of this system is at “chmod” on ss64.com.

4

“default permissions for /var/www”, AskUbuntu (2014) http://askubuntu.com/a/493401

5

The guidance here comes from the “Maintained by a single user” section of the canonical ServerFault question and answer, “What permissions should my website files/folders have on a Linux webserver?” http://serverfault.com/a/357109

6

For more on the umask command, see, e.g. the ArchLinux wiki. https://wiki.archlinux.org/index.php/Umask

7

Yep, sounds about like the world.

8

The “setgid” bit is often confused with the “setuid” bit, in spite of the man pages. See whether this answer to “How does the sticky bit work?” clarifies matters for you.

about willshake

Project “willshake” is an ongoing effort to bring the beauty and pleasure of Shakespeare to new media.

Please report problems on the issue tracker. For anything else, public@gavinpc.com

Willshake is an experiment in literate programming—not because it’s about literature, but because the program is written for a human audience.

Following is a visualization of the system. Each circle represents a document that is responsible for some part of the system. You can open the documents by touching the circles.

Starting with the project philosophy as a foundation, the layers are built up (or down, as it were): the programming system, the platform, the framework, the features, and so on. Everything that you see in the site is put there by these documents—even this message.

Again, this is an experiment. The documents contain a lot of “thinking out loud” and a lot of old thinking. The goal is not to make it perfect, but to maintain a reflective process that supports its own evolution.
