This document implements the audio-specific features of a media subsystem.

1 motivation

Nothing is quite as immersive as sound. Sound adds a dimension to a space in a way that no visual projection can. The sense of presence and motion that can be created by playing back a simple two-channel audio recording is undeniable.

Weaving sounds together with other media can have a multiplier effect, giving the whole representation a new level of power.

The objective of this subsystem is to make it easy to collect and use audio recordings—especially from the public domain—along with other media.

2 programs

An audio program is a recording, plus everything known about it. In other words, a program is the thing defined by an audio record. Some of the information isn’t in the record itself, but in other files that it points to.

In principle, a program can comprise one or more audio files. Currently only single-file programs are supported, but multi-file programs are already on the radar, so this will need to be dealt with in some way.

Its record in the catalog will indicate what it’s a recording of, and anything else I want to define.

3 audio records

Audio records are stored in the catalog. Each recording gets its own file.

3.1 notes on structure

package/program: when there are several audio files in a group

cues: associate times in the audio with play/scene/anchor, see audio-cues.xml/rnc

3.2 bundle the audio

A very similar thing is done for images.

Of course, to use the audio records, it’ll help to have them in a single file.

: $(PROGRAM)/audio/bundle_audio_records $(AUDIO_RECORDS)/* \
|> ^ bundle audio records ^ %f > %o \
|> $(ROOT)/data/audio/audio.xml
BEGIN { print "<audio-index>" }
FNR == 1 {
	# Extract the name from amid the source type prefix, format, and
	# extension.
	match(FILENAME, /\w+-([^/]+)\.[^.]+\.xml$/, name)
	printf "<audio key=\"%s\">\n", name[1]
}
{ print }
ENDFILE { print "</audio>" }
END { print "</audio-index>" }
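
For comparison, here is a rough Python sketch of the same bundling step. It is not part of the build. The file name used in the usage note (`catalog-hamlet.program.xml`) is only a guess at what a record name looks like, and the functions `record_key` and `bundle` are names made up for this illustration.

```python
import re
from pathlib import Path

# Same pattern as the awk script: <source type>-<key>.<format>.xml
KEY_PATTERN = re.compile(r"\w+-([^/]+)\.[^.]+\.xml$")


def record_key(file_name):
    """Return the key embedded in a record's file name, or None."""
    found = KEY_PATTERN.search(file_name)
    return found.group(1) if found else None


def bundle(paths):
    """Wrap each record's content in an <audio> element inside one index."""
    lines = ["<audio-index>"]
    for path in paths:
        path = Path(path)
        # Use only the base name so that directory names cannot interfere.
        lines.append('<audio key="{}">'.format(record_key(path.name)))
        lines.append(path.read_text(encoding="utf-8").rstrip("\n"))
        lines.append("</audio>")
    lines.append("</audio-index>")
    return "\n".join(lines) + "\n"
```

Calling `record_key("catalog-hamlet.program.xml")` yields `hamlet`, which is what ends up in the `key` attribute.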

4 getting audio

This is similar to the process that is used for images. Right now, all audio comes from the Internet Archive, so I don’t bother with non-catalog types.

The other difference is that an additional intermediate file is used for storing the location of the catalog metadata, just as is done for the download location.

: foreach $(AUDIO_RECORDS)/catalog-*.* \
  | $(PROGRAM)/audio/* \
|> ^o get audio metadata location %B^ \
   $(PROGRAM)/audio/get-audio-metadata-location %f > %o \

Of course, this adds another node to the build graph for every file, but the upside is that it lets you separate unrelated things, namely the extraction of the metadata location and the downloading of the metadata.

# awk regexes have no non-greedy quantifiers, so capture up to the
# closing quote explicitly.
match($0, /<from .*archive-item="([^"]*)"/, m) {
	key = m[1]
	print "" key "/" key "_files.xml"
}

I may switch to something like that for images, despite the fact that it means thousands more little files, because it decomposes the steps more clearly and it’s a stronger way to prevent repeat downloads.

The above may be inadequate for some inputs, i.e. those with problematic characters in the key (so as to require XML decoding or URL escaping). I haven’t seen any on Internet Archive, but I don’t know if that’s a matter of policy.

It also makes some assumptions about the XML in the record, but hey, they’re my records, so I can do that.

The following version is “safer” but a little slower.

import sys
from lxml import etree

_, record = sys.argv            # Can't iterparse from stdin (AFAIK)

# `recover` is needed because this is a fragment
for event, element in etree.iterparse(record, tag='from', recover=True):
    key = element.get('archive-item')
    print("{}/{}_files.xml".format(key, key))
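
The XML parser takes care of entity decoding, but a key with reserved characters would still need URL escaping. I haven't needed this yet. As a sketch only, it could be done with the standard library's `urllib.parse.quote`. The function name `metadata_path` is invented for the example.

```python
from urllib.parse import quote


def metadata_path(key):
    """Build the relative path of an item's file listing, escaping the key."""
    # safe="" escapes "/" as well, so a key can never add a path segment.
    safe_key = quote(key, safe="")
    return "{}/{}_files.xml".format(safe_key, safe_key)
```

Ordinary keys made of letters, digits, and underscores pass through unchanged, so this would not alter the output for any record seen so far.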

With that, getting the actual metadata is trivial.

: foreach $(AUDIO_METADATA_LOCATION)/*.txt \
  | $(PROGRAM)/get-resource \
|> ^ get audio metadata %B ^ \
   $(PROGRAM)/get-resource `cat %f` %o "%B__metadata" \
: foreach $(AUDIO_METADATA)/* \
  | $(PROGRAM)/audio/get-audio-location \
|> ^o get audio location %B ^ \
   $(PROGRAM)/audio/get-audio-location "Ogg Vorbis" > %o < %f \

To extract the download location from the metadata, you need to know something about its format. That’s also fairly trivial. You just have to know which format you want.

import sys
from lxml import etree

doc = etree.parse(sys.stdin)
format_name = sys.argv[1]
# Cheating to get key.
key = doc.xpath('substring-before(//file[contains(@name, "_files.xml")]/@name, "_files.xml")')
name = doc.xpath('//file[format=$format]/@name', format = format_name)[0]
print("" + key + '/' + name)

This will fail if there’s no such format, which would be a good thing to know.
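
A friendlier failure would say which formats are available. The following is a sketch, not the script in use. It swaps lxml for the standard library's `xml.etree.ElementTree` so that it runs anywhere, and it assumes only what the XPath above implies: `file` elements with a `name` attribute and a `format` child. The function name `find_file_name` is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_file_name(metadata_xml, format_name):
    """Return the name of the first file in the given format, or raise."""
    root = ET.fromstring(metadata_xml)
    for file_element in root.iter("file"):
        if file_element.findtext("format") == format_name:
            return file_element.get("name")
    # No match: report what the item does offer.
    available = sorted({f.findtext("format") or "?" for f in root.iter("file")})
    raise LookupError(
        "no file with format {!r}; available: {}".format(
            format_name, ", ".join(available)
        )
    )
```

With this, a missing format stops the build with a message naming the alternatives, instead of an `IndexError`.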

About that key. See, Archive puts all of the files under a directory named after the item’s key. Which makes perfect sense.

But the thing is, I don’t persist the Archive key to that point. By the time that script gets executed, it’s lost, and it turns out it’s not in the metadata itself. I could look it up any number of places, but instead I cheat and take it from a place where—by convention, anyway—it is in the available metadata.

And finally, to get the actual audio.

: foreach $(AUDIO_LOCATION)/* \
  | $(PROGRAM)/get-resource \
|> ^o get audio resource %B ^ \
   $(PROGRAM)/get-resource "`cat '%f'`" "%o" "%B" \
|> $(AUDIO)/%B

5 formats

So far, I’ve only dealt with the Ogg Vorbis format. And if this format were supported by all browsers, I’d definitely stop now and go take a walk.

But alas, Ogg is not enough. At least one other format is needed to cover all of today’s major browsers, and Opera.[1] These days, there are several viable encoding/container combinations, including plain old MP3.

So how to get MP3s, or any other format, for that matter?

Internet Archive itself provides most audio programs in multiple formats, including both Ogg and MP3. In fact, for the records I’ve used so far, the MP3 is the “original,” and the Ogg is derivative. I originally chose the Oggs because they were much smaller.

But getting other formats from the source is not a good solution generally, because you never know what’s going to be available. The alternative is to convert the audio to the desired format after downloading it. Fortunately, despite the patent issues surrounding several common formats, converting between them is a routine matter, thanks to a “free” software product called ffmpeg.

Well, sort of. For a few years, there’s apparently been a rift in the ffmpeg development community, which has made installing ffmpeg temporarily less than straightforward. I am not inclined to summarize the matter here, and I assume it’ll go away before too long. See here and here.

Yet—I happen to be using one of the distributions that’s still on the wrong side of this “forking” issue. So while I could easily install a thing called avconv, I’m going to take a slightly longer road on the assumption that ffmpeg will “win.”

sudo add-apt-repository ppa:mc3man/trusty-media
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install ffmpeg

Courtesy of Doug McMahon.[2]

ffmpeg is kind of the ImageMagick of audio. It offers a bewildering array of features, each with options of its own. Let’s suppose I wanted to convert all of the Ogg files to AAC.

: foreach $(AUDIO)/*.ogg \
  | $(PROGRAM)/audio/to_aac \
|> ^o audio to AAC %b ^ $(PROGRAM)/audio/to_aac "%f" "%o" \
|> $(AUDIO)/%B.m4a

The to_aac script below uses ffmpeg to do the conversion.

And that takes a while. It runs here at about 35x, meaning 35 times faster than real time. The recordings in the “Living Shakespeare” series, for example, mostly run about 55 minutes apiece, so that’s about 90 seconds per production, or forty minutes for the whole set.

Even though I’m using a kind of “functional reactive” build system, I don’t want to run those conversions any more than necessary. It’s the same issue as when getting things from the internet. Just as I do there, this script puts the actual encoded files in a location outside of the project directory and symlinks to them. After that, it does nothing, even if the input audio changes. So if you do need to re-encode a file for some reason, you have to delete both the actual file and the symlink.



if [ ! -d "$big_file_dir" ]; then
	mkdir -p "$big_file_dir"
fi

# Skip the work if the file already exists
if [ ! -f "$actual_file" ]; then
	# Encode straight to the cached location (assumed output path)
	ffmpeg -i "$in_file" \
		   -c:a libfdk_aac \
		   -movflags +faststart \
		   "$actual_file"
fi

# Now link the requested location to the actual file
ln --symbolic "$actual_file" "$out_file"
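
The encode-once-then-symlink idea is easier to try out in isolation. Here is a sketch of it in Python, under some assumptions. The cached file is named after the requested output (the script's actual naming is not shown here), and the encoder is passed in as a function so the caching logic can be exercised without ffmpeg. `ensure_encoded` and `ffmpeg_to_aac` are names invented for this example. The ffmpeg flags are the ones used above.

```python
import os
import subprocess
from pathlib import Path


def ffmpeg_to_aac(src, dst):
    """Encode src to AAC at dst with the same options as the shell script."""
    subprocess.run(
        ["ffmpeg", "-i", src, "-c:a", "libfdk_aac", "-movflags", "+faststart", dst],
        check=True,
    )


def ensure_encoded(in_file, out_file, big_file_dir, encode=ffmpeg_to_aac):
    """Encode in_file once into big_file_dir and symlink out_file to it."""
    # Assumption: the cached copy takes the base name of the requested output.
    actual_file = Path(big_file_dir) / Path(out_file).name
    actual_file.parent.mkdir(parents=True, exist_ok=True)
    # Skip the expensive work if an encoded copy already exists,
    # even if the input has changed since it was made.
    if not actual_file.is_file():
        encode(str(in_file), str(actual_file))
    # Point the requested location at the real file.
    if not os.path.lexists(out_file):
        os.symlink(actual_file.resolve(), out_file)
    return actual_file
```

Unlike the shell version, this one tolerates an existing symlink rather than failing on it. Forcing a re-encode still means deleting the cached file.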

With the default bitrate of about 128 kbps (which is the minimum recommended), the resulting audio sounds slightly degraded to me. And yet the files are about 40% larger. So I’ll still prefer Ogg for browsers that do support it. I’m explicitly following the FFmpeg developers’ recommendation about which encoder to use for AAC, since it happens to be available on my system.[3]

Yes, this is transcoding one lossy format to another, and I should start from the original, which the metadata indicates is the MP3. That “original” MP3 is lossy too, so it may not be pristine either, but I’ll get around to trying it eventually.[3]

I also noticed that an extra option appears to be necessary if you want to support “progressive download.”

By default the MP4 muxer writes the ‘moov’ atom after the audio stream (‘mdat’ atom) at the end of the file. This results in the user requiring to download the file completely before playback can occur. Relocating this moov atom to the beginning of the file can facilitate playback before the file is completely downloaded by the client.[4]

Hence -movflags +faststart.
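
One way to confirm that the option took effect is to look at the order of the top-level atoms in the output file. The sketch below is not part of the build. It relies on the standard MP4 layout, where each atom starts with a 4-byte big-endian size and a 4-byte type. The function names are invented for the example.

```python
import struct


def top_level_atoms(stream):
    """Yield the type of each top-level atom in a binary MP4 stream."""
    while True:
        start = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            return
        size, kind = struct.unpack(">I4s", header)
        if size == 1:
            # A size of 1 means a 64-bit size follows the type.
            extended = stream.read(8)
            if len(extended) < 8:
                return
            size = struct.unpack(">Q", extended)[0]
        yield kind.decode("latin-1")
        if size == 0 or size < 8:
            # 0 means "runs to end of file"; anything else below 8 is malformed.
            return
        stream.seek(start + size)


def is_fast_start(stream):
    """Return True if the moov atom comes before the mdat atom."""
    order = list(top_level_atoms(stream))
    return (
        "moov" in order
        and "mdat" in order
        and order.index("moov") < order.index("mdat")
    )
```

Usage would be along the lines of `is_fast_start(open(path, "rb"))`. Only the atom headers are read, so it is quick even on large files.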

6 roadmap

6.1 normalize volume of recordings

Just based on the few recordings that I’m already using, there’s a wide range in volume level among them. I tried some ad-hoc adjustment of the levels at some point (through Winamp), but I could automate the process with ffmpeg. (Of course, I’d have to listen and determine the target adjustments.)
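
As a starting point, ffmpeg's `volume` audio filter accepts a gain in decibels. The sketch below only builds the command lines from a table of adjustments chosen by ear. The file names and gain values are placeholders, and `volume_command` is a name made up for this example.

```python
def volume_command(in_file, out_file, gain_db):
    """Build an ffmpeg command that applies a fixed gain in decibels."""
    return [
        "ffmpeg",
        "-i", in_file,
        "-filter:a", "volume={:.1f}dB".format(gain_db),
        out_file,
    ]


# Placeholder adjustments, as if determined by listening.
adjustments = {"quiet-recording.ogg": 6.5, "loud-recording.ogg": -3.0}

commands = [
    volume_command(name, "normalized-" + name, gain)
    for name, gain in sorted(adjustments.items())
]
```

Each command could then be handed to `subprocess.run`. Since applying a filter means re-encoding, this would ideally be folded into the format conversion rather than run as a separate lossy pass.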



Opera slam! See the Browser compatibility table of “Supported media formats” at MDN.


Doug McMahon, “Ubuntu Multimedia for Trusty


See “AAC § Progressive Download”, FFmpeg wiki.

about willshake

Project “willshake” is an ongoing effort to bring the beauty and pleasure of Shakespeare to new media.

Please report problems on the issue tracker.

Willshake is an experiment in literate programming—not because it’s about literature, but because the program is written for a human audience.

Following is a visualization of the system. Each circle represents a document that is responsible for some part of the system. You can open the documents by touching the circles.

Starting with the project philosophy as a foundation, the layers are built up (or down, as it were): the programming system, the platform, the framework, the features, and so on. Everything that you see in the site is put there by these documents—even this message.

Again, this is an experiment. The documents contain a lot of “thinking out loud” and a lot of old thinking. The goal is not to make it perfect, but to maintain a reflective process that supports its own evolution.
