audio
https://commons.wikimedia.org/wiki/File:Advertising_Record.ogg

This document implements the audio-specific features of a media subsystem.
1 motivation
Nothing is quite as immersive as sound. Sound adds a dimension to a space in a way that no visual projection can do. The sense of presence and motion that can be created by playing back a simple two-channel audio recording is undeniable.
Weaving sounds together with other media can have a multiplier effect, giving the whole representation a new level of power.
The objective of this subsystem is to make it easy to collect and use audio recordings—especially from the public domain—along with other media.
2 programs
An audio program is a recording, plus everything known about it. In other words, a program is the thing defined by an audio record. Some of the information isn’t in the record itself, but in other files that it points to.
Currently only single-file programs are supported, but multi-file programs are already on the radar, so they will need to be dealt with somehow.
In general, a program can comprise one or more audio files.
Its record in the catalog will indicate what it’s a recording of, and anything else I want to define.
3 audio records
Audio records are stored in the catalog. Each recording gets its own file.
3.1 notes on structure
package/program: when there are several audio files in a group
cues: associate times in the audio with play/scene/anchor, see audio-cues.xml/rnc
3.2 bundle the audio
A very similar thing is done for images.
Of course, to use the audio records, it’ll help to have them in a single file.
: $(PROGRAM)/audio/bundle_audio_records $(AUDIO_RECORDS)/* \
|> ^ bundle audio records ^ %f > %o \
|> $(ROOT)/data/audio/audio.xml
BEGIN { print "<audio-index>" }
FNR == 1 {
    # Extract the name from amid the source type prefix, format, and
    # extension.
    match(FILENAME, /\w+-([^/]+)\.[^.]+\.xml$/, name)
    printf "<audio key=\"%s\">\n", name[1]
}
{ print }
ENDFILE { print "</audio>" }
END { print "</audio-index>" }
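To make the filename pattern concrete, here is a small Python sketch of the same extraction. Only the pattern comes from the script above; the sample paths are hypothetical, and the real record names may look different.

```python
import re

# Same pattern as the awk script: <source type>-<name>.<format>.xml
KEY_PATTERN = re.compile(r"\w+-([^/]+)\.[^.]+\.xml$")


def record_key(path):
    """Return the record name embedded in a record file path, or None."""
    found = KEY_PATTERN.search(path)
    return found.group(1) if found else None


# Hypothetical file names, for illustration only.
print(record_key("records/catalog-hamlet.ogg.xml"))
print(record_key("records/notes.txt"))
```

The first call yields the name between the source type prefix and the format; the second returns None because the path doesn't follow the naming convention.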
4 getting audio
This is similar to the process that is used for images. Right now, all audio comes from the Internet Archive, so I don’t bother with non-catalog types.
The other difference is that an additional intermediate file is used for storing the location of the catalog metadata, just as is done for the download location.
: foreach $(AUDIO_RECORDS)/catalog-*.* \
| $(PROGRAM)/audio/* \
|> ^o get audio metadata location %B^ \
$(PROGRAM)/audio/get-audio-metadata-location %f > %o \
|> $(AUDIO_METADATA_LOCATION)/%g.txt
Of course, this adds another node to the build graph for every file, but the upside is that it lets you separate unrelated things, namely the extraction of the metadata location and the downloading of the metadata.
match($0, /<from .*archive-item="([^"]*)"/, m) {
    # awk regexes have no non-greedy operator, so stop at the closing quote
    key = m[1]
    print "http://archive.org/download/" key "/" key "_files.xml"
    exit
}
I may switch to something like that for images, despite the fact that it means thousands more little files, because it decomposes the steps more clearly and it’s a stronger way to prevent repeat downloads.
The above may be inadequate for some inputs, i.e. those with problematic characters in the key (so as to require XML decoding or URL escaping). I haven’t seen any on Internet Archive, but I don’t know if that’s a matter of policy.
It also makes some assumptions about the XML in the record, but hey, they’re my records, so I can do that.
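If keys with problematic characters ever did turn up, the decoding and escaping could be handled along these lines. This is only a sketch using the Python standard library; the function name `metadata_url` is hypothetical, and it assumes the key arrives exactly as written in the XML source (an XML parser would already have decoded the entities).

```python
from html import unescape
from urllib.parse import quote


def metadata_url(raw_key):
    """Build the metadata URL for an Archive item key.

    `raw_key` is the attribute value as it appears in the XML source,
    so entity references are decoded before URL escaping.
    """
    key = quote(unescape(raw_key), safe="")
    return "http://archive.org/download/{0}/{0}_files.xml".format(key)


print(metadata_url("plain_key"))
print(metadata_url("a&amp;b c"))  # hypothetical awkward key
```

Ordinary keys pass through unchanged, so this would cost nothing for the records I have now.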
The following version is “safer” but a little slower.
import sys
from lxml import etree
_, record = sys.argv # Can't iterparse from stdin (AFAIK)
# `recover` is needed because this is a fragment
for event, element in etree.iterparse(record, tag='from', recover=True):
    key = element.get('archive-item')
    print("http://archive.org/download/{}/{}_files.xml".format(key, key))
    break
With that, getting the actual metadata is trivial.
: foreach $(AUDIO_METADATA_LOCATION)/*.txt \
| $(PROGRAM)/get-resource \
|> ^ get audio metadata %B ^ \
$(PROGRAM)/get-resource `cat %f` %o "%B__metadata" \
|> $(AUDIO_METADATA)/%B.xml
: foreach $(AUDIO_METADATA)/* \
| $(PROGRAM)/audio/get-audio-location \
|> ^o get audio location %B ^ \
$(PROGRAM)/audio/get-audio-location "Ogg Vorbis" > %o < %f \
|> $(AUDIO_LOCATION)/%B.txt
To extract the download location from the metadata, you need to know something about its format. That’s also fairly trivial. You just have to know which format you want.
import sys
from lxml import etree
doc = etree.parse(sys.stdin)
format_name = sys.argv[1]
# Cheating to get key.
key = doc.xpath('substring-before(//file[contains(@name, "_files.xml")]/@name, "_files.xml")')
name = doc.xpath('//file[format=$format]/@name', format = format_name)[0]
print("http://archive.org/download/" + key + '/' + name)
This will fail if there’s no such format, which would be a good thing to know.
About that key. See, Archive puts all of the files under a directory named after the item’s key. Which makes perfect sense.
But the thing is, I don’t persist the Archive key to that point. By the time that script gets executed, it’s lost, and it turns out it’s not in the metadata itself. I could look it up any number of places, but instead I cheat and take it from a place where—by convention, anyway—it is in the available metadata.
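The key trick and the failure case can both be seen in a standard-library sketch. The sample metadata below is hypothetical, shaped only like what the script above expects (`file` elements with a `name` attribute and a `format` child). It looks for a name ending in `_files.xml` rather than one merely containing it, and it raises a named error instead of an `IndexError`.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata, for illustration only.
SAMPLE = """\
<files>
  <file name="hamlet.mp3" source="original"><format>VBR MP3</format></file>
  <file name="hamlet.ogg" source="derivative"><format>Ogg Vorbis</format></file>
  <file name="SomeItem_files.xml" source="original"><format>Metadata</format></file>
</files>
"""

SUFFIX = "_files.xml"


def audio_url(metadata_xml, format_name):
    """Return the download URL of the first file in the given format."""
    files = ET.fromstring(metadata_xml).findall(".//file")
    names = [f.get("name", "") for f in files]
    # The key is whatever precedes "_files.xml" in the file list's own entry.
    keys = [n[: -len(SUFFIX)] for n in names if n.endswith(SUFFIX)]
    if not keys:
        raise LookupError("no *_files.xml entry to take the key from")
    matches = [f.get("name") for f in files if f.findtext("format") == format_name]
    if not matches:
        raise LookupError("no file with format {!r}".format(format_name))
    return "http://archive.org/download/{}/{}".format(keys[0], matches[0])


print(audio_url(SAMPLE, "Ogg Vorbis"))
```

Asking for a format that isn't listed raises `LookupError` with the format name in the message, which is the thing I'd want to know.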
And finally, to get the actual audio.
: foreach $(AUDIO_LOCATION)/* \
| $(PROGRAM)/get-resource \
|> ^o get audio resource %B ^ \
$(PROGRAM)/get-resource "`cat '%f'`" "%o" "%B" \
|> $(AUDIO)/%B
5 formats
So far, I’ve only dealt with the Ogg Vorbis format. And if this format were supported by all browsers, I’d definitely stop now and go take a walk.
But alas, Ogg is not enough. At least one other format is needed to cover all of today’s major browsers, and Opera [1]. These days, there are several viable encoding/container combinations, including plain old MP3.
So how to get MP3s, or any other format, for that matter?
Internet Archive itself provides most audio programs in multiple formats, including both Ogg and MP3. In fact, for the records I’ve used so far, the MP3 is the “original,” and the Ogg is derivative. I originally chose the Oggs because they were much smaller.
But getting other formats from the source is not a good solution generally, because you never know what’s going to be available. The alternative is to convert the audio to the desired format after downloading it. Fortunately, despite the patent issues surrounding several common formats, converting between them is a routine matter, thanks to a “free” software product called ffmpeg.
Well, sort of. For a few years, there’s apparently been a rift in the ffmpeg development community that’s made installing ffmpeg temporarily unstraightforward for users. I am not inclined to summarize the matter here, and I assume it’ll go away before too long. See here and here.
Yet I happen to be using one of the distributions that’s still on the wrong side of this “forking” issue. So while I could easily install a thing called avconv, I’m going to take a slightly longer road on the assumption that ffmpeg will “win.”
sudo add-apt-repository ppa:mc3man/trusty-media
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install ffmpeg
Courtesy of Doug McMahon [2].
ffmpeg is kind of the ImageMagick of audio. It offers a bewildering array of features and options for them. Let’s suppose I wanted to convert all of the ogg files to aac.
: foreach $(AUDIO)/*.ogg \
| $(PROGRAM)/audio/to_aac \
|> ^o audio to AAC %b ^ $(PROGRAM)/audio/to_aac "%f" "%o" \
|> $(AUDIO)/%B.m4a
The to_aac script below uses ffmpeg to do the conversion.
And that takes a while. It runs here at about 35x, that is, 35 times faster than real time. The productions in the “Living Shakespeare” series, for example, mostly run about 55 minutes apiece, so each takes about 90 seconds to encode (55 × 60 / 35 ≈ 94), or forty minutes for the whole set.
Even though I’m using a kind of “functional reactive” build system, I don’t want to run those conversions any more than necessary. It’s the same issue as when getting things from the internet. Just as I do there, this script puts the actual encoded files in a location outside of the project directory and symlinks to them. After that, it does nothing, even if the input audio changes. So if you do need to re-encode a file for some reason, you have to delete both the actual file and the symlink.
in_file="$1"
out_file="$2"
big_file_dir="$HOME/ws/media"
out_name="${out_file##*/}"
actual_file="$big_file_dir/$out_name"
if [ ! -d "$big_file_dir" ]; then
    mkdir -p "$big_file_dir"
fi
# Skip the work if the file already exists
if [ ! -f "$actual_file" ]; then
    ffmpeg -i "$in_file" \
        -c:a libfdk_aac \
        -movflags +faststart \
        "$actual_file"
fi
# Now link the requested location to the actual file
ln --symbolic "$actual_file" "$out_file"
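The produce-once-and-symlink pattern in that script can be isolated from ffmpeg entirely. Here is a minimal Python sketch of it; `build_once` and the `produce` callback are hypothetical names, not part of the project. It departs from the shell script in one respect: it replaces a stale symlink instead of failing on it.

```python
import os


def build_once(out_path, cache_dir, produce):
    """Create `out_path` as a symlink to a cached file, producing it if absent.

    `produce(path)` is a hypothetical callback that writes the expensive
    result (for example, by running an encoder) to `path`.
    Returns True if `produce` was called.
    """
    os.makedirs(cache_dir, exist_ok=True)
    actual = os.path.join(cache_dir, os.path.basename(out_path))
    produced = False
    # Skip the work if the cached file already exists.
    if not os.path.isfile(actual):
        produce(actual)
        produced = True
    # Unlike the shell script, replace a stale link rather than fail on it.
    if os.path.islink(out_path):
        os.remove(out_path)
    os.symlink(actual, out_path)
    return produced
```

As with the script, a change to the input does not trigger a new run: the cached file has to be deleted by hand to force one.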
With the default bitrate of about 128 kbps (which is the minimum recommended), the resulting audio sounds slightly degraded to me. And yet the files are about 40% larger. So I’ll still prefer Ogg for browsers that do support it. I’m explicitly following the FFmpeg developers’ recommendations about which encoder to use for AAC, since it happens to be available on my system [3].
Yes, this is transcoding one lossy format to another, and I should start from the original, which the metadata indicates is the MP3. I don’t know that the “original” MP3 is not also lossy in this case, but I’ll get around to trying it eventually [3].
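Preferring Ogg where it is supported mostly comes down to the order in which the encodings are offered, since a browser takes the first `source` of an `audio` element that it can play. A sketch of generating such markup follows; the function name, the URL layout, and the choice of MIME types are my assumptions here, not something the subsystem defines.

```python
from html import escape

# Ogg first, since it is the preferred format; AAC in an MP4 container second.
SOURCE_TYPES = [(".ogg", "audio/ogg"), (".m4a", "audio/mp4")]


def audio_element(base_url):
    """Return an HTML audio element offering each encoding of one program."""
    sources = "\n".join(
        '  <source src="{}" type="{}">'.format(escape(base_url + ext), mime)
        for ext, mime in SOURCE_TYPES
    )
    return "<audio controls>\n{}\n</audio>".format(sources)


# Hypothetical location of an encoded program.
print(audio_element("media/hamlet"))
```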
I also noticed that an extra option appears to be necessary if you want to support “progressive download.”
By default the MP4 muxer writes the ‘moov’ atom after the audio stream (‘mdat’ atom) at the end of the file. This results in the user requiring to download the file completely before playback can occur. Relocating this moov atom to the beginning of the file can facilitate playback before the file is completely downloaded by the client. [4]
Hence -movflags +faststart.
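Whether an encoded file actually ended up with the moov atom in front can be checked by walking its top-level atoms. The sketch below relies only on the standard MP4 box layout (a 4-byte big-endian size followed by a 4-byte type); the function names are hypothetical, and it reads the whole file into memory for simplicity.

```python
import struct


def top_level_atoms(data):
    """Yield (type, offset) for each top-level atom in MP4 file bytes."""
    offset = 0
    while offset + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, offset)
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                return
            (size,) = struct.unpack_from(">Q", data, offset + 8)
        elif size == 0:
            # A size of 0 means the atom runs to the end of the file.
            size = len(data) - offset
        if size < 8:
            return  # Malformed; stop rather than loop forever.
        yield kind.decode("latin-1"), offset
        offset += size


def is_faststart(data):
    """True if the 'moov' atom comes before the 'mdat' atom."""
    order = [kind for kind, _ in top_level_atoms(data)]
    if "moov" not in order or "mdat" not in order:
        return False
    return order.index("moov") < order.index("mdat")
```

For a real file, something like `is_faststart(open(path, "rb").read())` would do, at the cost of reading the entire file.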
6 roadmap
6.1 normalize volume of recordings
Just based on the few recordings that I’m already using, there’s a wide range in volume level among them. I tried some ad-hoc adjustment of the levels at some point (through WinAmp), but I could automate the process with ffmpeg. (Of course, I’d have to listen and determine the target adjustments.)
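One way the automation might go, once target adjustments are known, is to compute a fixed gain per recording and hand it to ffmpeg’s `volume` audio filter. This is only a sketch: the file names and levels below are hypothetical, the functions are not part of the project, and it builds the command lines without running them.

```python
def gain_db(measured_db, target_db):
    """Gain, in dB, that moves a recording from its measured level to the target."""
    return target_db - measured_db


def volume_command(in_file, out_file, gain):
    """Build an ffmpeg argument list that applies a fixed gain."""
    return [
        "ffmpeg", "-i", in_file,
        "-filter:a", "volume={:.1f}dB".format(gain),
        out_file,
    ]


# Hypothetical mean levels in dB; real values would come from listening or measuring.
measured = {"quiet.ogg": -27.5, "loud.ogg": -14.0}
target = -18.0
for name, level in measured.items():
    print(" ".join(volume_command(name, "normalized-" + name, gain_db(level, target))))
```

Applying a gain means re-encoding, so the same caching concerns as for the AAC conversion would apply.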
Footnotes:
1. Opera slam! See the Browser compatibility table of “Supported media formats” at MDN.
2. Doug McMahon, “Ubuntu Multimedia for Trusty”.
3. “Guidelines for high quality lossy audio encoding”, FFmpeg wiki.
4. See “AAC § Progressive Download”, FFmpeg wiki.