text tracking

Text tracking is an extension to the audio player.

When playback starts, another channel opens up. A “channel” is a waterway, a continuous flow. The presence of this channel adds another dimension to the space. For recordings that correspond to a text being presented, the “playback head” is floating down this channel. Its location is independent of yours.

But the “physical” space does not need to remain ignorant of the audio channel. Text tracking refers to the way that the space reflects the events in the audio program.

1 the module

This module builds on the core audio module.

require(['audio', 'getflow'], (audio, getflow) => {
	console.log("text tracking: start init");
	<<independent functions>>
	<<one-time setup>>
});
<require module="text_tracking" />

2 cue tracker

All text tracking is based on the cue tracker. The cue tracker provides cue events for an HTML media player, based on a set of cues. These cue events don’t do anything in themselves. “Subscribers” to the tracker are notified of the events, and can do what they please about it.

Note that the cue tracker never modifies the playback state. It just “rides along” with any playback that’s going on.
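The tracker relies on a `PubSub` helper defined elsewhere in the program; only two methods are relied on here, `on` and `send`. A minimal sketch of the assumed interface (my own, not the project’s actual implementation):

```javascript
// Minimal publish/subscribe channel.  (A sketch of the assumed interface;
// the project's real PubSub helper lives in another module.)
function PubSub() {
	const subscribers = [];
	return {
		// Register a subscriber.  Returns a function that unsubscribes it.
		on(fn) {
			subscribers.push(fn);
			return () => {
				const index = subscribers.indexOf(fn);
				if (index >= 0) subscribers.splice(index, 1);
			};
		},
		// Notify every subscriber, passing along all arguments.
		send(...args) {
			subscribers.forEach(fn => fn(...args));
		}
	};
}
```

Anything with this shape will do; the cue tracker only ever calls `send` with a cue, its number, and (sometimes) a flag.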

// ASSUMES cues are in time order.
function CueTracker(media) {
	const cue_channel = PubSub();
	let cues = null;
	let pending_timer = null;

	<<cue tracker>>

	function start_with(new_cues) {
		cues = new_cues;
		start_tracking_from_playback_position();
	}

	// `canplay` is needed in case audio is still loading when tracking is
	// requested.
	on(media, 'playing,seeked,canplay', start_tracking_from_playback_position);

	return {
		on_cue: cue_channel.on,
		get_cues() { return cues; }, // cheating?
		start_with
	};
}

The cue tracker only works during playback. It rests when the audio is paused, and it cleans up any timers that hadn’t elapsed yet.

function clear_pending_cue() {
	window.clearTimeout(pending_timer);
	pending_timer = null;   // not really necessary
}
on(media, 'pause', clear_pending_cue);

The tracker works by setting up timers—one at a time. When the timeout occurs, that means the cue has “fired.” Subscribers to the tracker are notified of this—that’s the tracker’s main job. Also at that time, the tracker sets up the next timer.

function start_cue_countdown(cue_number) {
	const cue = cues[cue_number];

	if (cue) {
		const seconds_to_cue = cue.at - media.currentTime;

		if (seconds_to_cue > 0)
			pending_timer = window.setTimeout(
				() => {
					start_cue_countdown(cue_number + 1);
					cue_channel.send(cue, cue_number);
				}, seconds_to_cue * 1000);
	}
}

That function won’t do anything unless the cue is “in the future,” that is, later than the current playback time.

function start_tracking_from_playback_position() {
	// Sorry, can't track unless audio is playing, or without cues.  Better luck
	// next time.
	if (!cues || !is_playing(media)) return;

	const current_time = media.currentTime;

	clear_pending_cue();

	for (let next = 0; next < cues.length; next++) {
		if (cues[next].at > current_time) {
			start_cue_countdown(next);

			// Also fire the current cue now.  Resuming in the middle of a cue
			// is not exactly the same as *reaching* a cue, so the third
			// argument lets listeners respond accordingly.
			if (next > 0)
				cue_channel.send(cues[next - 1], next - 1, true);

			break;                          // Stop scanning the cues
		}
	}
}

Tracking only makes sense if the audio is playing. But how do you know if the audio is playing? The media has a paused property instead of a “playing” property (as one might expect). So it’s common to assume that “playing” simply means “not paused.”1 But if the audio hasn’t loaded yet, it is neither playing nor paused, in which case the currentTime is meaningless. The whole point of the cues is to trigger other actions based on a reported location, so it’s critical that tracking not begin until the audio is actually playing.

function is_playing(media) {
	return !media.paused && media.duration;
}

From my testing, media.duration is NaN when the audio is not loaded. To my mind, this approach makes more sense than checking for a nonzero currentTime, as some recommend.

There is one tracker per visit. It’s reused even as the program changes.

console.log("text tracker: creating cue tracker", audio.get_player());
const tracker = CueTracker(audio.get_player());
<<use tracker>>

Doesn’t this need to listen for a change in the program?

Tracking always begins when the user expresses an intent to listen. Keep in mind that the tracker itself doesn’t do anything but broadcast cue events. What’s done with those events depends entirely on the subscribers.

function start_tracking(program) {
	console.log('text tracking: start tracking program', program);
	audio.get_cues_for(program || audio.state.program)
		.then(({all}) => tracker.start_with(all));
}

// Passes program as first argument.
audio.intent_to_listen.on(start_tracking);
on(audio.get_player(), 'playing', () => start_tracking());

Also keep in mind that by the time you get here, the audio may already be playing.

3 marking the current place in the text

The typical use for “text tracks” is to put captions (or subtitles) on a video. Your attention is assumed to remain on the video, so that’s where the captions go, one or two lines at a time.

The case is somewhat inverted here. Rather than put transient captions in a fixed location, the text is persistent and spread out over space, while the sound flows by.

At each point, then, some line is “current” (also known as the “cued” line). Marking the current line in the text makes it easier to follow.

tracker.on_cue(({a}) => {

	// Remove any previous marks
	const current_cue = document.querySelector('.current.cue');
	if (current_cue) current_cue.classList.remove('current');

	// Mark the cue link
	const cue_link = audio.cue_link_for(a);
	if (cue_link) cue_link.classList.add('current');
});

Um, yeah, that’s complected into the audio player.

The current line can be signified any kind of way. Right now, the style for “current” is defined at the same time as the cue link’s “pointed at” state.

Of course, the current line won’t always be in view. Even if you’re looking at it now, it will move out of view as playback proceeds. To keep up with the program, you would have to “follow along” by scrolling the page, and moving into the next section.

4 following along

For some recordings, it would be impractical to follow along manually. Productions of plays, for example, may skip many lines and jump around the text. And in any case, navigating while listening means having to think about the device and the site, which would constantly work against the experience of the flow.

Fortunately, it’s easy to automate that navigation. “Following” (or “conducting”) means adjusting the view as needed to keep the current line visible. Automatic following allows people to effortlessly follow the text of a recording while listening. This frees the person’s attention to enjoy and be immersed in the material.

For plays in particular, this not only makes the experience more pleasant, but also reveals some of the decisions made by producers when putting together a performance.

This could be encapsulated better, right? Right now anything that wants to use following has to be inside this closure.

Following may be on or off during playback.

(() => {
	let following;
	const following_changed = PubSub();
	function set_following(enabled) {
		console.log("text tracking: set following to", enabled);
		following_changed.send(following = enabled);
	}

	<<when to stop following>>
	<<when to start following>>
	<<set up following>>
}());

There’s a separate question of when to start following and when to stop following, which will be dealt with shortly. They have separate placeholders because the “start” handler needs to come after the “stop” handler.

First, what does following actually do when it is enabled? It listens for cues during playback and does whatever’s necessary to ensure that the playback location is in view. That can mean two things:

  • scrolling
  • going to another address (within the site)

Cue events will contain either an anchor or a section. One listener handles both cases.

tracker.on_cue(({a, section}) => {

	if (following) {

		// Go to the next section if necessary.
		const {program} = audio.state;
		if (section && program)
			audio.await_catalog.then(catalog => {
				const record = catalog[program];
				if (record)
					getflow.go(record.path + '/' + section);
			});

		// This isn't the line itself, but should be in the same place vertically
		const cue_link = audio.cue_link_for(a);
		if (cue_link)
			scroll_into_view(cue_link, 200);
	}
});

So when you’re following along, reaching the next section in the program means you’ll be taken to that section, at which point the address changes. (Note that the listen query is not carried to the new location.)

Most cues, of course, are just cues for the next line. To follow along, it’s only necessary that the line be kept in view. Assuming that you’ll still want to see the lines before and after the current one, somewhere in the middle of the screen will be preferable. To give some preference to the upcoming text (over the text that’s already past), this function keeps the line within the top third of the screen, never more than 10% from the very top (unless you’re at the end of the section and can’t go any further).

function scroll_into_view(element, duration = 200) {

	if (element) {
		// ASSUMES the document is the scroll layer
		const height = document.documentElement.clientHeight,
			  min = height / 10,
			  max = height / 3;

		getflow.animate(duration, completeness => {
			// Target the nearest place within the range.
			const place = element.getBoundingClientRect().top,
				  target = Math.min(Math.max(place, min), max);

			// Stop if you're there
			if (target == place)
				return false;

			const change = completeness * (place - target);
			window.scroll(0, window.scrollY + change);

			return true;
		});
	}
}

So if the current line is not in the top third of the display, the view is scrolled up or down as little as needed for the line to enter that range.

Also note that this does not change the current address.

4.1 when to start following

Very simply, using any listen link will cause following to start, whether you go directly to the address or follow a link within the site.

This is another case that would be better served if a deserialized query were also provided to this event.

function entered({new_place}) {
	if ((/[?&]listen=[^\/]+/).test(new_place.search || ''))
		set_following(true);
}
navigating.on(entered);
entered({new_place: window.location});

It doesn’t matter what program was specified, only that this was some kind of listen link. This has to come after the stop handler because it sets following on the initial page load, and the scrolling handler isn’t activated until the “following” setting changes.
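If the event did carry a parsed query, the regular expression could give way to the standard URLSearchParams API. A sketch of that alternative (the helper name `has_listen_query` is mine; the module currently uses the regex directly):

```javascript
// Does this search string amount to a listen link?  Same cases as the
// regex /[?&]listen=[^\/]+/: there must be a listen parameter with a
// value, and the value must not begin with a slash.
// (Hypothetical helper; a sketch only.)
function has_listen_query(search) {
	const value = new URLSearchParams(search || '').get('listen');
	return value != null && value !== '' && !value.startsWith('/');
}
```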

4.2 when to stop following

While listening to one part of a play, you may explore others. In such cases, you must retain control of the viewport. If automatic following were to continue while you’re trying to look at something else, it would keep dragging you back to the playback location as cues were reached, which would be really annoying. There has to be a way to stop following.

This doesn’t interrupt any existing scroll animation.

function stop_following() {
	set_following(false);
}

In short, any movement during playback is interpreted as an intent to turn following off. This includes

  • scrolling
  • going to another address (within the site)

Wait a minute, those are exactly the two things that “following” does for you! So in both cases, there’s some question about how you distinguish between the user navigating and the program navigating. Those will be dealt with separately.

Assuming there’s some way to tell when the user has scrolled, it’s just a matter of listening for that whenever following is enabled.

following_changed.on(enabled => {
	if (enabled)
		next_time_user_scrolls(stop_following);
});

For navigating around here, getflow.go is the way to get from one place to another, whether it’s initiated by the program itself or the user. It broadcasts a special event to the window whenever it’s going somewhere, which includes whatever source was passed.

on(window, 'getflow-going', ({detail: source}) => {
	if (source == 'link' || source == 'popstate')
		stop_following();
});

popstate is the source when the back button is used, or when history.back() is called. Assuming that the program never calls history.back(), I interpret popstate as user-initiated.

4.3 special case: crossing section

This is a note from my old implementation, and I hadn’t implemented this when I left off.

Even if “tracking” has been turned off because of scrolling, the section will still advance automatically if, when the section transition is reached, the last cued line is still in view. (This is provisional and subject to trial. Also note that it can mean, in rare cases, traversing backwards. Finally, note that it does not apply if the person has changed the URL (i.e. touched a link) since initiating the audio)
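The “still in view” test could be sketched as a viewport check on the last cued line’s element. This is my own sketch of the unimplemented note above, assuming (as `scroll_into_view` does) that the document is the scroll layer:

```javascript
// Is any part of this element within the viewport?  (Sketch for the
// proposed section-crossing rule; not part of the current implementation.)
function still_in_view(element) {
	if (!element) return false;
	const {top, bottom} = element.getBoundingClientRect();
	const height = document.documentElement.clientHeight;
	return bottom > 0 && top < height;
}
```

The section transition handler could then consult something like `still_in_view(audio.cue_link_for(last_cue.a))` before deciding whether to advance.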

5 telling where the audio is

Can this feature really be separated from “following”? At the very least, it builds on it, since you don’t want to show the extra link when that line is already in view.

This is about signifying (and affording a return to) wherever the playback is right now.

  • what program, if you’re in a different scope
  • what section, if you’re in a different section
  • what line, if it’s scrolled out of view

And in any case, who’s speaking.

In all cases, the signifier affords return to the current place, re-enabling following.

All of the above would be reasonable to show during playback, as space affords, since one might wonder: what am I listening to?

That said, the presence or absence of this thing can serve to signify whether following is enabled.

This is entirely about adding (ensuring) and updating a link in the console, so that it always points to the playback location.

function get_return_to_playback_link() {
	let link = document.getElementById('playback-location-link');
	if (!link) {
		// MOVE THIS to the main audio module.
		const _console = document.querySelector('#the-player-console');
		if (_console) {
			link = document.createElement('a');
			link.id = 'playback-location-link';
			link.className = 'playback-location-link';
			_console.appendChild(link);
		}
	}

	return link;
}

This is a “work in progress.” I don’t want to have to do this on every cue, even though await_catalog is memoized. Also, this is largely copied from the “go to current section” handler. I’m not opposed to saving the current section in state, though note that you’d still have to scan backwards in cases where you never cross the section cue.

This link should always point to the current playback location.

tracker.on_cue((cue, cue_number) => {
	const cues = tracker.get_cues();
	if (cues) {
		// Figure out what section you're in.
		for (let n = cue_number; n >= 0; n--) {
			const {section} = cues[n];
			if (section) {
				audio.await_catalog.then(catalog => {
					const {program} = audio.state;
					const record = catalog[program];
					//console.log("record is", record);
					if (record) {
						const link = get_return_to_playback_link();
						if (link) {
							// PROXY for actual text
							link.innerHTML = (cue.a || '').replace(/_/g, ' ');
							link.setAttribute('href', record.path + '/' + section + (cue.a? '#' + cue.a : ''));
						}
					}
				});

				break;
			}
		}
	}
});

Why not just init this here, anyway, and leave it in state?

Using the link always enables “following” (but probably not by including a listen query, since that will make the player think it should seek to that location, when in fact it should just let playback continue). The intent to listen is already present. That said, it should include a listen query if playback is not happening already.

const return_link = get_return_to_playback_link();
on(return_link, 'click', () => set_following(true));
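That refinement (include a listen query only when playback isn’t already happening) could be factored as a pure helper. A sketch under my own naming; the `playing` flag would come from the `is_playing` check above:

```javascript
// Build the return link's href.  During playback the bare path suffices
// (the player should just let playback continue); otherwise a listen query
// is appended, so that using the link starts playback at that location.
// (Hypothetical helper; the module does not currently do this.)
function return_link_href(base_href, program, playing) {
	if (playing) return base_href;
	const separator = base_href.includes('?') ? '&' : '?';
	return base_href + separator + 'listen=' + encodeURIComponent(program);
}
```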

The player should know when following is enabled, so that its contents can adjust themselves accordingly.

following_changed.on(enabled => {
	// MOVE THIS getter to the audio module proper
	document.getElementById('the-player').classList.toggle('following', enabled);
});

The link is a little speech bubble that sits over the player state control.

@import thumb-metrics

.playback-location-link

	// ASSUMES these metrics for the player controls, so that it will sit in the
	// middle on top.
	position absolute
	right ($thumbRems / 2)
	bottom $thumbRems

	// It needs to be big enough to touch, even if the text is short.
	padding .5em 1em

	// It uses the same typeface as the text
	@import fonts
	book-font()

	// It's a speech bubble
	background rgba(white, .6)
	background #FFE
	border-radius 1em 1em 0 1em
	box-sizing border-box
	white-space nowrap
	// floating
	box-shadow 1px 1px 2px #222

	@import pointing
	+user_pointing_at()
		&:before
			@import colors
			position absolute
			top 0
			right 100%
			body-font()
			border-radius 2px
			padding .5em 1em
			background $resourcesColor
			color white
			content 'Return to '

	// The link is completely invisible until some playback has occurred.
	// SHOULDN'T this be done on the console itself, anyway?
	.player-console:not([data-player-state]) &
	.player-console[data-player-state="waiting"] &
		display none

	// Even then, the link is only shown when you're *not* following.
	transition transform .15s
	transform-origin bottom right
	transform scale(1)
	.following &
		transform scale(0)

	// The text that's shown is only part of the line
	&:after
		content '...'

6 integration with other features

None of these features care whether tracking is on or off. But some of them will provide affordances that result in the re-enabling of following. I’d prefer to remain agnostic of “following” here, so I’m looking for a way to denote that intent by some general means.

6.1 scene timelines

Show where playback is in the scene timeline:

  • the current speech
  • the current position (how would this be different from the above?)
  • the current speaker. Maybe affecting the whole lane, so that you get a sense of the back-and-forth of the dialogue.

And allow you to return to it (re-enabling following).

6.2 play map

Signify in the play map where playback is currently occurring.

And allow you to return to it (re-enabling following).

7 incidentals

All right, I should make a module for common stuff like this. Yeah, I’ll end up reinventing jQuery.

function on(element, events, handler) {
	events.split(',').forEach(
		event => element.addEventListener(event, handler));
}
function off(element, events, handler) {
	events.split(',').forEach(
		event => element.removeEventListener(event, handler));
}

7.1 user scroll utility

Register a one-time handler that distinguishes person-initiated scrolling from program-initiated. I expect it will need some refinement for touch screens. This is currently only used in one place, by the text tracking system, but is factored out because it may have more general use. http://stackoverflow.com/a/2836104/4525

const POSSIBLE_SCROLL_EVENTS = 'scroll,mousedown,wheel,DOMMouseScroll,mousewheel,keyup';

function next_time_user_scrolls(do_action) {

	function handler(event) {
		if (event && (
			event.which > 0
				|| event.type == "mousedown"
				|| event.type == "mousewheel")){

			//console.log("USER SCROLL", event.type);
			off(window, POSSIBLE_SCROLL_EVENTS, handler);
			do_action();
		}
	};

	on(window, POSSIBLE_SCROLL_EVENTS, handler);
}

8 proposals

8.1 in scene timeline, dim speeches without cued lines during playback

The proposal should specify why this is useful and how it integrates with existing signifiers in the scene timeline.

This used to be done in the text itself, too, and right now I’m on the fence about it. As it is, I think there’s a risk of too much emphasis on the recorded lines. Besides, the recordings don’t always speak the entire line, anyway, and this gets especially ragged for prose.
