getflow

You are now reading about “getflow,” which—with all due respect—is a piece of software.

Actually, getflow is two pieces of software that somehow expect you to write a third piece of software. Wait, come back!

1 `GET` and flow : a love story

In 1945, Vannevar Bush dreamed of a world-wide document sharing system so that scientists could accelerate the pace of scientific progress—even in peacetime. He called it the “memex.” In 1993, a bunch of amateurs at CERN created the memex, except they called it the “World Wide Web” and instead of basing it on dry photography, they based it on the Internet. Why the geniuses who created the Internet didn’t also create the memex (so amateurs wouldn’t have to), I still don’t understand.

Anyway, the World Wide Web is still around, 100 years later. Wars have been fought over it. Families made, families broken. People have gone into “the web” and never returned. It’s another world.

A bit of trivia, but the World Wide Web is actually still a functioning document viewer. Yeah, sure, grandpa, and people used to actually drive cars! But it’s true. You can enter an “address” and retrieve a “hypertext” document containing special things called hyperlinks. A hyperlink is like a plane ticket. Except that your destination is usually another document. For several years, that’s all the web was! Oh, and I almost forgot, you could also fill out forms! No wonder the thing took off, right?!

1.1 traveling as a first-class notion

It’s always been possible to “drop in” on a web site. Just go to the URL.

But there has never been a first-class notion of change. All you can do is define new states. Every transition is like blindfolding you and putting you on an airplane, even if you’re just going next door.

Well, AJAX happened, and unleashed a flood of incalculable pain on the web programming world. Sure, it was worth it.

But imagine for a moment what it would mean to define a web site in terms of changes.

Yet we still don’t have a first-class notion of change (that is, change between two states), and we never will. Everyone will keep doing it their own way. Well, getflow is me doing it my own way.

In getflow, there are two first-class things you can do in a domain. One is dropping in. That’s already first-class. The other one is traveling.


[Figure: getting from one place to another]

2 blueprints (the language)

Here’s the deal: you define a web site using a highly constrained, 100% declarative, rule-based language, which is designed so that state transitions are computable, and getflow will do the rest.

That concept can be specified a number of ways, and those specifications could be implemented a number of ways. Here is where we implement getflow’s domain-specific language (DSL) inside the client (browser).

I’ll call that language the “blueprints,” which are the master plans for the site. It uses three main concepts:

routes
context
change rules

It is implemented in JavaScript for the client, and Python on the server. They do mostly the same thing.

the language

Outline of the DSL

<<read change rules>>
<<read route>>
<<read blueprints>>

Note that these are in reverse order of dependency; that is, each one is called by the following.

2.1 blueprints

For nothing more than my own ease of authoring (which is admittedly subjective), the blueprints (routes and change rules) are expected in an XML format. This is based on the fact that the majority of the content will be HTML templates, which are practically isomorphic with XML.

Here we transform the getflow XML format into something we can use to actually implement the rules.


function read_blueprints(element) {
    return make_array(children_of(element))
	.filter(node => node.nodeName === 'route')
	.map(read_route);
}

And here is the Python version. I still call the blueprints the “plans” here.


from lxml import etree
from ..getflow.changes import ClassChange, TemplateChange

<<server read change rules>>

def read_route(node):
    return {
	'route': node.get('path'),
	'change_rules': list(read_change_rules(node)) }

def read_plans(file):
    return list(map(read_route, etree.parse(file).xpath('route')))
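To make the expected input concrete, here is a hypothetical blueprints document and a minimal Python sketch that pairs each selector text with the verb element that follows it, as the JavaScript reader does. The element and attribute names (route, path, class, html, eval) are inferred from the readers in this document; the root name `routes`, the selectors, and the templates are made up. It uses the standard library's `xml.etree.ElementTree` rather than lxml, and it returns bare (selector, verb) pairs instead of change objects.

```python
import xml.etree.ElementTree as ET

# Hypothetical blueprints document.  The shape (route elements with a path
# attribute, containing "selector text, then verb element" pairs) follows the
# readers above; the root name, selectors, and templates are made up.
BLUEPRINTS = """
<routes>
  <route path="/">
    body <class add="home"/>
  </route>
  <route path="/songs/{song}">
    #main <html><h1><eval>song</eval></h1></html>
    body <class add="song-page"/>
  </route>
</routes>
"""


def read_rules(route):
    """Pair each non-blank selector text with the element that follows it."""
    # ElementTree keeps the text before the first child in route.text and the
    # text after each child in that child's .tail.
    texts = [route.text] + [child.tail for child in route]
    rules = []
    for text, verb_element in zip(texts, route):
        selector = (text or '').strip()
        if selector:
            rules.append((selector, verb_element.tag))
    return rules


def read_blueprints(xml_text):
    root = ET.fromstring(xml_text)
    return [(route.get('path'), read_rules(route))
            for route in root.findall('route')]


for path, rules in read_blueprints(BLUEPRINTS):
    print(path, rules)
# / [('body', 'class')]
# /songs/{song} [('#main', 'html'), ('body', 'class')]
```

Pairing each element with the text immediately before it, rather than zipping all non-blank texts with all elements, keeps a rule aligned even when some element has no selector in front of it.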

2.2 get the blueprints

One of getflow’s design goals is to support an immediate response to the person. One thing that makes such responsiveness difficult is network latency. If you have to contact the (remote) server every time you want to visit a new location, then you can never hope for immediate responses as a rule. Even in the best conditions, there is a limit on how fast you can make a round trip to the server, regardless of how quickly the server responds. Yet the only way around this would be for the browser to already know how to get to every other place on the site, which for a site of any size would clearly be impossible.
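The floor on a round trip is set by distance, not by the server. A back-of-the-envelope sketch, under two illustrative assumptions (not measurements): signals in optical fiber travel at roughly 200,000 km/s, about two-thirds the speed of light, and the client is about 5,600 km from the server, roughly New York to London.

```python
# Illustrative assumptions, not measurements: signals in optical fiber travel
# at roughly 200,000 km/s, and the client is about 5,600 km from the server.
FIBER_SPEED_KM_PER_S = 200_000
DISTANCE_KM = 5_600


def min_round_trip_ms(distance_km, speed_km_per_s=FIBER_SPEED_KM_PER_S):
    """Lower bound on a round trip: there and back, with zero processing time."""
    return 2 * distance_km * 1000 / speed_km_per_s


print(min_round_trip_ms(DISTANCE_KM))  # 56.0
```

Real round trips add routing, queuing, and server time on top of that floor, so even an infinitely fast server cannot make every navigation feel immediate.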

The approach taken by getflow is to consolidate the essential structure of the site into a (potentially) very small set of rules, all of which can be downloaded up-front and in short order. Then, even if some of the details still have to be retrieved from the server on-demand, there will be at least something we can do while we’re waiting; that is, the broad outlines of the movement will be available, and we may proceed immediately with those.

This, then, is the part where we get these “blueprints,” which at present are in the form of an XML document containing routes and the accompanying change rules.

read the blueprints

Get and read the blueprints

function read_the_blueprints(doc) {
    return state.routes = read_blueprints(doc.documentElement);
}

// Let's start reading the blueprints immediately.
const get_the_blueprints = 
	GET_XML("/getflow.xml")
	.then(read_the_blueprints, debug_message);

We also save the “blueprints promise” (get_the_blueprints), which we can reference anywhere we want to guarantee that the blueprints have arrived.

2.2.1 TODO do we need to wait for document load

Before any of this (particularly the event hook)? Or can we just assume that the script is properly placed?

2.3 routes (s/b path patterns)

A route consists of a path pattern and a set of accompanying change rules.


Read a route

<<path pattern>>

function read_route(node) {
    return {
	change_rules: read_change_rules(node),
	matcher: get_path_pattern_matcher(node.getAttribute("path"))
    };
}

2.4 path hacking (parent paths) (an implementation detail)

Whereas in principle the web does not impose any relationship between the content at a parent and a child path, getflow’s entire model is based on the assumption that sites are defined by those relationships. As such, we require a formal definition of what a “parent” path is. In short, it is the path with its query and last segment removed. This is sometimes called “hacking” the path, because you “hack off” the end of it.


Path hacking

function hack_path(path) {
    return '/' + path
	.split('?')[0]
	.split('/').filter(x => !!x)
	.slice(0, -1).join('/');
}

And here is the Python version.


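def hack_path(path):
    # Drop the query and any trailing slash, then keep only the segments
    # between the leading '/' and the last segment.
    return '/' + '/'.join(path.split('?')[0].rstrip('/').split('/')[1:-1])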

And its tests.

getflow/server/test-paths.py

from getflow.paths import hack_path
from test_helpers import check

def do_test(case):
    path, expected = case
    check(case, expected, hack_path(path))

[do_test(case) for case in (
    ('/', '/'), 
    ('/abc', '/'), 
    ('/abc/', '/'), 
    ('/abc/def', '/abc'), 
    ('/search?q=123', '/'), 
    ('/abc/def/ghi', '/abc/def'), 
    ('/abc/def/ghi?q=123', '/abc/def'), 
    ('/abc/def/ghi?q=some/url', '/abc/def') )]

/abc	/
abc	/
/abc/def	/abc
/search?q=123	/
/abc/def/ghi	/abc/def
/abc/def/ghi?q=123	/abc/def
/abc/def/ghi?q=some/url	/abc/def


<<hack path>>

return table.map(function(row) {
    var path = row[0],
	expected = row[1],
	got = hack_path(path),
	pass = got == expected;
    return ['~' + path + '~', '\\rightarrow', '~' + expected + '~',
	    pass ? '\\checkmark' : '\\xmark' + ' got ' + got];
});

`/`	→	`/`	&check;
`/abc`	→	`/`	&check;
`/abc/`	→	`/`	&check;
`/abc/def`	→	`/abc`	&check;
`/search?q=123`	→	`/`	&check;
`/abc/def/ghi`	→	`/abc/def`	&check;
`/abc/def/ghi?q=123`	→	`/abc/def`	&check;
`/abc/def/ghi?q=some/url`	→	`/abc/def`	&check;

Note that slashes are valid in a URL query.¹

2.5 placeholders

Sometimes we want to leave a blank that will be filled in later. We’ve seen this with path patterns, and we’ll also use it in several other ways, including selectors and throughout templates.¹ Generally, we’ll use curly braces {} to demarcate placeholders. It should work like this:

the placeholders in	`no placeholders`	are	(none)	&check;
the placeholders in	`one {special} place`	are	`special`	&check;
the placeholders in	`{two} special {places}`	are	`two` & `places`	&check;

So, how do we do it?

This pattern will match anything contained in curly braces (including the braces), and also make a separate capture of the contents.


var placeholder_pattern = /\{(.+?)\}/g;

function fill_placeholders(text, fn) {
    return text.replace(placeholder_pattern, fn);
}

function placeholders_in(text) {
    var placeholders = [];
    fill_placeholders(text, function(_, placeholder) {
	placeholders.push(placeholder);
    });
    return placeholders;
}
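The server code in this document inlines the same braces pattern into its Context class rather than exposing these helpers. For comparison, here is a hypothetical Python equivalent of placeholders_in and fill_placeholders. One difference from the JavaScript: the callback receives only the placeholder text, not the whole match.

```python
import re

# Same as the JavaScript placeholder_pattern: anything between curly braces,
# matched non-greedily, with the contents captured.
PLACEHOLDER_PATTERN = re.compile(r'\{(.+?)\}')


def fill_placeholders(text, fn):
    """Replace each {placeholder} with fn(placeholder)."""
    return PLACEHOLDER_PATTERN.sub(lambda match: fn(match.group(1)), text)


def placeholders_in(text):
    """The placeholder names in text, in order of appearance."""
    return PLACEHOLDER_PATTERN.findall(text)


print(placeholders_in('no placeholders'))         # []
print(placeholders_in('one {special} place'))     # ['special']
print(placeholders_in('{two} special {places}'))  # ['two', 'places']
print(fill_placeholders('a{x}b', str.upper))      # aXb
```

The non-greedy `.+?` is what keeps `{two} special {places}` from being read as a single placeholder spanning both pairs of braces.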

2.6 path patterns : looking up addresses

We can describe a huge—nay, infinite—number of locations in just a few lines. How can we do this? Using patterns.

These are often called “routes.” We’re going to reserve the term “route” to refer to the list of steps between two locations.

getflow/client-tests/path patterns.js

    <<test helpers>>

return table.map(function(row) {
    var pattern = row[0];
    var description = row[1];
    return [format_cell(pattern), description || ''];
});

Table 1: Sample path patterns
`/`	The root path
`/special`
`/songs`
`/songs/{song}`	song is a placeholder
`/songs/{song}/listen`	song is still a placeholder
`/life/{kingdom}`	One placeholder
`/life/{kingdom}/{phylum}`	Two placeholders
`/life/{kingdom}/{phylum}/{class}`	Three placeholders
`/chapter-{chapter}`	Embedded placeholder

When someone asks to go to a certain address, we’ll see if it matches any of these patterns. If it does, that means there’s (maybe) something there, and we’ll see about going to that place. For now, we’re just interested in how the matching works.

Those names inside the braces are placeholders. The “real” addresses may fill in those blanks with anything (any single thing, i.e., no slashes). If a path fits the pattern, we’ll remember how names in braces were replaced. In practice, it looks like this:


    <<test helpers>>
    <<path pattern>>

function format_context(context) {
    return !context ? format_cell(context)
	: Object.keys(context).map(function(key) {
	    return format_cell(key) + ' = ' + format_cell(context[key]);
	}).join(', ');
}

var directory = test_patterns.map(get_path_pattern_matcher);

function look_up_address(path) {
    for (var i = 0; i < directory.length; i++) {
	var matcher = directory[i];
	var match = matcher(path);
	if (match) {
	    return match;
	}
    }

    return null;
}

return [
    ["/dog", null],
    ["/songs", "/songs"],
    ["/songs/Help", "/songs/{song}", {song: "Help"}],
    ["/songs/Freebird", "/songs/{song}", {song: "Freebird"}],
    ["/songs/Freebird/listen", "/songs/{song}/listen", {song: "Freebird"}],
    ["/life/Animalia", "/life/{kingdom}", {kingdom: "Animalia"}],
    ["/life/Animalia/Chordata", "/life/{kingdom}/{phylum}", {kingdom: "Animalia", phylum: "Chordata"}],
    ["/chapter", null],
    ["/chapter-one", "/chapter-{chapter}", {chapter: "one"}]
    // nice
    //["/life/Animalia/Chordata/Mammalia", "/life/{kingdom}/{phylum}/{class}", {kingdom: "Animalia", phylum: "Chordata", "class": "Mammalia"}],
].reduce(function(acc, test_case) {
    var path = test_case[0];
    var expected_pattern = test_case[1];
    var expected_context = test_case[2];
    var expected = !expected_pattern ? null : {
	pattern: expected_pattern,
	context: expected_context || {}
    };
    var got = look_up_address(path);
    var pass = deep_equals(expected, got);
    var out_rows = [[
	format_cell(path),
	'matches',
	!expected ? '/(nothing)/' : format_cell(expected_pattern),
	'',
	pass ? '\\checkmark' : '\\xmark got ' +
	    (got ? format_cell(got.pattern) + ' ' + format_context(got.context) : '=null=')
    ]];
    if (expected_context) {
	var context_cells = Object.keys(expected_context).map(function(key, i) {
	    return (i == 0 ? '/with/ ' : '/and/ ') +
		format_cell(key) + ' = ' + 
		format_cell(expected_context[key]);
	});
	out_rows[0][3] = context_cells[0];
	out_rows = out_rows.concat(context_cells.slice(1).map(function(cell) {
	    return [' ', ' ', ' ', cell];
	}));
    }
    return acc.concat(out_rows);

}, []);

Of course, if the path doesn’t have any placeholders, then it only matches that one path exactly, and thus only represents a single location. This is useful to create a “branching out” point for a set of places.

2.6.1 path pattern implementation

How do we do all that? We need a way to turn a pattern into a function. The deal is, “Hey, function, I’ll give you a path, and you tell me if it matches the pattern, plus, include the little dictionary of placeholders.” How exactly it does that, is its own business.


Turning a path pattern into a function

function get_path_pattern_matcher(pattern) {

    <<path pattern magic setup>>

    return path => {
	<<maybe return a match>>
	return null;            // no match
    };
}

Our strategy is to use a regular expression to do the matching. We only have to do this once (when the pattern is defined) and we can keep reusing the matcher.


var rex = RegExp(
    '^'
    + pattern.split('/').map(step_to_RegExp).join('/')
    + '/?$');

Everything that is not a placeholder should match itself, so it has to be “quoted” in case it contains any characters that have a special meaning in regular expressions (such as . or ?). Also, we have to collect the names of any placeholders (for use during the matching), while ensuring that those parts of the string act as wildcards.
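To see why the quoting matters, consider a hypothetical literal path such as /v1.0/songs. Unquoted, the . is a regular-expression wildcard and matches any character. This Python check uses the standard library's `re.escape`, which plays the role of the `quote_regexp` helper in the JavaScript (whose definition isn't shown in this section):

```python
import re

literal = '/v1.0/songs'  # hypothetical path with a '.' in it

unquoted = re.compile('^' + literal + '/?$')
quoted = re.compile('^' + re.escape(literal) + '/?$')

print(bool(unquoted.match('/v1x0/songs')))  # True: '.' matched the 'x'
print(bool(quoted.match('/v1x0/songs')))    # False: '.' now matches only itself
print(bool(quoted.match('/v1.0/songs')))    # True
```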


var placeholder_names = [];

function step_to_RegExp(step) {
    return step
	.replace(/([^\{]*)(?:\{(.+?)\})?/g, (_, plain, placeholder) => {
	    // Keep a list of the placeholders
	    if (placeholder)
		placeholder_names.push(placeholder);

	    return quote_regexp(plain) + (placeholder?  '([^/]+)' : '');
	});
}

That replace pattern is a bit ugly. The upside is that it allows us to use wildcards within path segments (like /chapter-{chapter}), which is useful, for instance, when you don’t want a dedicated hub for a set of paths.

The actual matching function uses the regular expression to test for a match and also to collect the “groups”, which hold the values matched by placeholders.


const match = rex.exec(path);
if (match) {
    // Create the dictionary of placeholder matches
    const context = {};
    for_each(placeholder_names, (name, i) => {
	context[name] = match[i + 1];
    });

    return { pattern, context };
}

I can’t say the Python version is much better. It’s also based on regex, of course.

getflow/server/getflow/routes.py

import re

piece_pattern = re.compile(
    r'''
    ([^{]*)                     # The literal part
    (?:{(.+?)})?                # Maybe a placeholder
    ''', re.X)

The main function, make_path_pattern_matcher, is just like the JavaScript version. It returns a function that performs path matching based on the given pattern. That function tests a given path: if it matches, it returns a tuple containing the pattern and a map of how any placeholders were filled; otherwise, it returns None.


def read_piece(match):
    plain, placeholder = match.groups()
    return re.escape(plain) + ('(?P<' + placeholder + '>[^/]+)' if placeholder else '')

def make_path_pattern_matcher(pattern):
    rex = re.compile(
        '^'
        + '/'.join(map(lambda step: piece_pattern.sub(read_piece, step),
                       pattern.split('/')))
        + '/?$')

    def matcher(path):
        match = rex.search(path)
        return (pattern, match.groupdict()) if match else None

    return matcher

What’s really wanted is a way to actually use the patterns. Given a path, you’d like to know the first pattern from the list that matches. Yes, in the routes, order matters.

getflow/server/getflow/routes.py

def first(predicate, iterable):
    return next((x for x in iterable if predicate(x)), None)

def look_up_address(fns, path):
    return first(bool, (f(path) for f in fns))
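To illustrate why order matters, here is a self-contained sketch with a simplified stand-in for make_path_pattern_matcher (it splits on placeholders with `re.split` instead of using piece_pattern) and a hypothetical /songs/top-ten page. Whichever pattern comes first wins, so specific patterns should be listed before general ones:

```python
import re


def make_matcher(pattern):
    """Simplified stand-in for make_path_pattern_matcher."""
    # With a capturing group, re.split alternates literal text and names.
    parts = re.split(r'\{(.+?)\}', pattern)
    regex = ''.join(
        re.escape(part) if i % 2 == 0 else '(?P<%s>[^/]+)' % part
        for i, part in enumerate(parts))
    rex = re.compile('^' + regex + '/?$')

    def matcher(path):
        match = rex.match(path)
        return (pattern, match.groupdict()) if match else None

    return matcher


def look_up_address(matchers, path):
    """The first non-None match, as in the routes.py version."""
    return next((m for m in (f(path) for f in matchers) if m), None)


general_first = [make_matcher(p) for p in ('/songs/{song}', '/songs/top-ten')]
specific_first = [make_matcher(p) for p in ('/songs/top-ten', '/songs/{song}')]

print(look_up_address(general_first, '/songs/top-ten'))
# ('/songs/{song}', {'song': 'top-ten'})
print(look_up_address(specific_first, '/songs/top-ten'))
# ('/songs/top-ten', {})
```

With the general pattern first, the dedicated page is unreachable: /songs/{song} swallows it as just another song.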

Its tests, like most, could probably be shared with the JavaScript versions.

getflow/server/test-routes.py

from getflow.routes import make_path_pattern_matcher, look_up_address
from test_helpers import check

# These tests are adapted from the javascript versions in `client.org'.

# The pattern descriptions are not used by these tests.
pattern_tests =  [
    ("/", "The root path"),
    ("/special", None),
    ("/songs", None),
    ("/songs/{song}", "/song/ is a placeholder"),
    ("/songs/{song}/listen", "/song/ is still a placeholder"),
    ("/life/{kingdom}", "One placeholder"),
    ("/life/{kingdom}/{phylum}", "Two placeholders"),
    ("/life/{kingdom}/{phylum}/{class}", "Three placeholders"),
    ("/chapter-{chapter}", "Embedded placeholder")]

path_matching_tests = [
    ("/dog", None, None),
    ("/songs", "/songs", {}),
    ("/songs/Help", "/songs/{song}", {"song": "Help"}),
    ("/songs/Help/oops", None, {}),
    ("/songs/Freebird", "/songs/{song}", {"song": "Freebird"}),
    ("/songs/Freebird/listen", "/songs/{song}/listen", {"song": "Freebird"}),
    ("/life/Animalia", "/life/{kingdom}", {"kingdom": "Animalia"}),
    ("/life/Animalia/Chordata", "/life/{kingdom}/{phylum}", {"kingdom": "Animalia", "phylum": "Chordata"}),
    ("/chapter", None, None),
    ("/chapter-one", "/chapter-{chapter}", {"chapter": "one"}),
    ("/life/Animalia/Chordata/Mammalia", "/life/{kingdom}/{phylum}/{class}", {"kingdom": "Animalia", "phylum": "Chordata", "class": "Mammalia"})]

def run_test(directory, test):
    path, pattern, context = test
    expected = (pattern, context) if pattern else None
    check(test, expected, look_up_address(directory, path))

print('using test patterns:')
test_patterns = [t[0] for t in pattern_tests]
print(*test_patterns, sep='\n')

print('\ntesting paths:')
test_routes = list(map(make_path_pattern_matcher, test_patterns))

for test in path_matching_tests:
    run_test(test_routes, test)

2.7 change rules

A “change rule” is basically a record with three fields:

selector: the CSS selector saying what element(s) to target
verb: (or “method”), one of the supported DOM operations to apply to the target
argument: either a list of class names (for class changes) or an HTML template

Currently, getflow defines two kinds of changes: class changes, and template changes. Class changes are much simpler.
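Both kinds of rule begin by collecting the unique placeholder expressions (“inputs”) they use, which is the code marked DUPLICATED in the JavaScript below. As a hedged sketch of how that collection could be shared, here is a hypothetical Python record with the three fields plus the derived inputs. It scans only the selector and argument text, which is what class-change rules need; template rules also scan eval elements and attribute values.

```python
import re
from dataclasses import dataclass, field

# Same braces pattern as placeholder_pattern
PLACEHOLDER = re.compile(r'\{(.+?)\}')


def unique_inputs(*texts):
    """Placeholder expressions used in texts, unique, in first-seen order."""
    seen = {}
    for text in texts:
        for expression in PLACEHOLDER.findall(text or ''):
            seen.setdefault(expression, None)
    return list(seen)


@dataclass
class ChangeRule:
    selector: str  # CSS selector, may contain {placeholders}
    verb: str      # e.g. 'add_class' or 'append'
    argument: str  # class names or an HTML template
    inputs: list = field(init=False)

    def __post_init__(self):
        self.inputs = unique_inputs(self.selector, self.argument)


rule = ChangeRule('#song-{song}', 'add_class', 'playing {genre} {song}')
print(rule.inputs)  # ['song', 'genre']
```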


A “class” for class change rules

function ClassChangeRule(selector, verb, argument) {

    // DUPLICATED!!! in TemplateChangeRule

    // Collect the (unique) inputs that we use.
    var _inputs = {};
    function register_input(input) {
	_inputs[input] = 0;
    }

    // From selectors
    for_each(placeholders_in(selector), register_input);

    // From argument
    for_each(placeholders_in(argument), register_input);

    return {
	selector: selector,
	verb: verb,
	argument: argument,
	inputs: Object.keys(_inputs)
    };
}

You’ll note that all class changes are processed synchronously. But in some cases it would be useful to wait for a transition to complete before going on to the next change. Having expended many hours of effort in that direction, I found that it could not be done in contemporary browsers without degrading the user experience, because the transitionend event did not fire predictably, particularly in Firefox. It’s perfectly reasonable for the engine to skip a transition that it deems unnecessary. But it means that you cannot count on the event occurring, so you have to resort to timeouts, which necessarily end up being used exactly when they should not apply, viz., when the transition wouldn’t have happened anyway.

Template rules are not as simple. Like class changes, templates support context evaluation, both inside braces in attribute values and within the content of eval nodes. But, “worse,” templates can have xslt nodes that represent the result of an XSL transform (whose input path also supports context evaluation). For this reason, template changes must be processed asynchronously, since it may be necessary to retrieve a number of files.

2.7.1 identify the change rule

Some people like to dream about what’s doable.

Here at getflow, we prefer to focus on what’s undoable.

Adding things is easy. When the author says “add this here,” it’s easy to make that change correctly.

getflow’s M.O. is that things that get added can get removed without any extra thought from the author. The author already has enough problems.

The price of that mercy is the ensuing ugliness, which is chiefly a matter of coordinating between the client and the server. The problem is, how do you know what to remove, when you’re “undoing” a rule?

Of course, the server never has to remove anything; it just makes a page and moves on with its life. And in most cases, the client is the one that added things in the first place, so it should know, right? The problem arises when you drop in somewhere where things have already been added (by the server), then travel to somewhere that doesn’t have those things. In that case, the client and the server need to be “on the same page” about what elements were added by what rule.

So it must be possible to “stamp” each element with the rule and context that created it, in a way that can be exactly the same on the client and server. Each element added, then, will carry metadata indicating its:

selector (pattern)
verb
argument (may be hashed for value comparison)
used context, that is, the map of those context values that are actually referenced by the rule
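Here is a hypothetical Python sketch of computing those four keys, using the server's sum-of-code-points naive_hash and the same rough “key appears in some input” filter as the JavaScript encode_used_context. It assumes, without confirmation here, that the client's stupid_string_hash (not shown) computes the same value; if it doesn't, client and server stamps will never agree. Note also that the server's TemplateChange currently stamps the whole context rather than only the used part (see its TODO).

```python
def naive_hash(text):
    """The server's naive_hash: the sum of the text's code points."""
    return sum(map(ord, text))


def encode_for_attribute(text):
    # Mirrors encode_context_for_attribute: double quotes become '#'
    return text.replace('"', '#')


def encode_used_context(context, inputs):
    """key=value pairs, joined by '/', for keys mentioned in any input."""
    return '/'.join(
        encode_for_attribute(key) + '=' + encode_for_attribute(value)
        for key, value in context.items()
        # Same rough test as the JavaScript: inputs may be XPath expressions
        if any(key in expression for expression in inputs))


def rule_keys(selector, verb, argument, inputs, context):
    return {
        'selector': selector,  # the pattern, not the expanded selector
        'verb': verb,
        'argument': str(naive_hash(argument)),
        'context': encode_used_context(context, inputs),
    }


keys = rule_keys('#song-{song}', 'append', '<p/>', ['song'],
                 {'song': 'Help', 'kingdom': 'Animalia'})
print(keys['argument'], keys['context'])  # 281 song=Help
```

Because only the used context is encoded, an element added on /songs/Help stays “the same” element no matter what unrelated placeholders happen to be in scope.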


function naive_hash(text) {
    return '' + stupid_string_hash(text);
}

function encode_context_for_attribute(text) {
    return text.replace(/\"/g, '#');
}

function encode_used_context(dictionary, used) {
    return Object.keys(dictionary)
	.filter(key =>
		// HACK: this can be any XPath expression, so a very dirty test here
				used.some(input => input.indexOf(key) >= 0))
	.map(key =>
	     encode_context_for_attribute(key) + '=' +
	     encode_context_for_attribute(dictionary[key]))
	.join('/');
}

function get_bound_rule_keys(rule, context) {
    return {
	selector: rule.selector, // NOT the context expanded selector
	verb: rule.verb,
	argument: naive_hash(rule.argument),
	// Only the context *used* by this rule
	context: encode_used_context(context.map, rule.inputs)
    };
}


A “class” for template change rules

function TemplateChangeRule(selector, verb, argument) {
	const range = document.createRange();
	range.setStart(document.body, 0);
    const fragment = range.createContextualFragment(argument);

    // DUPLICATED!!! in ClassChange

    // Collect the (unique) inputs that we use.
    const _inputs = {};
    function register_input(input) {
	_inputs[input] = 0;
    }

    // From selectors
    for_each(placeholders_in(selector), register_input);

    // From eval nodes
    for_each(fragment.querySelectorAll('eval'),
			 eval_node => register_input(markup_of(eval_node)));

    // From attributes (this also covers inputs to, e.g. xslt instructions)
    scan_attributes(fragment, attribute =>
					for_each(placeholders_in(attribute.value), register_input));

    return {
	selector,
	verb,
	argument,
	fragment,
	inputs: Object.keys(_inputs)
    };
}

We resort to createRange().createContextualFragment() because you cannot set the innerHTML of a document fragment (e.g. one from document.createDocumentFragment()). This method is reportedly not supported in IE10 and earlier (see http://stackoverflow.com/a/25214113).

The setStart call is only needed for Safari iOS (thanks https://twitter.com/jaffathecake/status/613977890040999936). In Chrome that call will fail without the second argument.

Here are the equivalents in Python. I don’t even bother with a “class hierarchy.”

getflow/server/getflow/changes.py

from ..getflow.templates import Template
from ..getflow.dom_tokens import add_token, remove_token
from lxml.etree import _Element as Node

# See below.  Server detection technique from
# https://code.google.com/p/modwsgi/wiki/TipsAndTricks
# The mod_wsgi module is only importable when running under mod_wsgi.
is_command_line = True
try:
    from mod_wsgi import version
    is_command_line = False
except ImportError:
    pass

def naive_hash(s): return sum(map(ord, s))

<<server-side class change>>
<<server-side template change>>

Again, the class change is simple:


class ClassChange():
    def __init__(self, verb, argument):
        self.op = add_token if verb == 'add' else remove_token
        self.argument = argument

    def apply_to(self, node, context, selector):
        node.set('class', self.op(node.get('class'),
                                  context.expand(self.argument)))

The template change is more involved.


class TemplateChange():
    def __init__(self, verb, argument):
        self.verb = verb
        self.template = Template(argument)
        self.op, self.seq = {
            'before': (Node.addprevious, None),
            'prepend': (lambda self, x: Node.insert(self, 0, x), reversed),
            'append': (Node.append, None),
            'html': (Node.append, None),
            'after': (Node.addnext, reversed),
        }[verb]

        # only for "change keys"
        self.argument = self.template.inner
        self.argument_hash = str(naive_hash(self.argument))

    def apply_to(self, node, context, selector):
        frags = self.template.evaluate(context)
        keys = None

        if self.verb == 'html':
            # `clear' kills the attributes as well, which we don't want to do.
            # This is an alternative to removing the child elements
            # individually.
            a = {k: v for k, v in node.attrib.items()}
            node.clear()
            for k, v in a.items(): node.set(k, v)

            # For other verbs, initial text is dropped.
            node.text = frags.text
        else:
            keys = {
                'selector': selector,
                'verb': self.verb,
                'argument': self.argument_hash,
                # TODO: This should only include the context actually used by
                # this template.
                'context': '/'.join(k + '=' + v for k, v in context.dictionary.items()) }

        for ele in self.seq(frags) if self.seq else frags:
            # TEMP: Don't emit these keys when running from the command line.
            # You could very well want them in some cases, but I don't happen
            # to, and this is the easiest way to distinguish the cases.
            if keys and not is_command_line:
                for key, value in keys.items():
                    ele.set('data-getflow-' + key, value)
            self.op(node, ele)

Here we read the XML data into JavaScript objects.


Read change rules

const CLASS_VERB_SUFFIX = /_class$/;

function is_class_verb(verb) {
	return CLASS_VERB_SUFFIX.test(verb);
}

function read_change_rules(ruleNode) {
    return make_array(ruleNode.childNodes).reduce((rules, node) => {
	let verb_node;

		// Skip over comment nodes
		do verb_node = (verb_node || node).nextSibling;
		while(verb_node && verb_node.nodeType != ELEMENT_NODE)

	if (node.nodeType == TEXT_NODE && verb_node) {
	    const verb_name = verb_node.nodeName,
				  is_class_change = verb_name == 'class',
				  selector = node.nodeValue.replace(/^\s+|\s+$/g, '');
	    if (selector)
		rules.push(
		    is_class_change
			? ClassChangeRule(
			    selector,
			    verb_node.attributes[0].nodeName + '_class',
			    verb_node.attributes[0].nodeValue)
			: TemplateChangeRule(
			    selector,
			    verb_name,
			    markup_of(verb_node))
		);
	}

	return rules;
    }, []);
}

And here is the Python version:


def read_change_rules(node):
    # A change rule is a selector (text) followed by the change (an element).
    # We only consider non-empty text, of course.
    for selector, ele in zip(
            node.xpath("text()['' != translate(normalize-space(), ' ', '')]"),
            node.xpath('*')):
        yield (
            selector.strip(),
            (ClassChange(ele.xpath('name(@*[1])'), ele.xpath('@*[1]')[0])
             if ele.tag == 'class' else
             # Includes the outer tag, which is not part of the template
             TemplateChange(
                 ele.tag,
                 etree.tostring(ele, with_tail=False, encoding='unicode'))))

2.8 context and templates

Templates are how you get lots of things from one rule. Context is how you tell a template where you are.

Getflow uses a simple template language.

Templates are XML fragments. Certain bits have special meaning to the template evaluator. When given a context, those places will be expanded.

evaluation of content of eval elements
evaluation inside of {} in attributes
evaluation of xslt elements

Within those rules you can leave certain things undecided (“variable”).

Variables remain undecided until a choice is made, often by a person, while the program is running. So the exact way that things look won’t be decided when you write the code. You yourself won’t see it unless you “run the program” and, typically, make choices.

The above example shows a simple “fruit tree.” The split boxes show where we might use a class to consolidate “siblings” that can be described in a regular way. Whenever you have a common data structure representing a collection of things, these are well served by a combination of context and templates.

context: where you are now, usually a choice that the person has made
template: the shape of a type of thing, which will never be seen as such, but will be “filled out” based on a context, then shown to the person
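As a simplified sketch of the first two kinds of expansion, here is a hypothetical Python function that expands {} in attribute values and replaces eval elements with their values. To keep it self-contained, it looks names up in a plain dict instead of evaluating XPath, and it ignores xslt elements entirely; the template markup is made up.

```python
import re
import xml.etree.ElementTree as ET

PLACEHOLDER = re.compile(r'\{(.+?)\}')


def expand_template(markup, context):
    """Expand {name} in attribute values and <eval>name</eval> elements.

    `context` is a plain dict standing in for Context; unknown names
    expand to '' (as a lookup of a missing element would in XPath).
    """
    root = ET.fromstring(markup)

    def lookup(name):
        return context.get(name.strip(), '')

    # Braces in attribute values
    for ele in root.iter():
        for attr, value in list(ele.attrib.items()):
            ele.set(attr, PLACEHOLDER.sub(lambda m: lookup(m.group(1)), value))

    # eval elements become plain text, spliced into the surrounding text
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != 'eval':
                continue
            text = lookup(child.text or '') + (child.tail or '')
            index = list(parent).index(child)
            if index == 0:
                parent.text = (parent.text or '') + text
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or '') + text
            parent.remove(child)

    return ET.tostring(root, encoding='unicode')


print(expand_template(
    '<li class="song {song}"><a href="/songs/{song}"><eval>song</eval></a></li>',
    {'song': 'Help'}))
# <li class="song Help"><a href="/songs/Help">Help</a></li>
```

One template, many list items: the same markup filled out with a different context gives a different song.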

Great, when can I start?

2.9 evaluating in context

The context is basically a key-value map from the routes. In practice, it’s a small XML document in which each route placeholder becomes an element whose text is its value, and context expressions are XPath expressions evaluated against that document.
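For example, matching the pattern /life/{kingdom}/{phylum} against /life/Animalia/Chordata would produce a context document like the one below. This sketch uses the standard library's ElementTree, which supports only a small XPath subset (enough for findtext); the Context implementations in this document evaluate full XPath 1.0 via document.evaluate on the client and lxml on the server.

```python
import xml.etree.ElementTree as ET


def context_document(mapping):
    """One child element per placeholder, under a dummy root named 'd'."""
    root = ET.Element('d')
    for key, value in mapping.items():
        ET.SubElement(root, key).text = value
    return root


doc = context_document({'kingdom': 'Animalia', 'phylum': 'Chordata'})
print(ET.tostring(doc, encoding='unicode'))
# <d><kingdom>Animalia</kingdom><phylum>Chordata</phylum></d>
print(doc.findtext('phylum'))  # Chordata
print(repr(doc.findtext('class', default='')))  # '' (missing keys are empty)
```

Representing the map as a document is what lets templates use expressions like concat(kingdom, '/', phylum) rather than bare names.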

You can use context in a number of places. Currently, context evaluation is supported in

selectors
eval elements inside of templates
within {} expressions in attributes in templates (including xslt inputs and arguments)

“Context” as defined here is really the composition of two things: an expression evaluator, and a “mini-language” (a single construct) for switching contexts between literal text and expressions that can be evaluated (in this case, using curly braces). I think it’s mainly the lack of a name for that latter thing that keeps me from separating them.


function Context(map) {
    // 'd' is a dummy root element
    var doc = document.implementation.createDocument('', 'd');
    for_each(Object.keys(map), key => {
	var ele = doc.createElement(key);
	ele.appendChild(doc.createTextNode(map[key]));
	doc.documentElement.appendChild(ele);
    });

    function evaluate(expression) {
	return doc.evaluate(expression, doc.documentElement, null,
			    XPATH_STRING, null)
	    .stringValue;
    }

    return {
	map: map,
	evaluate: evaluate,
	expand: function(text) {
	    return (text || '').replace(placeholder_pattern, function(_, expression) {
		return evaluate(expression);
	    });
	}
    };
}

And here it is in Python.


import re
from lxml import etree

class Context:

    placeholder = re.compile(r'{(.+?)}')

    def __init__(self, dictionary):
        self.dictionary = dictionary # only used for "change keys"
        self.node = etree.Element('dummy')
        for key, value in dictionary.items():
            item = etree.Element(key)
            item.text = value
            self.node.append(item)

    def evaluate(self, expression):
        return self.node.xpath('string('+expression+')') if expression else ''

    def expand(self, text):
        return self.placeholder.sub(lambda mo: self.evaluate(mo.group(1)), text)

Now, here are the Python tests.

from getflow.context import Context
from test_helpers import check

def test_evaluate(case):
    expression, dictionary, expected = case
    check(case, expected, Context(dictionary).evaluate(expression))

def test_expand(case):
    text, dictionary, expected = case
    check(case, expected, Context(dictionary).expand(text))


print('------ evaluate:')
[test_evaluate(case) for case in (
    ('', {}, ''),
    ('', {'x': 'abc'}, ''),
    ('x', {'x': 'abc'}, 'abc'),
    ('y', {'x': 'abc'}, ''),
    ('y', {'x': 'abc', 'y': 'def'}, 'def'),
    ('xy', {'x': 'abc', 'y': 'def'}, ''),
    ('"pq"', {}, 'pq'),
    ('concat("pq", "qp")', {}, 'pqqp'),
    ('translate("hello", "aeiou", "AEIOU")', {}, 'hEllO'),
    ("translate('hello', 'aeiou', 'AEIOU')", {}, "hEllO"),
    ('concat(x, z)', {'x': 'abc', 'y': '123', 'z': '890'}, 'abc890'),
)]

print('\n------ expand:')
[test_expand(case) for case in (
    ('a{x}b', {}, 'ab'),
    ('a{x}b', {'x': '123'}, 'a123b'),
    ('a{x}b{x}', {'x': '123'}, 'a123b123'),
    ('a{x}b{y}', {'x': '123', 'y': '456'}, 'a123b456'),
    ('a{a}b{a}', {'a': '123'}, 'a123b123'),
    ('a{a}b{b}', {'a': '123', 'b': '456'}, 'a123b456'),
    ('a{abc}b', {'abc': '789'}, 'a789b'),
    ('({concat(x, z)})', {'x': 'ALPHA', 'y': 'BRAVO', 'z': 'ZED'}, '(ALPHAZED)'),
    ('({translate(x, "aeiou", "AEIOU")})', {'x': 'hello'}, '(hEllO)'),
    ("({translate(x, 'aeiou', 'AEIOU')})", {'x': 'hello'}, '(hEllO)'),
    ("({translate(x, '.', '-')})", {'x': '1.1'}, '(1-1)'),
)]

2.10 applying changes

Apply a change

function promise_to_apply_bound_change(change) {
	return new Promise(resolve => {
		let i, stuff_to_insert, node;
		const context = change.context,
			  path = change.path,     // for marking inserts
			  keys = change.keys,
			  selector = change.selector,
			  verb = change.verb,
			  argument = change.argument,
			  // "Removal" selectors are constructed internally and may contain
			  // context expressions, which should be treated literally.
			  nodes = document.querySelectorAll(
				  verb == 'remove'? selector : context.expand(selector));

		// Could stop now if nothing matches.

		// Is this a template operation?
		if (/insert|prepend|append|before|after|html/.test(verb)) {

			// If this is a call from a prior iteration
			if (argument instanceof DocumentFragment)
				stuff_to_insert = argument;

			else {
				stuff_to_insert = change.rule.fragment.cloneNode(true);

				// Evaluate context in "eval" nodes
				for_each(stuff_to_insert.querySelectorAll('eval'),
						 eval_node => eval_node.parentNode.replaceChild(
							 eval_node.ownerDocument.createTextNode(
								 context.evaluate(markup_of(eval_node))),
							 eval_node));

				// Evaluate context in attributes.
				scan_attributes(stuff_to_insert, attribute => {
					const value = attribute.value;
					if (/\{.*\}/.test(value))
						attribute.value = context.expand(value)
				});
			}

			// Does this thing have transforms?
			const xslt_node = stuff_to_insert.querySelector('xslt');
			if (xslt_node) {

				// Yes, this thing has transforms.  Resolve one transform, and
				// call this routine back.
				const transform_file = xslt_node.getAttribute("transform");
				const document_file = context.expand(xslt_node.getAttribute("input") || "");

				Promise.all([
					GET_XSLT(transform_file),
					GET_XML(document_file)])
					.then(results => {
						// Destructuring here would be nice, right?  But then
						// you get some stupid shim from the transpiler.
						var proc = results[0];
						var input_document = results[1];

						// Set transform arguments
						for_each(xslt_node.attributes, attribute => {
							if (!/^(transform|input)$/.test(attribute.nodeName))
								proc.setParameter(null,
												  attribute.nodeName,
												  context.expand(attribute.value));
						});

						xslt_node.parentNode.replaceChild(
							proc.transformToFragment(input_document, document),
							xslt_node);

						// When those things are both loaded, then call apply_change
						// with the reified template.
						resolve(promise_to_apply_bound_change({
							selector,
							verb,
							argument: stuff_to_insert,
							context,
							keys,
							path,
							rule: change.rule
						}));
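					}, error => {
						// Throwing here would only reject this inner chain,
						// which nobody observes, leaving our caller waiting
						// forever.  Resolving with a rejected promise passes
						// the failure along instead.
						resolve(Promise.reject(error));
					});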
				return;
			}

			// Now the template is "resolved" and ready to add to the document.
			// But first, since we know that we may want to undo this change
			// later (to back out of this path), we mark all of the top-level
			// elements being inserted.  Note that this won't work for plain
			// text nodes.
			for_each(children_of(stuff_to_insert),
					 top_level_element =>
					 for_each(Object.keys(keys), key =>
							  top_level_element.setAttribute('data-getflow-' + key,
															 keys[key])));
		}

		for (i = 0; i < nodes.length; i++) {
			node = nodes[i];

			if (is_class_verb(verb)) {
				var classes = context.expand(argument).split(' ');
				for (var j = 0; j < classes.length; j++) {      
					node.classList[verb.replace(CLASS_VERB_SUFFIX, '')](classes[j]);
				}
			} else if (verb == 'html') {
				while (node.firstChild) {
					node.firstChild.remove();
				}
				node.appendChild(stuff_to_insert);
			}
			else if (verb == 'remove')
				node.remove();

			else if (verb == 'append')
				node.appendChild(stuff_to_insert);

			else if (verb == 'before')
				node.parentNode.insertBefore(stuff_to_insert, node);

			else if (verb == 'after')
				node.parentNode.insertBefore(stuff_to_insert, node.nextSibling);

			else if (verb == 'prepend')
				node.insertBefore(stuff_to_insert, node.firstChild);

			else
				debug_message("getflow: unknown verb: " + verb);
		}

		resolve(true);
	});
}

Whew. That’s the gnarliest part of the client. It’s much more straightforward on the server, which is still in happy-synchronous land, reading files from the disk!

from lxml import etree

def inner_xml(node):
    return (node.text or '') + ''.join(etree.tostring(e, encoding='utf-8').decode('utf-8') for e in node)

# Treat rooted paths as filenames relative to the site root.
class SiteResolver(etree.Resolver):
    def resolve(self, url, pubid, context):
        return self.resolve_filename(url.lstrip('/'), context)


class Template:
    parser = etree.XMLParser()
    parser.resolvers.add( SiteResolver() )

    # Although the template is a "fragment," it should be pre-wrapped for (our)
    # convenience.  That is, the string accepted by the constructor should be a
    # well-formed XML element, whose outer tag will be ignored.
    def __init__(self, xml):
        self.xml = xml
        # Just for "change keys"
        self.inner = inner_xml(etree.fromstring(self.xml))

    def evaluate(self, context):
        clone = etree.fromstring(self.xml)

        # We must convert to a list because we're modifying the tree while
        # traversing it.  Since we just parsed the thing, consider an iterparse
        # based approach, as here http://stackoverflow.com/a/22495071
        for node in list(clone.iter()):

            if node.xpath('boolean(@*[contains(., "{")])'):
                for key, value in node.attrib.items():
                    node.set(key, context.expand(value))

            if node.tag == 'eval':
                node.text = context.evaluate(node.text)

            elif node.tag == 'xslt':
                document = etree.parse(node.get('input'), self.parser)
                transform = etree.XSLT(etree.parse(node.get('transform'), self.parser))
                arguments = {k: etree.XSLT.strparam(v) for k, v in node.attrib.items()}

                result = transform(document, **arguments)
                node.clear()

                # Replace content.  See note.
                if result:
                    node.getparent().replace(
                        node, etree.fromstring('<xslt>'+str(result)+'</xslt>'))

        etree.strip_tags(clone, 'eval', 'xslt')

        return clone

Well, there is one wrinkle. That “replace content” part is inefficient, assuming that the result tree was generated in-memory. But I can’t find any way to access the full result through the result tree, when the result is a fragment. This gets close, but it doesn’t include leading text:

first_result = transform(document, **arguments).getroot()
if first_result is not None:
    node.extend(first_result.itersiblings())
    node.insert(0, first_result)

The server version does have unit tests, unlike the client. I’m not sure these are worth it.

from getflow.templates import Template, inner_xml
from getflow.context import Context
from lxml import etree
from test_helpers import check

def do_test(case):
    fragment, context, expected = case
    check(case, expected,
	  inner_xml(Template('<r>'+fragment+'</r>').evaluate(Context(context))))

[do_test(case) for case in (
    ('<a/>', {}, '<a/>'),
    ('<a>text</a>', {}, '<a>text</a>'),
    ('<a><!--comment--></a>', {}, '<a><!--comment--></a>'),
    ('<a>text and <!--comment--></a>', {}, '<a>text and <!--comment--></a>'),
    ('<a><eval></eval></a>', {}, '<a></a>'),     # Should this throw?
    ('<a><eval>expr</eval></a>', {}, '<a></a>'), # Should this throw?
    ('<a><eval>expr</eval></a>', {'expr': 'value'}, '<a>value</a>'),
    ('<a x="{x}" y="{z}">text</a>', {'x': '123', 'z': 'def'}, '<a x="123" y="def">text</a>'),
    ('<a x="{x}"><eval>expr</eval></a>', {'expr': 'value', 'x': '123'}, '<a x="123">value</a>'),
    ('<a x="0{x}"><eval>expr</eval></a>', {'expr': 'value', 'x': '123'}, '<a x="0123">value</a>'),
    ('<a x="{x}"><b y="{z}"/></a>', {'z': 'q', 'x': '123'}, '<a x="123"><b y="q"/></a>'),
    ('head<a/>', {}, 'head<a/>'),
    ('<a/>tail', {}, '<a/>tail'),
    ('head<a/><b/>', {}, 'head<a/><b/>'),
)]

Back to the client…

Once we know the locations that lie between where we are and where we’re going, we can use the site definition (here called routes) to collect all of the change rules that will apply.

Reverse a forward change

function reverse_class_verb(verb) {
	return verb[0] == 'a' ? 'remove_class' : 'add_class';
}

// The "change keys" will be used to identify anything that is inserted, so that
// it can be removed later.
function reverse_forward_change(rule, change_keys) {
    var verb = rule.verb;
    var selector = rule.selector;

    // TODO: reversing HTML changes is much more involved
    if (verb == 'html')
	return {
	    fragment: rule.fragment,  // TEMP
	    selector,
	    verb,
	    argument: ''        // TBD: lookup content in parent
	};

    // Class change
    if (is_class_verb(verb))
	return {
	    selector,
	    verb: reverse_class_verb(verb),
	    argument: rule.argument
	};

    var remover =
		Object.keys(change_keys).map(
			key => '[data-getflow-'+key+'="'+change_keys[key].replace(/"/g, '\\$&') + '"]')
		.join('');

    // Template change
    return {
	// Assumes that all nodes inserted by getflow were marked thus
	selector: remover,
	verb: 'remove'
    };
}

Regarding the escaping of quotes in attribute selectors, see “Strings”, CSS2 Specification.
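The remover selector quotes each key's value as a CSS string. Here is a small Python sketch of that quoting. Note that, unlike the JS above, it also doubles backslashes, since inside a CSS string a lone backslash starts an escape; newlines, which a CSS string can't contain raw either, are not handled. `quote_css_string` and `remover_selector` are hypothetical names, and the key names in the example are made up, since `get_bound_rule_keys` isn't shown here.

```python
def quote_css_string(value):
    """Wrap value in double quotes, escaping what a CSS string requires.

    Backslashes are escaped first, so the backslashes added in front of
    quotes are not themselves doubled.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def remover_selector(change_keys):
    """Build a compound attribute selector matching all the change keys."""
    return "".join(
        "[data-getflow-%s=%s]" % (key, quote_css_string(value))
        for key, value in change_keys.items()
    )


# Hypothetical change keys, for illustration only.
print(remover_selector({"route": "/a", "label": 'say "hi"'}))
# prints [data-getflow-route="/a"][data-getflow-label="say \"hi\""]
```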

The tests that were supposed to show how that function behaves have been turned off (no longer exported): because of the new way of marking inserts, it’s not clear that they can be done meaningfully.

Change rules themselves cannot be applied directly. They have to be applied in a context. Moreover, they can also be applied in reverse.

Before we can actually apply a change, we have to first:

1. (possibly) reverse it, i.e. turn it into an “undo” rule
2. resolve the rule in some context

Step 2 is an asynchronous operation, since it may require looking up external resources (not to mention other processing).
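Here is a minimal Python sketch of those two steps, for class changes only, with asyncio standing in for the asynchronous part. The reversal mirrors `reverse_class_verb`; the `resolve` coroutine is hypothetical and merely fills `{name}`-style placeholders with `str.format` (not XPath) after yielding to the event loop. What it illustrates is the ordering: each change is reversed (if needed) synchronously, then awaited before the next one starts.

```python
import asyncio


def reverse_class_verb(verb):
    # Same rule as the JS: "add_class" becomes "remove_class", anything
    # else becomes "add_class".
    return "remove_class" if verb.startswith("a") else "add_class"


async def resolve(change, context):
    # Hypothetical stand-in for the asynchronous step (e.g. fetching a transform).
    await asyncio.sleep(0)
    return {**change, "argument": change["argument"].format(**context)}


async def apply_all(rules, backwards, context):
    applied = []
    for rule in rules:
        # Step 1: possibly reverse (synchronous).
        change = dict(rule)
        if backwards:
            change["verb"] = reverse_class_verb(change["verb"])
        # Step 2: resolve in context (asynchronous), one change at a time.
        applied.append(await resolve(change, context))
    return applied


rules = [{"selector": "body", "verb": "add_class", "argument": "on-{name}"}]
print(asyncio.run(apply_all(rules, True, {"name": "about"})))
# prints [{'selector': 'body', 'verb': 'remove_class', 'argument': 'on-about'}]
```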

function BoundChange(rule, backwards, context, path) {
    var keys = get_bound_rule_keys(rule, context);
    var actual_change = backwards? reverse_forward_change(rule, keys) : rule;

    return {
	rule: rule,
	backwards: backwards,
	context: context,
	path: path,
	keys: keys,
	selector: actual_change.selector,
	verb: actual_change.verb,
	argument: actual_change.argument
    };
}

Collect all changes between two locations

function changes_between(from_path, to_path, routes) {
    return steps_between(from_path, to_path).reduce((all, step) => {
	// Yeah a little destructuring here would be nice but stupid transpiler
	// emits some junk when you use it.
	const path = step[0];
	const backwards = step[1];
	const route_match = find_route(path, routes);
	const route = route_match[0];
	const match = route_match[1];

	// No route for this step means no changes.  Bail out before touching
	// match.context or route.change_rules, which would throw.
	if (!route)
	    return all;

	const context = Context(match.context);

	const bound_changes =
			  route.change_rules.map(
				  rule => BoundChange(rule, backwards, context, path));

	// When going "backwards," we not only reverse the operations
	// themselves, but also the order in which we do them.  Note that this
	// applies only to this batch, i.e. the rules for one path step.
	if (backwards)
	    // MUTATION: JavaScript's Array.prototype.reverse() reverses the
	    // array *in-place*.
	    bound_changes.reverse();

	// This creates an intermediate (throwaway) array for each batch.
	return all.concat(bound_changes);
    }, []);
}
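For readers following along on the server side, here is a rough Python model of what changes_between does with its steps. It is a sketch under simplifying assumptions: steps arrive as (path, backwards) pairs, as steps_between returns them; a plain dictionary from path to rule names stands in for find_route, so a path with no route simply yields None; and rules are not bound to a context. `collect_changes` is a hypothetical name.

```python
def collect_changes(steps, routes):
    """Gather the rules for each step, in the order they should be applied.

    steps:  list of (path, backwards) pairs.
    routes: dict mapping a path to its list of rules (stand-in for find_route).
    """
    collected = []
    for path, backwards in steps:
        rules = routes.get(path)
        if rules is None:
            # No route for this step: contribute nothing.
            continue
        batch = [(rule, backwards) for rule in rules]
        if backwards:
            # Undo this step's rules in the opposite order they were done.
            batch.reverse()
        collected.extend(batch)
    return collected


routes = {"/a": ["show-a", "highlight-a"], "/x": ["show-x"]}
print(collect_changes([("/a", True), ("/x", False)], routes))
# prints [('highlight-a', True), ('show-a', True), ('show-x', False)]
```

The reversal applies per batch only: the steps themselves stay in the order steps_between produced them.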

3 flow (the client)

Getflow’s basic requirement is to achieve “parity” between the client and the server as far as carrying out the rules is concerned.

But that’s kind of unfair to the client. The client has to do almost everything that the server does, and considerably more—not least of which are the “state transitions” that are getflow’s entire raison d’être. This section covers those special burdens.

The client side is special. Why? Because it’s closest to the person. It works directly with the person’s documents in memory, unlike a server implementation, which will generally be communicating with the internet via text (even if it uses in-memory representations of the document internally).

The getflow that runs inside the web browser is necessarily more essential than one that runs on the server because it directly implements the transitions that people see.

3.1 the person wants to go somewhere

This is what it’s all about.

As noted elsewhere, getflow is all about links. The one basic thing that getflow does is handle links. When someone indicates that they want to follow a link, getflow might be able to do its thing.

How do we do that? We have to listen. We listen for an “event” indicating that someone is trying to act on a link.²

Listen for a touch

document.addEventListener("click", the_person_touched_the_site);

TODO: the following notes are a little tangled up between the notion of stopPropagation and that of the (currently implicit) false in the above listener. Some of this explains why we use bubble instead of capture, but it’s rather tied up with the other matter.

The astute reader may observe that, getflow not being the only piece of software in the world, it’s possible that even here on this page, some other piece of software may also be interested in “clicks” on links. So at this point, we have to ask: do we want first dibs on this click event? See, there are two phases in which a DOM listener can receive an event: capturing (from the document down toward the target) and bubbling (from the target back up). A pretty good explanation of this (with ASCII art!) is Peter-Paul Koch’s QuirksMode article “Event Order” (from his book ppk on JavaScript) (http://www.quirksmode.org/js/events_order.html), although note that, notwithstanding the heading “Page last changed today”, this content is rather old and mostly of historical interest.

So we have the option to “capture” this event before lower-down elements (such as the link itself) see it, or we can be polite and let any such handlers go first, after which the event will “bubble” up to us. We’ll assume that a more specific handler (as such low-down handlers would be) knows more about the intended outcome than we do. Besides, stopping propagation is just generally rude, as it can cause unexpected results. As Philip Walton puts it,

If you’re ever unsure about what to do, just ask yourself the following question: is it possible that some other code, either now or in the future, might want to know that this event happened? The answer is usually yes.³

We here at getflow don’t place a high premium on playing well with others, but we’re not sociopaths, either; we don’t go out of our way to be hard to live with. So we’ll be polite, strictly for the reason that someone else might not be; that is, some lower-level handler may want to respond to the event and then stop propagation.
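To see why registering in the bubbling phase is the polite choice, here is a toy Python model of DOM event dispatch. It is a simplification: the path is a list of node names from the root to the target, capture listeners run root-first, bubble listeners run target-first, and stopping propagation takes effect once the current node's listeners have finished (as stopPropagation does; stopImmediatePropagation is not modeled). The node names and handlers are made up.

```python
class Event:
    def __init__(self):
        self.stopped = False
        self.log = []

    def stop_propagation(self):
        self.stopped = True


def dispatch(event, path, capturing, bubbling):
    """path: node names from the root down to the target.
    capturing / bubbling: dicts mapping a node name to a list of listeners."""
    # Capture phase: from the root down toward the target.
    for node in path:
        for listener in capturing.get(node, []):
            listener(event)
        if event.stopped:
            return event
    # Bubble phase: from the target back up to the root.
    for node in reversed(path):
        for listener in bubbling.get(node, []):
            listener(event)
        if event.stopped:
            return event
    return event


def getflow(event):
    event.log.append("getflow")


def widget(event):
    # Some lower-level handler that wants this click for itself.
    event.log.append("widget")
    event.stop_propagation()


path = ["document", "div", "a"]
print(dispatch(Event(), path, {}, {"document": [getflow], "a": [widget]}).log)  # prints ['widget']
print(dispatch(Event(), path, {"document": [getflow]}, {"a": [widget]}).log)  # prints ['getflow', 'widget']
```

With the polite (bubbling) registration, the widget gets the click first and can keep it from ever reaching getflow; with capture, getflow acts before the widget has any say.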

The person touched the site

function the_person_touched_the_site(event) {
    var link = event.target;

    <<shall we respond to this touch>>
    <<okay try to handle it>>
}

3.1.1 shall we respond to this touch?

All we know is that someone touched the site somewhere. Before we proceed, we have to make sure that this is actually something we can take care of.

was it a link that the person touched?

Links are how people get around (in getflow’s worldview). If the touched-thing wasn’t a link (an <a> tag), move along.

If link itself contains visible sub-elements, then the event target may be one of those lower-down elements.

To be sure whether or not the touch was inside of a link, we have to “walk up the tree” looking for a link.

while (link.nodeName !== "A") {
    link = link.parentNode;
    if (!link) {
	return;
    }
}

aside: Regarding the fact that nodeName is ‘A’ for links (and not ‘a’)

If you ever see anyone writing HTML tags in uppercase, just remain calm, excuse yourself, and slip away quietly. Under no circumstances should any living person write

html

<A HREF="http://geocities.yahoo.com/80s_dude">My home page!</A>

Yet, according to the W3C’s definition of the “Element interface,”⁴

The HTML DOM returns the tagName of an HTML element in the canonical uppercase form, regardless of the case in the source HTML document.

If this discursion has not fully quenched your thirst for arcana, you can read (lots) more about “.nodeName Case Sensitivity” from the man himself, John Resig.⁵

was this touch supposed to create a new “portal”?

Okay, that’s the closest I could come to a technology-agnostic way of saying, Was this supposed to open a new browser tab? In that case, we must bypass getflow (i.e. leave this tab alone). The following test is a hack to enable Ctrl+click for opening in another tab. I don’t know of a more semantic way to do this (i.e. one which relies on an explicit “open link in new tab” intent).

if (event.ctrlKey
	|| event.which !== LEFT_MOUSE_BUTTON
	|| link.target) {
    return;
}

Note that the presence of a target attribute also indicates that the destination is intended to be a separate window or tab.

is the destination in this domain?

Naturally, getflow only handles links to the same host (domain and port) as the current page.

if (link.host !== window.location.host) {
    return;
}
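The three checks above can be modeled as one pure predicate. This Python sketch is illustrative only: `Node` is a hypothetical stand-in for a DOM element carrying just the fields used here, `link_to_handle` is a made-up name, and `LEFT_MOUSE_BUTTON` is assumed to be 1, the value `MouseEvent.which` reports for the primary button.

```python
from dataclasses import dataclass
from typing import Optional

LEFT_MOUSE_BUTTON = 1  # MouseEvent.which for the primary button (assumed)


@dataclass
class Node:
    node_name: str                  # uppercase, as the HTML DOM reports it
    parent: Optional["Node"] = None
    host: str = ""                  # only meaningful on links
    target: str = ""                # the link's target attribute, if any


def link_to_handle(touched, ctrl_key, which, page_host):
    """Return the link getflow should handle, or None to stay out of it."""
    # Was it a link? Walk up from the touched element looking for an <a>.
    link = touched
    while link is not None and link.node_name != "A":
        link = link.parent
    if link is None:
        return None
    # Was it meant to open a new tab or window?
    if ctrl_key or which != LEFT_MOUSE_BUTTON or link.target:
        return None
    # Is the destination on this host?
    if link.host != page_host:
        return None
    return link


link = Node("A", host="example.com")
icon = Node("SPAN", parent=link)
print(link_to_handle(icon, False, 1, "example.com") is link)  # prints True
```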

3.1.2 okay, try to handle it

Okay, getflow is going to (try to) handle this.

Try to go to the place

try {
    go(link.href, 'link');
} catch (error) {
    debug_message(error);
}

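At this point getflow has taken charge of the link, so we don’t want the “natural” (or “default”) behavior of the link (viz., having the browser GET a completely new page). Note that, as written, the default is suppressed even if go throws; the error is only logged.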

Suppress the “old” way of traveling

event.preventDefault();

And that’s it. As noted elsewhere, we let the event continue propagating.

3.2 actually go somewhere

It all comes down to this. Give me an address, and I’ll take you there.

Note that this is a state-manipulating function.

Go somewhere

<<simplify changes>>

function go(to, source) {
	const url = parse_url(to);
	const to_path = url.pathname;
    const from_path = state.path;

	// TODO: I'm going to get rid of one of the following methods.  If the
	// latter, then I'll need a PubSub here.  Note that although
	// window.dispatchEvent is used below, that's a special case, since it has
	// to be catchable before you know that getflow has loaded.
	if ('function' == typeof _on_going) {
		try {
			_on_going(from_path, to_path);
		}
		catch (e) {
			debug_message(e);
		}
	}
	// Another way of doing this that allows multiple dispatch without a
	// separate PubSub,
	if (Event)
		window.dispatchEvent(
			new CustomEvent(
				'getflow-going', { detail: { from_path, to_path, source, } }));

    function update_path_state() {
	state.path = to_path;

		// Popstate is usually triggered by the "back" button (but also by
		// history.back()).  Either way, the location is already in history.
		if (source != 'popstate')

			// state has non-serializable things in it now, triggering Firefox's
			// "cannot clone" error if you pass it here.
			window.history.pushState(url, '', to);
    }

	function go_there(blueprints) {

		const changes =
			  changes_between(from_path, to_path, blueprints);

		const promises =
			  simplify(changes)
			  .map(change =>
				   () => promise_to_apply_bound_change(change));

		return chain(promises)
			.then(() => maybe_scroll_to(url.hash))
			.then(update_path_state, debug_message);
	}

	return get_the_blueprints.then(go_there);
}
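The bookkeeping at the end of go can be modeled in a few lines of Python. In this hypothetical sketch, a list stands in for the browser's history (it only records pushes), and the changes themselves are elided. It shows two things from the code above: getflow's own path is updated only once the transition is done, and a popstate-driven transition doesn't push a new entry, because the browser has already moved through its history.

```python
class Navigator:
    """Toy model of getflow's path state alongside the browser history."""

    def __init__(self, path):
        self.path = path        # getflow's notion of where we are
        self.pushed = [path]    # entries pushed onto the (stand-in) history

    def go(self, to_path, source):
        from_path = self.path
        # ... the changes between from_path and to_path would be applied
        # here; until they finish, self.path still says from_path ...
        self.path = to_path
        if source != "popstate":
            self.pushed.append(to_path)
        return from_path, to_path


nav = Navigator("/")
nav.go("/a", "link")
nav.go("/a/b", "link")
nav.go("/a", "popstate")   # e.g. the back button
print(nav.path, nav.pushed)  # prints /a ['/', '/a', '/a/b']
```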

3.3 smooth scrolling

At the end of a state transition, the web site should always be in the state that it would have started in if you’d gone directly to that same address. And if the address included an “anchor” (or “hash”), then a specific part of the document is supposed to be scrolled to the top.

So after the DOM-manipulating is done, there may still be more to do.

<<scroll to>>

function maybe_scroll_to(hash) {
	let id = hash.slice(1), target;

	if (id && (target = document.getElementById(id)))
		return scroll_to(target, 500);

	return resolve_now();
}

Since the whole point of getflow is to provide fluid transitions, it’s assumed that the scroll should be “smooth” rather than abrupt.

Indeed, if not for that, this would be a one-liner.

target.scrollIntoView();

So it’s not altogether surprising that smooth scrolling is coming to the platform.⁶

What good does that do here? The scroll-behavior: smooth property will only have any “natural” effect if the user clicks a link whose anchor points to the same page. But getflow has to deal with the scrolling “long” after the initial click. CSS isn’t going to help here.

Fortunately, there’s an API.⁷ When it’s not available, a custom animation is used.

function scroll_to(target, duration) {
	let layer = target, is_main_layer;

	// Force abrupt jump if signaled by a special attribute.
	if (target.getAttribute('getflow-scroll-behavior') == 'auto') {
		target.scrollIntoView();
		return resolve_now();
	}

	while (layer = layer.parentElement) {
		is_main_layer = layer == document.documentElement;

		if (is_main_layer
			|| window.getComputedStyle(layer).overflowY == 'scroll')
			break;
	}

	// Best method: Smooth scrolling API.
	if (layer.scrollBy) {
		layer.scrollBy({
			behavior: 'smooth',
			top: target.getBoundingClientRect().top});

		// Worst method: Abrupt jump to the element if we can't animate.
	} else if (!queue_frame) {
		target.scrollIntoView();

		// Compromise: Roll your own animation.
	} else return animate(duration, ratio => {
		const change = ratio * target.getBoundingClientRect().top;

		is_main_layer?
			window.scroll(0, window.scrollY + change)
			: layer.scrollTop += change;

		return true;
	});

	return resolve_now();
}

Regarding the ambiguity about which layer is the “main layer”: the browsers have not agreed on whether document.body or document.documentElement is supposed to be the main scrolling container. As near as I can tell, the “spec” indicates document.documentElement [citation needed]. But Chrome doesn’t play along (and neither, AFAIK, does Safari). Rather than trying to detect which browser we’re running in, we write this in a “cross-browser” way, which means treating the “main document” as a special case. (The browsers work the same way for scrolling child elements.)

3.4 state

Like it or not, we will have internal state that we need to keep track of. In particular, we need to maintain a “current path” that is separate from the browser’s “actual” current path (given by window.location), because, unlike in the standard GET model, we don’t change location all at once.

State variable

// TODO: why do you decode the search here?
// normalizePathname(window.location.pathname) +
//         decodeURIComponent(window.location.search)

var state = {
    path: normalizePathname(window.location.pathname) +
	decodeURIComponent(window.location.search)
};

If this site is not being run over HTTP but from a file system, then the initial path is assumed to be the root. This is only needed for the Android app, where, inside of Cordova, window.location.pathname gives us some implementation-specific nonsense (see “fix paths” plugin for “the app”).

State variable (continued)

if (window.location.protocol == 'file:')
	state.path = '/';
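Here is a rough Python model of how the initial path is computed, with urllib.parse in place of window.location. `normalize_pathname` is a hypothetical stand-in for normalizePathname, whose definition isn't shown here; this version just drops a trailing slash (other than the root's). Like the JS, it percent-decodes the query string and falls back to the root for file: URLs.

```python
from urllib.parse import urlsplit, unquote


def normalize_pathname(pathname):
    # Hypothetical: drop a trailing slash, but keep the root as "/".
    return pathname.rstrip("/") or "/"


def initial_path(url):
    parts = urlsplit(url)
    if parts.scheme == "file":
        # The file-system path means nothing as a site path, so start at the root.
        return "/"
    search = "?" + parts.query if parts.query else ""
    return normalize_pathname(parts.path) + unquote(search)


print(initial_path("https://example.com/a/b/?q=caf%C3%A9"))  # prints /a/b?q=café
```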

3.5 the program

3.5.1 declarations versus actions

So, “everything that we will do” is divided into two sections: declarations and actions.

Everything that we will do

<<declarations>>
<<actions>>

We call this the program’s “lexical” structure⁸, because it’s the order in which the code will be read by the computer (declarations first), as distinct from this document, which is arranged for human reading. The above placeholders allow us to “tangle” code blocks into one of those two places, regardless of where we present them here. For our purposes, the sections are defined this way:

action: anything that affects the outside world
declaration: anything that doesn’t

In practice, then, a declaration is basically the assignment of an initial value to a locally-scoped variable, including function definitions; whereas an action can mean actually calling (or “executing”) a function, modifying the global scope (including the DOM), or sending network requests.

So we will consider the actions first, since the whole point of the program is of course to have some effect, and thus the action is, as it were, where the action is; while the non-effecting parts of the program exist only in support of those ends. That said, the actions make up only a small portion of the program. And the declarations must lexically precede the actions, since the actions will have to refer to them.
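The placeholders are what make this reordering possible. As a rough illustration only (this is not the tool that actually tangles this document), here is a noweb-style expander in Python: a line consisting solely of a `<<name>>` reference is replaced by that named block, recursively, keeping the reference's indentation. The block names echo the outlines that follow, but the block contents are invented, and there is no protection against a block that refers to itself.

```python
import re

# A line that is nothing but a reference, possibly indented.
REFERENCE = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def tangle(blocks, name):
    """Expand <<name>> references recursively, noweb-style."""
    lines = []
    for line in blocks[name].splitlines():
        match = REFERENCE.match(line)
        if match:
            indent, child = match.groups()
            for sub in tangle(blocks, child).splitlines():
                # Indent non-empty lines to match the reference.
                lines.append(indent + sub if sub else sub)
        else:
            lines.append(line)
    return "\n".join(lines)


blocks = {
    "main program": "(() => {\n    <<everything that we will do>>\n})();",
    "everything that we will do": "<<declarations>>\n<<actions>>",
    "declarations": "const x = 1;",
    "actions": "console.log(x);",
}
print(tangle(blocks, "main program"))
```

However the blocks are presented to the reader, the tangled output always puts declarations before actions, because that is the order of the references.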

Outline of declarations

<<pure functions>>
<<constants>>
<<state variable>>
<<the language>>
<<routing>>
<<making changes>>
<<responding to the person>>

Outline of actions

<<read the blueprints>>
<<listen for a touch>>
<<deal with history>>
<<announce ourselves to the world>>

3.5.2 create a scope

Perhaps JavaScript owes some of its staying power to the fact that many of its defects can be effectively bandaged by its “good parts.” Almost every JavaScript program you see will have an overall structure like the following, in which the hazard of its default (global) scoping is patched in the conventional way by the goodness of its function closures.

File: getflow/client/getflow.js

Main program

((window, document) => {

    <<everything that we will do>>

}(window, document));

This bracketed function area appears to “fence in” everything that we do, and indeed, that is its effect. It creates an “enclosure” (or “local scope”) in which we can selectively isolate our work from the world (i.e. the rest of the web page). Without it, we would be working in a “global namespace,” where every single thing we do would potentially clash with some other program in that same space by choosing the same name for some object that we create. We have no way to know what other programs we’ll find ourselves having to live with, so this fencing-in effectively solves a major problem. As we will see shortly, this control over visibility is also very useful for “reasoning about” the program.¹

Note that with ES2015, the scope thing is no longer an issue. The above is now strictly about the stupid minifier trick.

3.6 traversal / movement / steps / “routes”

Between any two locations in the site, there is one way to go, at least in getflow’s view. These functions calculate that route. We know that every location has one direct route to the “top” (or “root”, or “home”), namely by hacking off pieces until there’s nothing left. The route between two locations, then, is to go “upwards” from where you are until you reach a path that is in your destination’s ancestry, which in the “worst case” will be the very top of the site.

Here’s how we expect path traversal to behave.

In finding the shortest path between two places, it helps to think of each location as the end of a ray from home.

Of course, the place we’re going also connects to home this way:

`/`	↔	`/`	✓
`/a`	↔	`/` → `/a`	✓
`/a/b`	↔	`/` → `/a` → `/a/b`	✓
`/a/b/c`	↔	`/` → `/a` → `/a/b` → `/a/b/c`	✓
`/a/b/c?q`	↔	`/` → `/a` → `/a/b` → `/a/b/c?q`	✓

    <<steps from home>>

// The route will go "backwards" from the starting point, and, when it hits the
// ray pointing to the destination, it will get on board.
function steps_between(A, B) {
    // These are the "rays" from home.
    var home_to_A = steps_from_home_to(A);
    var home_to_B = steps_from_home_to(B);

    // How many steps are in common?
    var common = 0;
    while (common < home_to_A.length
	   && common < home_to_B.length
	   && home_to_A[common] === home_to_B[common]) {
	common = common + 1;
    }

    // Get the two legs of the route
    var steps_to_common = home_to_A.slice(common).reverse();
    var steps_from_common = home_to_B.slice(common);

    return steps_to_common.map(step => [step, true])
      .concat(
	steps_from_common.map(step => [step, false])
      );
}

This function behaves as follows:

from	`/`	to	`/`	the steps are	(none)	✓
from	`/`	to	`/a`	the steps are	+ `/a`	✓
from	`/`	to	`/a/b`	the steps are	+ `/a` + `/a/b`	✓
from	`/`	to	`/a/b/c`	the steps are	+ `/a` + `/a/b` + `/a/b/c`	✓
from	`/a`	to	`/a/b`	the steps are	+ `/a/b`	✓
from	`/a`	to	`/a/b/c`	the steps are	+ `/a/b` + `/a/b/c`	✓
from	`/a`	to	`/`	the steps are	– `/a`	✓
from	`/a/b`	to	`/`	the steps are	– `/a/b` – `/a`	✓
from	`/a/b`	to	`/a`	the steps are	– `/a/b`	✓
from	`/a/b/c`	to	`/`	the steps are	– `/a/b/c` – `/a/b` – `/a`	✓
from	`/a/b/c`	to	`/a`	the steps are	– `/a/b/c` – `/a/b`	✓
from	`/a/b/c`	to	`/a/b`	the steps are	– `/a/b/c`	✓
from	`/a`	to	`/x`	the steps are	– `/a` + `/x`	✓
from	`/a/b`	to	`/x`	the steps are	– `/a/b` – `/a` + `/x`	✓
from	`/a/b`	to	`/x/y`	the steps are	– `/a/b` – `/a` + `/x` + `/x/y`	✓
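The tables above can be checked against a small Python port. The steps-from-home function isn't shown in this document (it's the `<<steps from home>>` block), so the version here is a hypothetical reconstruction from the first table: every ancestor, root first, with any query string attached to the last step only. `steps_between` follows the JS closely, returning (location, backwards) pairs.

```python
def steps_from_home_to(path):
    """Every ancestor of path, root first (hypothetical reconstruction)."""
    pathname, sep, query = path.partition("?")
    segments = [s for s in pathname.split("/") if s]
    steps = ["/"] + ["/" + "/".join(segments[:i + 1]) for i in range(len(segments))]
    # A query string belongs to the destination itself, not its ancestors.
    steps[-1] += sep + query
    return steps


def steps_between(a, b):
    home_to_a = steps_from_home_to(a)
    home_to_b = steps_from_home_to(b)
    # How many steps are in common?
    common = 0
    while (common < len(home_to_a) and common < len(home_to_b)
           and home_to_a[common] == home_to_b[common]):
        common += 1
    # Back out of A's private steps (deepest first), then descend toward B.
    up = [(step, True) for step in reversed(home_to_a[common:])]
    down = [(step, False) for step in home_to_b[common:]]
    return up + down


print(steps_between("/a/b", "/x/y"))
# prints [('/a/b', True), ('/a', True), ('/x', False), ('/x/y', False)]
```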

What is the data structure for path traversal?

It must depend on the data structure that you use to define the edges, the shape of the thing. But that’s based on our (implicit) model.

You might say, it simply lists the paths that you travel through. That’s the way that steps_between has worked so far.

But that doesn’t take into account edges. In other words, that takes for granted that there exists an edge between each of the nodes. But I think that that’s what the function is asserting—that you can travel between those nodes.

So the question is, is there a difference between the way we’d express the path if we had to be agnostic of where the edges lay (i.e., it might not be a tree), versus how we’d express it if it were a given that traversal were always up/down in a tree?

Surely, we could express things differently in the latter case. The result would be a list of tuples: location and direction (up or down). The latter bit is necessary in that you have to know whether the changes are to be reversed or not, but that again is specific to the method of traversal, where you know that you’re constructing each path as a set of changes to its parent. Of course, we have to cross the line into those assumptions at some point, the question is, where?

Also, related, should/can the return value assume that you know where you are to start with? Suppose it were a series of relative steps. Given that the caller must know what the start node was, is there any semantic difference or otherwise anything clearer about expressing the path not as having two absolute endpoints? Indeed, can you be agnostic of the starting point? I don’t think so, because the changes you’re reversing will be context-dependent. In other words, the path has to be in effect reversible itself, even though you won’t be traveling it in reverse.

The way that getflow works in practice is that you apply the router to each path in order to figure out what change rules to apply. But when you’ve “zipped” the steps, so that you have pairs, there’s an irregularity to it. You apply the router to the “deeper” path. Whereas, if you had just the location/direction pairs, you’d never actually need to reference the common path (the junction) explicitly. In other words, to go from /a to /b you would say reverse /a and forward /b, without ever mentioning /. Does that make sense? Ultimately it may, given that we often “cancel out” rules for lateral steps against the root. But that is optional, and there is a sense in which / is undisputably a path on the way from /a to /b (as we compute it).

Are the semantics of this data structure going to be helpful here, then?

The way that we go about implementing the transition takes some things for granted.
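To make the comparison concrete, here is a small Python sketch of both shapes for the trip from `/a` to `/b`. The location/direction pairs are written out by hand (they are what steps_between yields), and `parent_of`, `zip_route`, and `deeper` are hypothetical helpers for illustration only. Zipping the pairs into edges makes the junction `/` appear explicitly, yet the router only ever needs the deeper end of each edge, which is exactly the list of locations already in the pairs.

```python
from typing import List, Tuple

Pairs = List[Tuple[str, bool]]
Edge = Tuple[str, str]


def parent_of(path: str) -> str:
    """Hypothetical: '/a/b' -> '/a', '/a' -> '/'."""
    return path.rsplit('/', 1)[0] or '/'


def zip_route(start: str, pairs: Pairs) -> List[Edge]:
    """Re-express (location, backwards) pairs as (from, to) edges."""
    edges, here = [], start
    for step, backwards in pairs:
        # Backing out of a step lands on its parent; going forwards lands on it.
        there = parent_of(step) if backwards else step
        edges.append((here, there))
        here = there
    return edges


def deeper(edge: Edge) -> str:
    """Of an edge's two ends, the one whose parent is the other end."""
    a, b = edge
    return a if parent_of(a) == b else b


# Location/direction pairs for /a -> /b: back out of /a, then into /b.
pairs: Pairs = [('/a', True), ('/b', False)]
edges = zip_route('/a', pairs)
print(edges)                       # [('/a', '/'), ('/', '/b')]
print([deeper(e) for e in edges])  # ['/a', '/b']
```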

3.6.1 route matching

This is really just a route matching function. It’s little more than a find operation on the routes, that is, finding the first route whose pattern matches the given path. To some extent, the terminology is confusing, as “routing” is a common part of web frameworks, but the kind of route-finding that we do above is much more particular to getflow. In other words, I wonder whether this doesn’t belong with the DSL.

routing

<<find route>>
<<steps between>>


Find route

function find_route(path, routes) {
    var i, route;

    // HACK: first drop the query.  Queries will count towards context (I
    // expect), but aren't included when matching.
    path = path.replace(/\?.*/, '');

    for (i = 0; i < routes.length; i++) {
	route = routes[i];
	var match = route.matcher(path);
	if (match) {
	    return [route, match];
	}
    }

    return null;
}
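The same lookup in Python, as a sketch. The route objects and their `matcher` functions come from elsewhere in getflow, so `make_matcher` here is a hypothetical stand-in that turns `{placeholder}` segments into named groups. One Python-specific wrinkle: a route with no placeholders matches with an empty dict, which is falsy, so the loop tests `is not None` rather than truthiness.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Match = Dict[str, str]


def make_matcher(pattern: str) -> Callable[[str], Optional[Match]]:
    """Hypothetical: '/colors/{color}' matches '/colors/red' as {'color': 'red'}."""
    # re.split with a capturing group alternates literal text and placeholder names.
    parts = re.split(r'\{(\w+)\}', pattern)
    regex = ''.join(re.escape(part) if i % 2 == 0 else '(?P<%s>[^/]+)' % part
                    for i, part in enumerate(parts))
    compiled = re.compile(regex + r'\Z')

    def matcher(path: str) -> Optional[Match]:
        m = compiled.match(path)
        return m.groupdict() if m else None
    return matcher


def find_route(path: str, routes: List[dict]) -> Optional[Tuple[dict, Match]]:
    """Return the first route whose matcher accepts `path`, with its match."""
    path = path.split('?', 1)[0]  # queries don't count when matching
    for route in routes:
        match = route['matcher'](path)
        if match is not None:  # {} (no placeholders) is still a real match
            return route, match
    return None


routes = [{'pattern': p, 'matcher': make_matcher(p)}
          for p in ('/colors', '/colors/{color}')]
route, match = find_route('/colors/red?shade=dark', routes)
print(route['pattern'], match)  # /colors/{color} {'color': 'red'}
```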

3.7 shortcuts

At this point, we could argue that getflow is “correct.” It will get you from point A to point B, making all of the changes indicated by the blueprints.

But we can do better. Insert story about my first day as a bike courier.

Sometimes the “shortest route” between two points can be made even shorter with a little off-roadin’. Consider once again the matter of going between two sibling places.

Yes, this is the shortest “official” route. But in some cases—especially cases like this one—you may be able to skip some of those changes.

How? Consider the gray node. It has three children. Remember that you can use placeholders to describe many places using one set of rules. Let’s suppose that the rules for gray’s children look like this:

prepare to make a colorful place
set the color to {color}

When an actual path is chosen, the “color” placeholder gets filled in, and that’s what allows the “red” and “blue” nodes to look different while sharing the same template.

But note that the first rule doesn’t use a placeholder for color. That means that it’s the same exact bit of work for all of the children. So going from red to blue will look like this:

unset the color to red
unprepare to make a colorful place
prepare to make a colorful place
set the color to blue

Look at changes 2 and 3. What a waste of work! Why should we undo something only to immediately do it again? Instead, we could really just do this:

unset the color to red
set the color to blue

This would give us a kind of “shortcut” between those places.

So before we actually apply the changes that we’ve collected, we’re first going to identify changes that cancel each other out (or can otherwise be skipped), and get rid of them!

We’ll start by scanning through each of the changes. This setup allows us to traverse the list while remembering the indices of any change that we know we want to exclude.


Setup for simplifying changes

<<compare changes>>

function simplify(changes) {
    var i, j, change, other, skips = [];

    for (i = 0; i < changes.length; i++) {
	change = changes[i];
	<<maybe skip this change>>
    }

    return changes.filter((_, i) => skips.indexOf(i) < 0);
}

For each change, we’ll apply the following logic.

First, duplicate “idempotent” changes will be removed. Class changes, specifically, can be applied over and over again with no further effect.

But I use scare quotes because we don’t check that the changes are successive, so intervening changes could impact which elements are matched, not to mention the content of the document. In fact, even the change itself could do so, if a class selector is being used. So this is not “safe” at all.


let rule = change.rule,
	verb = rule.verb;
if (is_class_verb(verb)) {
	let reverse_verb = reverse_class_verb(verb);

	for (j = i + 1; j < changes.length; j++) {
		other = changes[j];
		let other_rule = other.rule,
			other_verb = other_rule.verb;
		if (change.rule.selector == other_rule.selector
		   && change.keys.context == other.keys.context) {

			if (change.rule.argument == other_rule.argument
				&& (verb == other_verb
					|| (change.backwards != other.backwards
						&& other_verb == reverse_verb)))
				skips.push(j);

			break;
		}
	}
}

I admit that that’s a rather horrid way to do something incorrect. See the test called “toggling state” for why it is needed: in some cases, it copes with the incorrect shortcut described next.

We also “wash out” changes that reverse each other. This, too, is done very loosely.


// TODO: should also check for relativedepth = 0
if (change.backwards) {
    for (j = i + 1; j < changes.length; j++) {
	other = changes[j];
	if (!other.backwards &&
	    changes_match(change, other)) {
	    skips.push(i, j);
	    break;
	}
    }
}
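Here is a Python sketch of just this wash-out pass, run on the red-to-blue example from earlier. `Change` is a hypothetical stand-in for a bound change: `keys` plays the role of `BoundChange.keys` (the values that identify what a change does in its context), and `backwards` says whether the change is being undone. The class-deduplication pass above is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Change:
    """Hypothetical stand-in for a bound change."""
    label: str
    keys: Dict[str, str]
    backwards: bool


def shallow_compare(a: Optional[dict], b: Optional[dict]) -> bool:
    # Test for None rather than falsiness: unlike a JavaScript {}, an empty dict is falsy.
    if a is None or b is None:
        return False
    return a.keys() == b.keys() and all(a[k] == b[k] for k in a)


def wash_out(changes: List[Change]) -> List[Change]:
    """Drop each backwards change together with the first later forwards
    change that has the same keys, since the two cancel out."""
    skips = set()
    for i, change in enumerate(changes):
        if not change.backwards:
            continue
        for j in range(i + 1, len(changes)):
            other = changes[j]
            if not other.backwards and shallow_compare(change.keys, other.keys):
                skips.update((i, j))
                break
    return [c for k, c in enumerate(changes) if k not in skips]


red_to_blue = [
    Change('unset the color to red', {'rule': 'set color', 'color': 'red'}, True),
    Change('unprepare to make a colorful place', {'rule': 'prepare'}, True),
    Change('prepare to make a colorful place', {'rule': 'prepare'}, False),
    Change('set the color to blue', {'rule': 'set color', 'color': 'blue'}, False),
]
for change in wash_out(red_to_blue):
    print(change.label)
# unset the color to red
# set the color to blue
```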

3.7.1 comparing changes

You might have noticed the compare_changes reference above. In order for these heuristics to work, we need a way to compare rules to one another.


Compare change rules

// This is a "pure function" but currently only used here.
function shallow_compare(A, B) {
    if (!A || !B)
	return false;

    const a_keys = Object.keys(A);
    const b_keys = Object.keys(B);
    return a_keys.length == b_keys.length && a_keys.every(
		key => A[key] === B[key]);
}

function changes_match(A, B) {
    return shallow_compare(A.keys, B.keys);
}

DEPRECATED: inputs match

I’m pretty sure we don’t need this now, since BoundChange.keys already includes all this (the context-evaluated expressions, and it’s limited to the used context values).

Of course, rules have different effects in different contexts. So in order to know that two “bound” changes really mean exactly the same thing, we also have to compare their contexts.

But we only want to compare the pieces of the context that are actually used by the changes.


function inputs_match(A, B) {
    var i, input;
    var these = A.rule.inputs;
    var those = B.rule.inputs;

    // Every input of A must also be an input of B...
    for (i = 0; i < these.length; i++) {
        if (those.indexOf(these[i]) < 0) {
            return false;
        }
    }

    // ...and vice versa, with both contexts agreeing on each input's value.
    for (i = 0; i < those.length; i++) {
        input = those[i];
        if (these.indexOf(input) < 0 ||
            A.context.evaluate(input) !== B.context.evaluate(input)) {
            return false;
        }
    }

    return true;
}

3.8 dealing with history

When you travel, you build up a history of the places you’ve been.

And in a web browser, your history is treated as a stack.

A stack is what it sounds like: a bunch of things on top of one another. But pretend that you can’t touch the middle of the stack, only the top. If you’ve ever seen those spring-loaded plate dispensers in a restaurant kitchen, they help visualize how a stack works. Pushing onto the stack means adding a new item, which always goes on top. Popping off the stack means removing an item, which always comes from the top. You can’t get to lower-down things without first going through the upper (i.e., more recent) ones. This is also known as last-in, first-out access.

Browsers treat history as a stack because it’s close to how people experience exploration. You take a sequence of steps, and you can retrace those steps. When you go to a new place, it gets pushed onto the end of the list. When you go back, the place you’re leaving is popped off the end of the list, so that your next-most-recent place becomes your new current place.

Browsers do let you jump around in your history. You can also “rewrite” history by going a different direction from an earlier point. Your history always reflects a single sequence of steps. But I digress.
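To make the stack model concrete, here is a toy Python model of session history. It is not the browser's API; `History` and its methods are made up for illustration. It keeps an index into the list rather than literally popping, so that “forward” also works, and going somewhere new from an earlier point discards the entries ahead of it, which is the “rewriting” described above.

```python
from typing import List


class History:
    """Toy model of a browser's session history."""

    def __init__(self, start: str) -> None:
        self.entries: List[str] = [start]
        self.index = 0

    @property
    def current(self) -> str:
        return self.entries[self.index]

    def push(self, place: str) -> None:
        # Going somewhere new drops anything "ahead" of the current entry.
        del self.entries[self.index + 1:]
        self.entries.append(place)
        self.index += 1

    def back(self) -> str:
        if self.index > 0:
            self.index -= 1
        return self.current

    def forward(self) -> str:
        if self.index < len(self.entries) - 1:
            self.index += 1
        return self.current


h = History('/')
h.push('/a')
h.push('/a/b')
print(h.back())      # /a
print(h.forward())   # /a/b
h.back()
h.push('/x')         # rewrites history from /a onwards
print(h.entries)     # ['/', '/a', '/x']
```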

In the old model, web sites generally didn’t have to worry about history, because the browser handled it. History was a freebie. But in order to take advantage of AJAX to create fluid spaces—whose states are still addressable—you need programmatic control of the path. In other words, your web site has to decide what forward and back mean, and make them happen.

The History API (one of the first HTML5 features to be widely implemented) uses pushState (something the site can do) and onpopstate (something the user triggers by going “back”). Again, when you use the history API, it becomes your responsibility to make it work.

All of this is to say that if you use pushState (to go forward), you must handle the “pop state” event, which occurs when the user moves through history with “back” (or “forward”), or when the page calls history.back() or a similar method.


Handling popped state

window.addEventListener('popstate', go_back);

So, what does it mean to “go back”?

Well, by the time the pop state handler is called, the document’s location has already changed. So it’s just a matter of telling getflow to go to the location that the browser already thinks is current.


Go back!

function go_back() {
	go(document.location.href, "popstate");
}

See the note in update_path_state about the need for the source argument.

I used to pass normalizePathname(document.location.pathname) + document.location.search. As I recall, that was a way of dealing with browsers that treat apostrophes in pathnames differently.

3.9 saying hello

Finally, we need to let the world know that we’re here! In truth, this is just for the system tests.

announce ourselves to the world

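// Some browsers (that *always* means IE) don't support custom events using
// the Event constructor.  IE does define Event, but not as a constructor, so
// test that it's callable rather than merely present.
if (typeof Event === 'function') {
	window.dispatchEvent(new Event('getflow-ready'));
}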

4 GET (the server)

This section covers matters that are specific to the server.


4.1 GET

The GET part of getflow is only done on the server.


from ..getflow.routes import make_path_pattern_matcher, look_up_address
from ..getflow.paths import hack_path
from ..getflow.plans import read_plans
from ..getflow.context import Context
from ..getflow.selectors import css_selector_to_xpath
from ..getflow.cache import Cache
from lxml import html
from io import StringIO
import os.path

# Poor man's "import as", which I'm avoiding because the code is actually
# concatenated.
route_matcher = make_path_pattern_matcher

class Site:
    def __init__(self, plans):
        self.directory = [route_matcher(rule['route']) for rule in plans]
        self.route_rules = {the['route']: the['change_rules'] for the in plans}

PLANS_FILE = 'getflow.xml'
site_cache = Cache(
    lambda: (Site(read_plans(PLANS_FILE))
	     if os.path.exists(PLANS_FILE) else None),
    lambda: (os.path.getmtime(PLANS_FILE)
	     if os.path.exists(PLANS_FILE) else 0))

def read_content(filename):
    # Read bytes, so that the caller decodes explicitly (see `get' below).
    with open(filename, 'rb') as the_file:
        return the_file.read()

INDEX_FILE = 'index.html'
index_html_cache = Cache(
    lambda: read_content(INDEX_FILE),
    lambda: os.path.getmtime(INDEX_FILE))

def apply_changes(doc, rules, context):
    for selector, change in rules:
        for node in doc.xpath(css_selector_to_xpath(context.expand(selector))):
            # Selector is only needed for "change keys"
            change.apply_to(node, context, selector)

    return doc

And here it is, the notorious GET function:


def get(path, site):
    """Returns an HTML document for the given path."""

    if path == '/':
        # Need a new instance each time because we mutate it.  Note that even
        # though we have a string here, we cannot use `fromstring' because it
        # will return an element, not a document, thus losing the doctype.
        with StringIO(index_html_cache.get().decode('utf-8')) as f:
            return html.parse(f)
    else:
        base_doc = get(hack_path(path), site)
        if base_doc is not None:
            match = look_up_address(site.directory, path)
            if match:
                route, context = match
                rules = site.route_rules[route]
                return apply_changes(base_doc, rules, Context(context))

    return None

As an alternative to the html.parse call above, you could use this version, which kills whitespace. You’ll still get whitespace from templates and transform output, though, so it’d be cool to have a whitespace-killing writer. When I have all the tests passing, I’ll try this out.


# This needs `from lxml import etree', which the module doesn't otherwise import.
return html.parse(INDEX_FILE, parser=etree.XMLParser(remove_blank_text=True))

Also, hit the index cache as soon as the server starts. This is kind of a development thing—I think it came up because of auto-reloading.

getflow/server/getflow/z-getflow.py

if __name__ != '__main__':
    index_html_cache.get()
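The shape of `get` above is a recursion from the root: a page is its parent's page with the page's own changes applied. Here is a minimal, self-contained sketch of that shape using the standard library's ElementTree instead of lxml. `parent_path` stands in for `hack_path` (assuming that it yields the parent path), and the plan format, a map from routes to (tag, attribute, value) rules, is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

INDEX = '<html><body><main>home</main></body></html>'

# Hypothetical plans: route -> list of (tag, attribute, value) changes.
ROUTE_RULES: Dict[str, List[Tuple[str, str, str]]] = {
    '/a': [('main', 'class', 'a')],
    '/a/b': [('main', 'data-b', 'yes')],
}


def parent_path(path: str) -> str:
    """Stand-in for hack_path: '/a/b' -> '/a', '/a' -> '/'."""
    return path.rsplit('/', 1)[0] or '/'


def get(path: str) -> Optional[ET.Element]:
    if path == '/':
        # A fresh tree on every call, because the callers mutate it.
        return ET.fromstring(INDEX)
    base = get(parent_path(path))
    if base is None or path not in ROUTE_RULES:
        return None
    for tag, attribute, value in ROUTE_RULES[path]:
        for node in base.iter(tag):
            node.set(attribute, value)
    return base


print(ET.tostring(get('/a/b'), encoding='unicode'))
# <html><body><main class="a" data-b="yes">home</main></body></html>
```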

4.2 command-line interface

Since getflow is just a function from inputs to outputs, it can be called from the shell without any server running at all. Of course, since many of the inputs come from files, it has to be run in the site directory (or be told where it is).


if __name__ == '__main__':
    import sys
    from lxml import etree
    from ..getflow.plans import read_change_rules

    if len(sys.argv) < 3:
        sys.exit('usage: getflow {index file} {changes file} [site directory]')

    doc = etree.parse(sys.argv[1])
    rules = read_change_rules(etree.parse(sys.argv[2]).getroot())

    # Because Tup
    import os
    if len(sys.argv) > 3:
        site_dir = sys.argv[3]
        if not os.getcwd() == site_dir:
            os.chdir(site_dir)

    apply_changes(doc, rules, Context({}))

    # Even though we use this pattern elsewhere, it's not working as hoped here,
    # for reasons I will worry about later.
    #
    # print(html.tostring(doc, encoding='utf-8').decode('utf-8'))
    print(html.tostring(doc).decode('utf-8'))

This is useful for applying the initial change rules to the site index, which you may do as part of a build process (since getflow doesn’t otherwise process any rules against the site root).

4.3 WSGI integration

The “actual” web server is even smaller. It’s nothing but a very nominal WSGI binding.


# -*- mode:python -*-

from getflow import get, site_cache
from lxml import html

def application(environ, start_response):
    output = b''
    response_headers = []

    path = environ['REQUEST_URI']

    site = site_cache.get()
    doc = get(path, site) if site else None
    if doc is not None:
        status = '200 OK'
        response_headers.append(('Content-Type', 'text/html'))

        output = html.tostring(doc,
                               doctype=doc.docinfo.doctype,
                               encoding='utf-8')

    else:
        status = '404 Not found'
        response_headers.append(('Content-Type', 'text/plain'))
        # WSGI response bodies are bytes (and Content-Length counts bytes).
        output = b'Friendly neighborhood not-found page'

    response_headers.append(('Content-Length', str(len(output))))
    start_response(status, response_headers)
    return [output]

handle_request = application

By default, mod_wsgi looks for an entry point called application (the WSGI spec itself doesn’t fix the name). But for reasons unknown to me, the entry point must be called handle_request if you use the WSGIHandlerScript directive of mod_wsgi. For that same reason, it’s important to get the path from REQUEST_URI instead of PATH_INFO, because only the former works with both methods.⁹
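Because a WSGI application is just a callable, you can exercise a binding like this in-process, with no Apache or mod_wsgi involved. The sketch below uses a stand-in `application` with the same shape (not getflow's `get`; it serves only `/`) and the standard library's `wsgiref.util.setup_testing_defaults` to fill in the rest of the environment. Note that the body must be bytes, and that Content-Length counts bytes, not characters.

```python
from typing import Dict, Tuple
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Stand-in with the same shape as the binding above."""
    path = environ['REQUEST_URI']
    if path == '/':
        status, content_type = '200 OK', 'text/html'
        output = '<!DOCTYPE html><p>café</p>'.encode('utf-8')
    else:
        status, content_type = '404 Not found', 'text/plain'
        output = b'Friendly neighborhood not-found page'
    start_response(status, [('Content-Type', content_type),
                            ('Content-Length', str(len(output)))])
    return [output]


def call(app, uri: str) -> Tuple[str, Dict[str, str], bytes]:
    """Invoke a WSGI app directly and collect status, headers, and body."""
    environ = {'REQUEST_URI': uri}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = dict(headers)

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call(application, '/')
print(status, headers['Content-Length'], len(body))  # 200 OK 27 27 (é is two bytes)
```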

4.4 selectors

One thing you get “for free” in the browser is a CSS selector implementation. (Even that wasn’t free until querySelectorAll… thanks, John Resig!)

On the server, no such luck. Fortunately, CSS selectors map pretty cleanly onto XPath expressions, which we do have. The selectors module implements a naïve CSS-predicate-to-XPath translation. It covers only the main constraints and their unions—no combinators.


import re

css_predicate = re.compile(r'''
(                         # one of
(?P<tag>[^\#.[]+)         # - tag selector          some-element
|(\#(?P<id>[^\#.[]+))     # - id selector           #some-id
|(\.(?P<class>[^\#.[]+))  # - class selector        .some-class
|(\[                      # - attribute selector    [attribute="value"]
    (?P<attribute>[^\#.[]+) = "(?P<value>[^"]*)"\])
)
''', re.X)

def xpath_predicate(match):
    it = match.groupdict()
    return next(template for group, template in [
	['id'       , '@id="{id}"'],
	['class'    , 'contains(concat(" ",@class," ")," {class} ")'],
	['attribute', '@{attribute}="{value}"'],
	['tag'      , 'name()="{tag}"']
    ] if bool(it[group])).format(**it)

def css_selector_to_xpath(selector):
    matches = list(css_predicate.finditer(selector))
    xpath = '//*[' + (']['.join(map(xpath_predicate, matches))) + ']'
    return '('+xpath+')[1]' if any(map(has_singleton, matches)) else xpath

It does include a slight “optimization” for cases where only one match is expected.

getflow/server/getflow/selectors.py

def has_singleton(match):
    _ = match.groupdict()
    return _['id'] or (_['tag'] in ['head', 'body', 'title', 'main'])
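A note on the class predicate in xpath_predicate: it pads both @class and the class name with spaces before calling contains, so that `.de` matches `class='abc de'` but not `class='abc def'` (the “exact name” test case below). The same check, written as plain Python string operations for illustration:

```python
def has_class(class_attribute: str, name: str) -> bool:
    """Mirror of contains(concat(" ", @class, " "), " name ") in XPath."""
    return ' ' + name + ' ' in ' ' + class_attribute + ' '


print(has_class('abc de', 'de'))   # True
print(has_class('abc def', 'de'))  # False
print('de' in 'abc def')           # True: a bare substring test would be wrong
```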

The html module from lxml does include a cssselect function, but it depends on the cssselect package, which has to be installed separately. So I’m sticking with this for now.

Anyway, let’s test it:


from getflow.selectors import css_selector_to_xpath
from lxml import etree
from test_helpers import check

test_doc = etree.fromstring("""
<<server selector test xml>>
""")

test_cases = (
    # (*) means the test covers multiple matches

    # Predicates
    ('Class', '.xyz', []),
    ('Class : (*)', '.abc', ['box1', 'box2', 'box3', 'link3']),
    ('Class : one of many on element', '.def', ['box3']),
    ('Class : exact name', '.de', ['box2']),
    ('ID', '#box1', ['box1']),
    ('ID', '#span1', ['span1']),
    ('Tag', 'span', ['span1']),
    ('Tag (*)', 'div', ['box1', 'box2', 'box3', 'box4']),
    ('Attribute value', '[data-key="400 years"]', ['box4']),
    ('Attribute value, including apostrophe', '[data-key="Who\'s on First?"]', ['p1']),
    ('Attribute value, including brackets', '[data-class="[very] special"]', ['box3']),
    ('Attribute Value (*)', '[data-class="special"]', ['box2', 'link2']),

    # Unions
    ('Tag and class', 'a.abc', ['link3']),
    ('Tag and class (*)', 'div.abc', ['box1', 'box2', 'box3']),
    ('Tag and attribute value', 'a[data-class="special"]', ['link2']),
    ('Tag and attribute value (2)', 'div[data-class="special"]', ['box2']),
    ('Class and attribute value', '.abc[href="#"]', ['link3']),
    ('Class and attribute value (*)', '.a-link[href="#top"]', ['link1', 'link2']),
    ('Classes', '.a-link.abc', ['link3']))

for test in test_cases:
    _, css_selector, matching_ids = test

    xpath = css_selector_to_xpath(css_selector)
    got_matches = list(test_doc.xpath(xpath))
    got_ids = [ele.attrib['id'] for ele in got_matches]

    if not check(test, matching_ids, got_ids):
        quit()

The test data as XML:

server selector test xml

<html>
	<head>
		<title>sample</title>
	</head>
	<body>
		<div id='box1' class='abc'>
			hello
		</div>
		<hr/>
		<div id='box2' class='abc de' data-class='special'>
			there
			<a href='#top' id='link1' class='a-link'>here</a>
		</div>
		<div id='box3' class='abc def' data-class='[very] special'>
			world
			<span id='span1'>
				<a href='#top' id='link2' class='a-link' data-class='special'>yhere</a>
			</span>
		</div>
		<div id='box4' data-key='400 years' data-class='special [very]'>
			view
			<p id='p1' data-key='Who&apos;s on First?'>
				<a href='#' id='link3' class='abc a-link'>yhere</a>
			</p>
		</div>
	</body>
</html>

4.5 dom tokens

Another thing you don’t get for free on the server is the classList API, which is really just “a set of space-separated tokens.” The dom_tokens module implements the essential DOMTokenList API¹⁰, except that add and remove also accept several space-separated tokens in a single string. This latter bit may be removed (so it becomes the client’s job), for parity with the browser implementation.

getflow/server/getflow/dom_tokens.py

import re

delimiter = re.compile(r'\s+')

def has_token(existing, token):
    return bool(existing) and token in delimiter.split(existing)

def add_token(tokens, add):
    if not bool(add): return tokens or ''
    if not bool(tokens): return add
    existing = list(delimiter.split(tokens))
    new = [t for t in delimiter.split(add) if t not in existing]
    return tokens + ' ' + ' '.join(new) if new else tokens

def remove_token(existing, remove):
    if not bool(remove): return existing or ''
    if not bool(existing): return ''
    removals = list(delimiter.split(remove))
    return ' '.join(t for t in delimiter.split(existing) if t not in removals)

And you better believe I test that.


from getflow.dom_tokens import has_token, add_token, remove_token
from test_helpers import check

def token_test(case, function):
    existing, arg, expected = case
    check(case, expected, function(existing, arg))

print('\n\n------has token')
[token_test(case, has_token) for case in (
    (None, '', False),
    ('', None, False),          # Nonsense
    ('', '', False),            # classList throws
    ('', 'a', False),
    ('a', 'a', True),
    ('a', ' a', False),         # classList throws
    ('a', 'ab', False),
    ('a', 'ba', False),
    ('a b', 'a', True),
    ('a b c', 'b', True),
    ('alpha', 'alpha', True),
    ('alpha beta', 'alpha', True),
    ('alpha beta gamma', 'beta', True))]

print('\n\n------add token')
[token_test(case, add_token) for case in (
    (None, 'a', 'a'),
    ('a', None, 'a'),
    (None, None, ''),           # Nonsense
    ('', 'a', 'a'),
    ('', 'a b', 'a b'),
    ('a', 'b', 'a b'),
    ('b', 'a', 'b a'),          # Adds in the provided order
    ('a', 'a b', 'a b'))]       # Doesn't add duplicates

print('\n\n------remove token')
[token_test(case, remove_token) for case in (
    (None, '', ''),           # Nonsense
    ('a', None, 'a'),
    ('', '', ''),
    ('a', 'a', ''),
    ('a b', 'a', 'b'),
    ('a b', 'b', 'a'),
    ('a b c', 'b', 'a c'),
    ('a b c', 'a b', 'c'),      # though classList doesn't
    ('a b c', 'a c', 'b'),      #   take multiple tokens
    ('alpha', 'alpha', ''),
    ('alpha beta', 'alpha', 'beta'),
    ('alpha beta', 'beta', 'alpha'),
    ('alpha beta gamma', 'beta', 'alpha gamma'),
    ('alpha beta gamma', 'alpha beta', 'gamma'),
    ('alpha beta gamma', 'alpha gamma', 'beta'))]
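One gap the tests don't cover: add_token skips tokens that are already present, but not repeats within `add` itself, so `add_token('x', 'a a')` returns `'x a a'`. A variant that treats both strings as one insertion-ordered set (via `dict.fromkeys`) avoids that. It is only a sketch; unlike the original, it also normalizes the whitespace in the existing string rather than preserving it verbatim.

```python
import re

delimiter = re.compile(r'\s+')


def split_tokens(tokens):
    """Split a token string, ignoring None and empty pieces."""
    return [t for t in delimiter.split(tokens or '') if t]


def add_token(tokens, add):
    # dict.fromkeys keeps the first occurrence of each token, in order.
    return ' '.join(dict.fromkeys(split_tokens(tokens) + split_tokens(add)))


print(add_token('x', 'a a'))  # x a
```

It still passes the add-token cases above.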

4.6 caching

The server loads the same documents over and over again, so it caches them, reloading a file only when its modification time moves forward. Actually, the client does this caching, too, but there it’s coupled with the GET.


import sys

class Cache:
    def __init__(self, get_value, get_mtime):
        self.get_value = get_value
        self.get_mtime = get_mtime
        self.cache = {}

    def get(self):
        try:
            time = self.get_mtime()
            if not self.cache or time > self.cache['mtime']:
                self.cache['value'] = self.get_value()
                self.cache['mtime'] = time
        except Exception:
            # Use the cached value if something goes wrong.
            if 'value' not in self.cache:
                raise

        return self.cache['value']

5 api / exports

This allows you to hook the beginning of a path transition. It was added as an expedient for willshake. If I keep it, I’ll polish it up a bit.

constants

let _on_going;

Expose some stuff for use by others.


const getflow = {
	go, GET,
	GET_XML,
	animate,
	on_going(fn) { _on_going = fn; },
	// See notes below
	set_GET(alternate) { GET = alternate; }
};

// Is the site using an AMD loader?
if (window.define && window.define.amd) {
	// If so, assume that getflow will be loaded through it.
	window.define('getflow', [], getflow);
} else {
	// If not, just define it as a global the old-fashioned way.
	window.getflow = getflow;
}

Note that this only affects what the module exports. By the time we get to this point, all of the above setup is already done, whether a module loader is present or not.

set_GET allows you to wrap or replace the function used for making requests to the server. Exposing a setter like that is a stupid way to go about it, but maybe less stupid than the alternatives.

6 build the thing

Transitional, blah blah blah.

Build the client.


# Requires babel < 6
: $(ROOT)/getflow/client/getflow.js \
|> ^ 6 to 5 %B^ babel %f > %o \
|> $(ROOT)/getflow/client/es5/%b \
   $(ROOT)/getflow/<client>

: $(ROOT)/getflow/client/es5/getflow.js \
|> ^ minify %B^ uglifyjs --mangle \
   --compress hoist_vars \
   %f > %o \
|> $(ROOT)/getflow/client/es5/%B.min.js \
   $(ROOT)/getflow/<client_min>

Build the server.


: $(ROOT)/getflow/server/getflow/*.py \
|> ^o bundle getflow CLI ^ grep -hv 'from \.\.getflow' %f > %o \
|> $(ROOT)/getflow/getflow.py $(ROOT)/getflow/<cli>

: $(ROOT)/getflow/getflow.py \
  $(ROOT)/getflow/server/getflow/*.wsgi \
|> ^o bundle getflow WSGI ^ grep -hv 'from getflow' %f > %o \
|> $(ROOT)/getflow/getflow.wsgi $(ROOT)/getflow/<wsgi>

The grep removes the import lines that refer to the very programs being bundled, since their contents end up in the same file. This trick relies on that odd way of importing things from within the package (and, indeed, requires the package itself to exist).

Run the tests.


: foreach $(ROOT)/getflow/server/test-*.py \
  | $(ROOT)/getflow/server/getflow/*.py \
    $(ROOT)/getflow/server/test_helpers.py \
|> ^ server unit test %g^ python3 -B %f > %o \
|> $(ROOT)/getflow/server-test-results/%g.log

The -B flag keeps Python from writing bytecode to a __pycache__ directory, which Tup doesn’t like.

7 incidentals

You’d think this were part of JavaScript by now.

quote regexp

RegExp quote utility

function quote_regexp(obj) {
    return obj.toString().replace(/([.?*+\^$\[\]\\(){}|\-])/g, "\\$1");
}

Presumably for good reason, JavaScript’s DOM-querying methods don’t return arrays, but “array-like” things (node lists) which do not have Array in their prototype chain, so the results lack Array methods like map and filter. Since we will typically want those functions, we usually “wrap” such result sets in this make_array function:


Turn array-like things into arrays

function make_array(array_like_thing) {
    return array_like_thing instanceof Array? 
		array_like_thing
		: [].slice.call(array_like_thing);
}

Using [] references Array.prototype so that we can access its methods as independent functions.¹ Of course, this creates a “copy” of the list; be aware of that when using it.

7.1 debug messages

This is not a “pure” function—on the contrary, it’s all about side-effects. But it doesn’t mess with any variables, so it’s safe to put it up front. In fact, it’s used by a bunch of the other “pure” functions, so it should be up front.


Debugging messages

function debug_message(o) {
	const console = window.console, // stupid minifier trick
		  items = arguments;

	// DIRTY test for mobile
	if (/Android/.test(navigator.userAgent)) {
		alert('getflow: ' + o);
	}

	if (console) {
		if (console.error)
			console.error('getflow', items);

		else if (console.log)
			console.log('getflow', items);
	}
}

7.2 animation

The art of animation means “breathing life into” things. Generally, that involves interpolation over time.

So to do animations, you first need a good timer. Browsers started providing a high-resolution timing function called window.performance.now at some point, but this might not be one of those browsers. If not, the old Date object will have to do.


const now = (window.performance && "function" == typeof window.performance.now
		   ? () => window.performance.now()
		   : () => new Date().getTime());

Note that you can’t just assign window.performance.now to a variable and call that, because when it is invoked without performance as its receiver, it complains that it is “called on an object that does not implement interface Performance”.

You also need a way to queue frames. In this case, I don’t bother with a fallback, although setTimeout is commonly used.


const queue_frame = window.requestAnimationFrame;

animate returns a promise and performs some action on every available frame for a given period. The callback is provided a number between 0 and 1 indicating how much of the period has elapsed. The callback must return a truthy value to keep going; it can stop the whole thing by returning a falsy value (although this is currently unused). The promise resolves when the animation ends.


function animate(durationMS, fn) {
	const start_time = now();

	return new Promise(resolve => {

		function frame() {
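			// Clamp, so that the callback always gets a final call with exactly 1.
			const complete_ratio = durationMS > 0
				? Math.min(1, (now() - start_time) / durationMS)
				: 1;

			if (fn(complete_ratio) && complete_ratio < 1)
				queue_frame(frame);
			else
				resolve();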
		}

		queue_frame(frame);
	});
}
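To see how the elapsed ratio behaves from frame to frame, here is a Python simulation with a fake clock; `simulate_animate` and its list of frame timestamps are hypothetical, standing in for now() and requestAnimationFrame. It clamps the ratio to 1 so that the callback always sees a final 1. Without clamping, a frame that lands past the end of the period could end the loop without the callback ever receiving 1.

```python
from typing import Callable, List


def simulate_animate(duration_ms: float, frame_times: List[float],
                     fn: Callable[[float], bool]) -> List[float]:
    """Call fn with the elapsed ratio at each frame time (ms since start),
    stopping once the ratio reaches 1 or fn returns a falsy value."""
    seen = []
    for t in frame_times:
        ratio = min(1.0, t / duration_ms) if duration_ms > 0 else 1.0
        seen.append(ratio)
        if not fn(ratio) or ratio >= 1.0:
            break
    return seen


print(simulate_animate(100, [30, 60, 90, 120, 150], lambda r: True))
# [0.3, 0.6, 0.9, 1.0]
```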

7.3 element children

I like children. By which I mean, I like the children property that is defined on DOM Nodes, at least by some browsers.


function children_of(parent) {
	return parent.children ||
		make_array(parent.childNodes).filter(_ => _.nodeType == ELEMENT_NODE)
}

The children property differs from childNodes in that it only includes elements, not text or other nodes. The fallback above is a bit inefficient, in that it makes two arrays each time it’s called. I wonder if querySelectorAll could avoid the filtering step, though note that querySelectorAll('*') matches all descendants, not just children.

7.4 inner xml

I’m tempted to say that this is another dispatch from the browser front, since this is only a problem on Safari iOS. Yet there is some logic to innerHTML not working on XML documents, which is what our blueprints are.


const SERIALIZER = new XMLSerializer();
function serialize(node) {
	return SERIALIZER.serializeToString(node);
}
function markup_of(node) {
	return node.innerHTML != undefined
		? node.innerHTML
		: make_array(node.childNodes).map(serialize).join('');
}

http://stackoverflow.com/a/6170981

7.5 a note on javascript

The javascript program defined here uses ECMAScript 2015, also known as ES6, Harmony. Since, as of 2015, most browsers don’t fully support this standard, we use a tool (called “Babel”) to transpile the program to ECMAScript 5, which practically all browsers support.

One consequence is that, to build the client module, you must have Babel and the es2015 preset installed locally. Unfortunately, starting with Babel 6, you cannot use globally installed presets, at least not from the CLI.

Incidentally, the build now uses Babel 5.8 because 6 was so slow and annoying.

Not all features of ES6 can be effectively provided this way. Some, such as iterators and generators, require additional runtime support. This module avoids those features for now, although the tests use them with the help of the regenerator runtime.

Also, although this is a “literate” program, we don’t bother explaining anything that is provided only by the transpiler, such as the “use strict” directive. That artifact and the need for it may be consigned to history.

7.6 hashing strings

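We’re using an insanely, stupidly naïve hash algorithm. I hesitate to even call it one. I’m surprised it can hash the word “naïve” (though it can). Even “checksum” sounds too sophisticated for what this is. For right now, though, we just need something that is exactly portable to Python, and this is, at least for text in the Basic Multilingual Plane: charCodeAt returns UTF-16 code units, so a character outside it (such as an emoji) contributes two surrogate values here but a single code point to Python’s ord.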


function stupid_string_hash(s) {
	var hash = 0, i, len = s.length;
	for (i = 0; i < len; i++)
		hash += s.charCodeAt(i);
	return hash;
}
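A few worked values make both the weakness and the portability caveat concrete. code_point_hash is a hypothetical alternative, not part of the program: it sums code points, which is what Python’s sum(ord(c) for c in s) computes for every string, but it relies on ES2015 string iteration, one of the features the client module currently avoids.

```javascript
function stupid_string_hash(s) {
	let hash = 0;
	for (let i = 0; i < s.length; i++)
		hash += s.charCodeAt(i);           // UTF-16 code units
	return hash;
}

// Hypothetical alternative: sums whole code points instead of code units.
function code_point_hash(s) {
	let hash = 0;
	for (const ch of s)                    // string iteration yields code points
		hash += ch.codePointAt(0);
	return hash;
}

stupid_string_hash('abc');          // 294 (97 + 98 + 99)
stupid_string_hash('cba');          // 294 too: addition ignores order
stupid_string_hash('na\u00EFve');   // 665, since ï is U+00EF (239)
stupid_string_hash('\u{1F600}');    // 112189 = 0xD83D + 0xDE00, two surrogates
code_point_hash('\u{1F600}');       // 128512 = 0x1F600, matching Python's ord
```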

7.7 a set function

JavaScript arrays have had a native forEach “method” since ECMAScript 5. But since I frequently need it on array-like DOM collections, by way of make_array, wrapping the two together makes the code read a bit better. Also, I think the code reads better with for_each in front.


function for_each(list, action) {
	make_array(list).forEach(action);
}
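A quick sketch of why the wrapper helps: array-likes such as NodeList or NamedNodeMap have a length and indexed entries, but (at least in older browsers) no forEach of their own. The make_array definition is an assumption, and array_like is a hypothetical plain-object stand-in.

```javascript
// Assumed definition; the program's make_array lives elsewhere.
const make_array = list => Array.prototype.slice.call(list);

function for_each(list, action) {
	make_array(list).forEach(action);
}

// Hypothetical array-like: indexed entries and a length, but no forEach method.
const array_like = { length: 3, 0: 'title', 1: 'body', 2: 'footer' };

const visited = [];
for_each(array_like, (item, index) => visited.push(index + ':' + item));
// visited is ['0:title', '1:body', '2:footer']
```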

7.8 scan attributes

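This isn’t really a “pure function,” since the callback will almost surely have side effects. I just factor it out because it’s used in two places. Note that querySelectorAll('*') matches descendants only, so the attributes of node itself are not scanned.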


function scan_attributes(node, action) {
    for_each(node.querySelectorAll('*'), element =>
			 for_each(element.attributes, action));
}
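A usage sketch: collecting every href value under a root. To keep it runnable without a browser, fake_root is a hypothetical stand-in whose querySelectorAll ignores the selector and returns a fixed list of descendant elements, each with an array-like list of { name, value } attribute objects; make_array is again an assumed definition.

```javascript
// Assumed helpers; the program defines its own elsewhere.
const make_array = list => Array.prototype.slice.call(list);
function for_each(list, action) { make_array(list).forEach(action); }

function scan_attributes(node, action) {
	for_each(node.querySelectorAll('*'), element =>
		for_each(element.attributes, action));
}

// Hypothetical stand-in for a parsed blueprint (ignores the selector).
const fake_root = {
	querySelectorAll: () => [
		{ attributes: [{ name: 'href', value: '/about' }, { name: 'class', value: 'nav' }] },
		{ attributes: [] },
		{ attributes: [{ name: 'href', value: '/contact' }] }
	]
};

// Collect every href value found anywhere under the root.
const hrefs = [];
scan_attributes(fake_root, attr => {
	if (attr.name === 'href')
		hrefs.push(attr.value);
});
```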

7.9 promise chains

Love ’em or hate ’em, Promises are native now. Let’s use ’em.


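Chain several promise-returning steps together sequentially

function chain(steps) {
	// Each step is a function, not a promise: it receives the previous
	// step's result (true for the first step) and may return a value or
	// a promise, which the next step waits for.
	return steps.reduce(
		(previous, next) => previous.then(next),
		resolve_now(true));
}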
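The steps really do run one after another, and each sees its predecessor’s result. Because previous.then(next) calls next, the array must hold functions: an array of already-created promises would not be sequenced, since they would all be running already and then() ignores non-function arguments. The delay helper below is hypothetical, for illustration only.

```javascript
const resolve_now = value => Promise.resolve(value);

function chain(steps) {
	return steps.reduce(
		(previous, next) => previous.then(next),
		resolve_now(true));
}

// Hypothetical helper: resolves to `value` after `ms` milliseconds.
const delay = (ms, value) =>
	new Promise(resolve => setTimeout(() => resolve(value), ms));

const order = [];
const result = chain([
	start => { order.push('a got ' + start); return delay(30, 'a'); },
	prev  => { order.push('b got ' + prev);  return delay(10, 'b'); },
	prev  => { order.push('c got ' + prev);  return prev + '!'; }
]);
// result resolves to 'b!'; order ends up ['a got true', 'b got a', 'c got b']
```

Even though step b’s delay is shorter than step a’s, b does not start until a has resolved.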

While we’re on the subject of promises, I find myself resolving them a lot.


function resolve_now(value) {
	return Promise.resolve(value);
}

7.10 promising to get things

Remember this—I’ll use it in a minute, and then again a minute later.


function memoize_promise(f) {
	const cache = {};
	return x => cache.hasOwnProperty(x)?
		resolve_now(cache[x])
		: f(x).then(result => (cache[x] = result));
}

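Note that this only works with single-arity promise functions, and that the argument is used as an object key, so it is coerced to a string. Also, only resolved results are cached: two calls with the same argument made before the first one resolves will both invoke f.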
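A sketch of that last point, using a hypothetical slow_double function that counts how often it really runs. For contrast, memoize_in_flight is a hypothetical variant (a swapped-in technique, not the program’s) that caches the pending promise itself, so concurrent callers share one invocation; it evicts rejected promises so they can be retried, and its Map keys are not coerced to strings.

```javascript
const resolve_now = value => Promise.resolve(value);

function memoize_promise(f) {
	const cache = {};
	return x => cache.hasOwnProperty(x) ?
		resolve_now(cache[x])
		: f(x).then(result => (cache[x] = result));
}

// Hypothetical variant: cache the promise, not the result.
function memoize_in_flight(f) {
	const cache = new Map();
	return x => {
		if (!cache.has(x)) {
			const pending = f(x);
			cache.set(x, pending);
			pending.catch(() => cache.delete(x));   // allow a retry after failure
		}
		return cache.get(x);
	};
}

// Hypothetical slow function that counts real invocations.
let calls = 0;
const slow_double = x => {
	calls++;
	return new Promise(resolve => setTimeout(() => resolve(x * 2), 10));
};

const plain = memoize_promise(slow_double);
const shared = memoize_in_flight(slow_double);
```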


Promise to load a document

let GET = path =>
	new Promise((resolve, reject) => {
		const request = new XMLHttpRequest();
		request.open('GET', path);

		// On a network error, readystatechange also fires (readyState DONE,
		// status 0), so reject may be called twice; only the first call counts.
		request.onerror = what => {
			debug_message('Error loading ' + path);
			debug_message(what);
			reject(request);
		};

		// Only a 200 counts as success; any other final status rejects.
		request.onreadystatechange = () => {
			if (request.readyState === READY_STATE_DONE) {
				if (request.status == 200)
					resolve(request);
				else {
					debug_message('Could not load document', request);
					reject(request);
				}
			}
		};

		request.send();
	});

Now let’s do something really naïve. Let’s cache all remote documents in memory, forever, with no way to evict anything at all.


const GET_XML = memoize_promise(
	path => GET(path).then(request => request.responseXML));

This is like the document promise, except that we import the document as a stylesheet (i.e., a transform) and cache the resulting processor, since we expect to have a limited number of them and would rather avoid the cost of loading them every time.


Promise to load an XSL transform

const GET_XSLT = memoize_promise(
	path => GET_XML(path).then(doc => {
		const processor = new XSLTProcessor();
		processor.importStylesheet(doc);
		return processor;
	}));

7.11 a note on style

Note that while we observe certain byte-saving conventions, we eschew others when a minifier can apply them for us. Examples include consolidating var declarations and folding conditional return statements.

7.12 constants

These constants are defined by the DOM, XMLHttpRequest, and DOM XPath specifications, and will never change. We could refer to them without knowing their exact values by using the definitions noted on the right; indeed, that’s the whole point of those definitions. Hard-coding them just saves a few bytes in the output.


W3-defined constants used in the program

const LEFT_MOUSE_BUTTON = 1;      // no named constant: legacy MouseEvent.which value (MouseEvent.button uses 0)
const ELEMENT_NODE = 1;           // Node.ELEMENT_NODE
const TEXT_NODE = 3;              // Node.TEXT_NODE
const READY_STATE_DONE = 4;       // XMLHttpRequest.DONE
const XPATH_STRING = 2;           // XPathResult.STRING_TYPE

7.13 normalize pathname

Coerces a pathname into the same non-encoded form across browsers. Firefox returns pathname with certain reserved characters (namely, the apostrophe) percent-encoded, even if they are not encoded in the href. Use this whenever reading a pathname.


Normalize pathname

function normalizePathname(pathname) {
    return decodeURI(encodeURI(decodeURI(pathname)));
}

7.13.1 TODO is that really needed?

If not, get rid of it. If so, explain what happens if you don’t do it. And also ensure that you have a test case to exercise its purpose.

If it is needed, then it’s not really incidental.
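A spot check bearing on the TODO above. For well-formed input, the outer encodeURI/decodeURI pair appears to round-trip, so normalizePathname(p) gives the same result as decodeURI(p) alone; note that decodeURI deliberately leaves escapes for reserved characters such as %2F and %23 encoded. The sample paths are made up, and this is a test sketch, not a verdict on whether some browser needs the extra steps. Both forms throw a URIError on a malformed sequence such as a bare %.

```javascript
function normalizePathname(pathname) {
	return decodeURI(encodeURI(decodeURI(pathname)));
}

// Made-up pathnames: a Firefox-style encoded apostrophe, an unencoded one,
// reserved escapes that decodeURI keeps (%2F, %23), and UTF-8 text.
const samples = ['/docs/o%27reilly', "/it's", '/a%20b/%2F', '/caf%C3%A9', '/x%23y'];

for (const p of samples)
	console.log(p, '->', normalizePathname(p),
		normalizePathname(p) === decodeURI(p));   // the last column prints true
```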

7.14 `URL` shim

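Most browsers have a URL constructor that parses the given string and returns an object with the breakdown into parts. Not all do yet, hence this. It assigns the string to the href of a detached anchor element and reads back the parsed pathname, search, and hash; relative URIs are resolved against the document’s base URL.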


const parse_url = (() => {
	const duty = document.createElement('a');
	return uri => {
		duty.href = uri;
		return {
			pathname: duty.pathname,
			search: duty.search,
			hash: duty.hash
		};
	}
}());
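Where the native constructor does exist (current browsers and Node.js), the same three fields can be read from it directly. A minimal sketch: parse_url_native is a hypothetical name, and unlike the anchor trick, URL needs an explicit base to resolve relative references (in a browser, document.baseURI would be the natural choice; the example.com base here is made up).

```javascript
// Hypothetical native counterpart to parse_url.
function parse_url_native(uri, base) {
	const url = new URL(uri, base);   // throws TypeError if uri is relative and base is missing
	return {
		pathname: url.pathname,
		search: url.search,
		hash: url.hash
	};
}

const parts = parse_url_native('../guide/intro?lang=en#setup', 'http://example.com/docs/api/');
// parts is { pathname: '/docs/guide/intro', search: '?lang=en', hash: '#setup' }
```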

Footnotes:

Fielding, et al. “3.4. Query”, RFC 3986 (2005) https://tools.ietf.org/html/rfc3986#section-3.4

This document specifically states the policy about slashes in query strings, unlike its predecessors RFC 1738 (1994) and RFC 2396 (1998).

For more, see Mozilla Developer Network, “addEventListener” https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/addEventListener

Philip Walton, “The Dangers of Stopping Event Propagation” https://css-tricks.com/dangers-stopping-event-propagation/

⁴ W3C, DOM Level 2 Core, http://www.w3.org/TR/DOM-Level-2-Core/core.html

⁵ John Resig, “.nodeName Case Sensitivity” http://ejohn.org/blog/nodename-case-sensitivity/

⁶ § 3.1 “Scrolling”, CSSOM View Module, W3C CSS Working Group, Editor’s Draft, 4 January 2016