the system
The purpose of “the system” is to establish a foundation that can be used for everything else that is done.
This document describes some properties that the system should have, and then creates a system that has those properties.
“System,” as we will see, means “to cause to stand.” And indeed, this system is mostly a bootstrapper.
1 thinking
I’m not happy with the system. But I can imagine at some point being happy with a system. It’s very hard to say exactly what it would look like, but I somehow remain confident that I’m moving towards it.
Shakespeare never used the word “system.” The word first entered English usage near the end of his career, according to the Online Etymology Dictionary.
system (n.)
1610s, “the whole creation, the universe,” from Late Latin systema “an arrangement, system,” from Greek systema “organized whole, a whole compounded of parts,” from stem of synistanai “to place together, organize, form in order,” from syn- “together” (see syn-) + root of histanai “cause to stand” from PIE root *sta- “to stand” (see stet).1
1.1 process
It’s about bootstrapping yourself over and over again. It’s about pushing forward, making something that supports your next experiment/exploration. It’s not about a particular product with particular properties. It’s about constantly asking what’s important, in light of everything that you’ve learned, and getting rid of the things that you don’t want anymore. God forbid that willshake ever care about “backward compatibility” (except w.r.t. URLs).
1.2 the system versus the product
The system is about creating a “product,” which people might call willshake. In some respect, the system itself is willshake (if Microsoft were on fire, what would Microsoft grab on the way out: a copy of Microsoft Word, which you can’t do anything with, or its source code?)… at any rate, I’m not striving for a system that is also identical with the product, although that might be interesting. In other words, one of the things that I don’t expect to change soon is that willshake the program and willshake the product are two different things. We’re so used to that model that it’s hard to explain (or indeed conceive) how it could ever be otherwise.
1.3 on “solving problems”
I don’t like talking about “solving problems,” at least not in the context of willshake. Willshake doesn’t solve a problem. If anything, it creates problems. This project is all about beauty and pleasure.
1.4 red / yellow / green code
This really could belong to self documenting, since it’s just about providing and displaying some metadata on parts of the documents.
- red/yellow/green code. on solving the right problem
- how often, for example, are you parsing text? this should be a solved problem, right?
Another way to look at the latter point is to ask, in each section: is this incidental complexity, and if so, to what external system (if any) does it accrue? Is it something we can expect to drop by dropping that system? Is that on the roadmap?
1.5 our design philosophy
Our “design philosophy”: always ask
- what if you could do whatever you wanted (no constraints)?
- what can you do with what you have?
And answer in that order. Answering the first question may very well lead you to wonder whether you would have this problem at all—whether the problem is not entirely an artifact of technical matters, with no substance of its own.
2 the actual system
This repository contains
- some static files, including
- “Org” documents
- data files
- a bootstrapper
The “bootstrapper” is a bit of code that allows us to treat the directory of documents as a program. The documents may contain code, of two main kinds: plain code, and build rules. The plain code ends up in static files. The build rules are a little more interesting. They can cause other things to happen.
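For instance, a document might contain a code block that tangles to a file, alongside a build rule in Tup syntax that the bootstrapper scrapes out (the file names here are hypothetical, not part of willshake):

#+BEGIN_SRC sh :tangle scripts/greeting
echo 'what a piece of work is a man'
#+END_SRC

: scripts/greeting |> ./"%f" > "%o" |> out/greeting.txt

The first becomes a static file; the second becomes an edge in the build graph.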
The bootstrapper, which depends on Tup and gawk, has the following features
- rules
- automatic tangling of code blocks (in (restricted) Org format, but no emacs required)
- automatic extraction of explicit build rules (in Tup syntax)
- automatic grouping of build inputs and outputs (supports global communication between build rules)
- of tangles (with override)
- of explicit build rules (with override)
- subsystems
- directories are subsystems
- “vertical” subsystems (communication between descendants/ancestry)
- “services” from parent
- “super” rules from parent (not tested)
- order-agnostic. rules are automatically reordered within a subsystem
- including super rules and services
- shelving of features
- from a central list
- for individual documents (by listing document name)
- or entire subsystems (by listing directory name)
- also shelve individual documents from their own metadata
- from a central list
- tangler (depends on gawk)
- multi-block
- files
- placeholders
- placeholder references
- prefix-preserving
- warn on missing (though should be error)
- language- or block-level:
- shebangs / executables
- prologues and epilogues
- multi-block
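To sketch the placeholder features (the block names here are hypothetical): a named block can be spliced into another by reference, and the prefix of the reference line is repeated on each spliced line, so indentation survives:

#+BEGIN_SRC python :noweb-ref greet-body
print("hello")
print("goodbye")
#+END_SRC

#+BEGIN_SRC python :tangle scripts/greet.py
def main():
    <<greet-body>>
#+END_SRC

In the tangled scripts/greet.py, both print lines are indented to match the reference. A missing reference currently produces only a warning (as noted above, arguably it should be an error).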
2.1 Org
Org mode is… well, what does it say for itself?
Org mode is for keeping notes, maintaining TODO lists, planning projects, and authoring documents with a fast and effective plain-text system.
Created by Carsten Dominik in 2003, maintained by Bastien Guerry and developed by many others.2
Basically, Org mode is a document system built on Emacs.
2.2 Tup
Tup is a build system—the build system—for files. If you’re familiar with spreadsheets, it’s kind of like a spreadsheet for your file system, where files are cells, and build rules are formulas.
It was created by Mike Shal to solve the problem of incremental builds that were slow, incorrect, or both.
Tup gives you lots of upside, if you can provide a static dependency graph. Since it is much easier to reason about a static dependency graph, you’re better off organizing your build that way regardless of Tup.
In other words, it only works in the desired way when you can declare to it all of the inputs and outputs for each command before any build action is taken.
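For example (hypothetical files, assuming pandoc is installed), a rule names its inputs, its command, and its outputs up front:

: index.md |> pandoc "%f" -o "%o" |> index.html

Before running anything, Tup knows that index.html depends on index.md, and on nothing else; that is the static graph it needs.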
Anything that is not an authoring task should be encoded in our build system. Our build system is Tup.
In essence, willshake’s “system” is really just a way of using Org documents and Tup to fulfill the objectives discussed earlier. The next section shows exactly how this works.
2.2.1 on Tup’s strictness
Tup wants to guarantee correct builds all the time, and one way it does that is to control all access to the project files during a build. It exposes a “virtual” file system to the programs that are supposed to be carrying out commands, and if any process that it doesn’t own tries to access one of the files, it will throw a security exception.
I have worked around this while experimenting, by disabling that check, viz. context_check in fuse_fs.c. I just had it return 0 up front. Fortunately, Tup is easy to build.
Of course, Mike Shal is right—this is not “safe.” It may produce incorrect builds, for example, if the settings in your running Emacs differ from those that we set presently. Don’t use this method for production. (Or do.)
3 the bootstrapper
Nothing will come of nothing.
King Lear
“Bootstrapping” is a mystery of creation—maybe the deepest one.
—God created the world.
—Okay, but who created God?
—Um…
So it is with computers.
—What happens when you turn on a computer?
—It loads programs.
—How does it load programs?
—With a program loader.
—What is the program loader?
—A program.
—So who loads that?
—Um… God?
The answer is, the bootstrapper.
You need a bootstrapper any time you have an “all the way down” concept—a “self-substantial flame,” if you will.3 (Except God—God gets a pass.) In American mythology, it’s “your money all the way down,” and the bootstrapper is “hard work.” Or you might say that the US Constitution is a bootstrapper, since it initiates a process for the creation and revision of laws, including the Constitution itself.
For willshake, the system is documents all the way down, and the bootstrapper is the violation of that statement that makes it possible.
The entire program for willshake is supposed to come from these documents. But these documents are not directly executable; they need to be “tangled” into code first. So we need a program to tangle the documents. And while we’re at it, we need a program that says what should be done (such as tangling). And where is that program? Read on.
3.1 level zero: get the rules
it shall be called Bottom’s Dream,
because it hath no bottom
A Midsummer Night’s Dream
This section is about the “seed money” that makes that possible. The objective is that we start with as little seed money as possible. And notwithstanding my dissatisfaction with the system, I’m content that there’s little enough here.
This is for the things that can’t be expressed in documents. To be specific, this level is just the non-document, non-data files that are essential to the project, i.e. that ship with the repository, i.e. that are necessary to have.
The programs that appear in this section are imported into the document for reference. Normally, when you see source blocks in the program, they are actually contained in the document’s source, and serve in turn as the source for a program file. Here, it is the opposite. The program file is already written in-place, and that content is pulled in as part of this document’s production.
So this is all about a starting point. For Tup, the starting point is a Tupdefault file.
include_rules
include $(BOOTSTRAP)/rules.lua
This merely includes a Lua program from the bootstrap directory. This is done so that the “system stuff” can be kept out of the way of the documents as much as possible.
-- ASSUMES the bootstrapper is in a first-level directory. I don't know how to
-- dynamically get a path back to the root from here.
local bootstrapper_to_root = '..'
local root_to_processing = tup.getrelativedir(ROOT)
local processing_dir = bootstrapper_to_root..'/'..root_to_processing
-- Destructively collect a list of graph nodes sorted by dependency. Not
-- worried about falsy keys or values.
function sort_node(g, k, sorted, stack)
stack[k] = 1 -- for cycle detection
-- Does anything link to this? If so, deal with them first.
for _, o in next, g[k] do
if not stack[o] and g[o] then
sort_node(g, o, sorted, stack)
end
end
table.insert(sorted, k)
g[k] = nil
end
function sort_keys(g)
local sorted = {}
-- How to traverse an array while killing it
local k = next(g)
while k do
sort_node(g, k, sorted, {})
k = next(g)
end
return sorted
end
-- Merge two tables by key assuming values are also tables.
function merge_dictionaries(left, right)
for key, right_table in next, right do
if not left[key] then left[key] = {} end
tup.append_table(left[key], right_table)
end
end
function tokenize(args)
local t = args.into or {}
for x in args[1]:gmatch(args.on or '%S+') do table.insert(t, x) end
return t
end
-- Support input and output lists, including "extras"
function parse_files(expression, extra_field)
local mains, more = expression:match'(.*) | (.*)'
local items = tokenize{mains or expression}
if more then items[extra_field] = tokenize{more} end
return items
end
function get_rules(base_dir, filename)
local tangles = {}
local builds = {}
local services = {}
local supers = {}
local used_services = {}
local tangle_targets = {}
function process(line)
-- These directives are always on #+BEGIN_SRC lines. It's
-- case-insensitive in org-mode, so we don't check for the whole thing.
-- This is just enough to avoid false positives from other text.
if line:match'^#' then
-- Document metadata
local property, value = line:match'^#%+PROPERTY: (%S+) (%S+)'
-- Support shelved documents
if property == 'status' and value == 'shelved' then
return false
end
-- Support opt-in services
if property == 'uses_service' then
used_services[value] = true
end
-- Code blocks. Note that a tangle file may be listed more than once.
local target = line:match':tangle (%S+)'
if target then
-- Um...
target = base_dir..'/tangled/'..target
-- All tangles go into a group. You can set the name. If the file
-- uses multiple blocks, the first block is used.
if not tangle_targets[target] then
tangle_targets[target] = line:match':group (%S+)' or 'all'
end
end
end
-- Collect build rules in Tup syntax
local super, ins, command, outs = line:match'^(.*): (.*) |> (.*) |> (.*)$'
if outs then
-- Support services
local service = super:match'^service (%S+)'
-- Support "foreach" rules
local is_foreach = ins:match'^foreach '
if is_foreach then
ins = ins:sub(string.len('foreach ') + 1)
end
local output_dir = outs:match'%S+/'
local rule = {
is_foreach = is_foreach,
ins = ins, inputs = parse_files(ins, 'extra_inputs'),
command = command:gsub('%%p', root_to_processing),
outputs = parse_files(outs, 'extra_outputs'),
output_dir = output_dir }
-- All build outputs go into a group. If one is not specified
-- explicitly in the rule, a group called `<all>' is used, in the
-- directory of the first output.
if not outs:match'>$' then rule.group = output_dir..'<all>' end
if super:match'^super ' then
table.insert(supers, rule)
elseif service then
services[service] = services[service] or {}
table.insert(services[service], rule)
else
table.insert(builds, rule)
end
end
return true
end
-- The more concise io.lines *will not work* here. Tup will not notice files
-- opened by io.lines, and thus won't reprocess rules when they change.
local line
local file = assert(io.open(processing_dir..'/'..filename, 'r'))
for raw_line in file:lines() do
-- Support continued lines. Note that `raw_line' evidently includes the
-- newline.
if line and line:match'\\$' then
line = line:sub(0, -2)..raw_line
else
if not process(line or raw_line) then return {} end
line = raw_line
end
end
file:close()
-- Tangle
-- `tangle_targets' is a table and never nil; check that it is non-empty.
if next(tangle_targets) then
local outputs = {}
for tangled in next, tangle_targets do table.insert(outputs, tangled) end
table.insert(
tangles, {
inputs = {filename},
command = '^o tangle %B^ '..BOOTSTRAP..'/tangle --assign ROOT='..ROOT..'/tangled "%f"',
outputs = outputs})
-- Now link the "actual" location to the "actual" file.
for tangled, group in next, tangle_targets do
local real_target = tangled:gsub('/tangled/', '/')
-- Support grouping of tangles
local folder = real_target:match'.*/' or ''
table.insert(
tangles, {
inputs = {tangled},
command = '^ link tangle %f^ ln --symbolic --relative "%f" "%o"',
outputs = {real_target},
group = folder..'<'..group..'>'})
end
end
return {
tangles = tangles,
builds = builds,
services = services,
supers = supers,
used_services = used_services,
}
end
function add_build_rules(rules)
for _, rule in next, rules do
local add_rule = tup.rule
if rule.is_foreach then add_rule = tup.foreach_rule end
if rule.group then
local outs = rule.outputs
if not outs.extra_outputs then outs.extra_outputs = {} end
table.insert(outs.extra_outputs, rule.group)
end
add_rule(rule.inputs, rule.command, rule.outputs)
end
end
local documents
-- Is this directory shelved?
local skip = shelved and shelved[root_to_processing]
-- If there are no documents in the directory, `glob` will throw an error.
if not skip and pcall(function() documents = tup.glob('*.org') end) then
local services = {} -- opt-in rules that may be used by descendant subsystems
local tangles = {} -- files extracted from documents
local builds = {} -- explicit build rules from documents
local used_services = {}
for _, doc in next, documents do
-- Is this document (centrally) shelved?
if not shelved[doc] then
local rules = get_rules(ROOT, doc)
merge_dictionaries(services, rules.services or {})
tup.append_table(builds, rules.supers or {})
tup.append_table(builds, rules.builds or {})
tup.append_table(tangles, rules.tangles or {})
for _ in next, rules.used_services or {} do used_services[_] = true end
end
end
-- Get rules from supersystems
local up = ''
for __ in processing_dir:gmatch('/') do
up = up..'../'
if pcall(function() documents = tup.glob(up..'*.org') end) then
for _, doc in next, documents do
-- Is the *ancestor* document shelved? Note that ancestor documents
-- can only be shelved by filename, which is assumed unique.
if not shelved[tup.file(doc)] then
local super_rules = get_rules(ROOT, doc)
tup.append_table(builds, super_rules.supers or {})
merge_dictionaries(services, super_rules.services or {})
end
end
end
end
add_build_rules(tangles)
-- Merge service rules into build rules.
for service, rules in next, services do
if used_services[service] then
tup.append_table(builds, rules)
end
end
-- Reorder build rules based on dependencies. This is done using a graph of
-- the dependencies among input and output directories.
local graph = {}
local by_dir = {}
for _, rule in next, builds do
local output_dir = rule.output_dir
if not by_dir[output_dir] then by_dir[output_dir] = {} end
table.insert(by_dir[output_dir], rule)
if not graph[output_dir] then graph[output_dir] = {} end
graph[output_dir] = tokenize{rule.ins, on='%S+/', into=graph[output_dir]}
end
for _, dir in next, sort_keys(graph) do
add_build_rules(by_dir[dir])
end
end
The rule scraper itself is not (currently) broken into parts. One of its most important jobs is creating “tangle” rules—that is, telling Tup how each document is going to produce a number of other files. This “low level” function can’t be done from a document because (as things stand) the system doesn’t support in-document build rules that map one file to many outputs. Besides, this does additional processing to support the placement of tangled files into groups.
This script, like the others here, writes rules for consumption by Tup. It assumes that the “tangler” lives at $(BOOTSTRAP)/tangle, and that that (gawk) program expects its ROOT variable to be set to the base directory where tangled files should go.
The script itself also expects a ROOT variable to be set, indicating—in principle—the same thing. However, different directories are used. The “actual” files are going to be written to a tangled subdirectory under the project root. The locations requested by the actual code blocks will be symlinked to those locations. This is done so that it’s possible to put each tangled file into a specific build group. If this were possible to do from a single rule, I’d do that; but it’s not possible, so this is kind of an implementation detail. Nothing outside of here should rely on the tangled directory.
So much for the tangles. But tangles will only get you so far—they don’t do anything but create text files. Technically, that’s true of everything that the build system does, but from here on you can do that text-file-writing using any command you want—including commands that you just “tangled.”
The main script, which includes the “tangle rules” script just seen, is the one that collects explicit build rules from the documents.
I’m not sure this is true, about the automatic ordering.
All of our build rules come from documents. We scrape them out here. The great advantage of this is that we can treat build rules just like any other part of the program, placing them wherever they are relevant. This is perhaps not “normal” Tup usage, which generally assumes one Tupfile per output directory, in which case it will figure out in which order your rules should be run. But within a Tupfile, you’re on your own. Each rule will only “know about” the rules in that file that have come before it.
http://comments.gmane.org/gmane.comp.programming.tools.tup.user/1171
I’m not sure this is true, about the group feature.
Note this means that Tup’s <group> feature is of no use to us, since it only applies across Tupfiles. The tangles are easy, because they never depend on the output of other rules (except the bootstrapper), so we just print them out in place as we encounter them.
The rest of the rules are trickier. Ideally—from an authoring point of view—we could write them wherever we want and know that the right thing will happen. We will try to maintain that freedom here. To do that, we’ll have to do better than document order, because dependencies can go in both directions: a rule may depend on input from another rule that is output by a “later” document, meaning later in alphabetical order.
The program includes some logic to sort the rules within a subsystem so that they can be written in any order.
The original strategy was extremely quick and only somewhat dirty. It used dot (the Graphviz tool) to “rank” the rules by output directory. The current program does something similar in spirit, but more to the point.
You’ll note there was also a “continued lines” bit in there. This allows build rules to use the continuation format supported by Tup, where long lines may be broken using backslashes (as in numerous other languages).
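So a scraped rule may be written like this (hypothetical names), and the scraper joins the pieces into one line before matching the rule pattern:

: $(DOCUMENTS)/catalog.xml \
  |> xsltproc "$(TRANSFORMS)/catalog.xsl" "%f" > "%o" \
  |> $(DATABASE)/catalog.txt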
3.1.1 others
There are a few other files that come with the program.
Tupfile.ini
Tup has some options that control how the build is done. You can set these in several different places, including a Tupfile.ini at the root of the project. But,
In addition to setting options, Tupfile.ini files have a special property. Tup uses Tupfile.ini to identify the root of a project to automatically set up a .tup directory in the project root the first time it is run. This file may be empty. At a minimum, the root directory of a project should contain a file named ‘Tupfile.ini’ so that users do not need to explicitly run ‘tup init’ when first setting up a new project.4
In other words, this file is just a marker, and it could just as well be empty. As a courtesy, it includes a reference back to this document.
# See `documents/the_system.org`
Tuprules.tup
Tup also lets you put a Tuprules file anywhere you want. It lets you set rules and definitions which will apply in that directory and all of those below it.
# See `documents/the_system.org`
# This is how you make the “project root” available anywhere in your build
# rules.
ROOT = $(TUP_CWD)
BOOTSTRAP = $(ROOT)/bootstrap
include shorthands.tup
This file only includes the generic definitions—those that might apply in another project. I use the shorthands.tup file for those that are specific to willshake.
shorthands for build rules : shorthands.tup
Thanks to the bootstrapper, it’s possible to treat build rules like other “literate” code. But Tup doesn’t allow anything besides rules to be defined by a run script. Other constructs—variables, conditionals, macros, etc.—can only be done in Tup’s actual config files. And since those files are used by the build, you can’t sneak around that by trying to tangle them, which would make them “generated” files from Tup’s point of view. It’s the chicken and egg thing again.
But you can use variable and macro definitions from run scripts. And those shorthands make the build rules more readable.
It’s mostly just a bunch of directory aliases—shorter names for some of the file locations. There are also a couple of “macros,” another shorthand for build rules. It’s imported by the include_rules directive in the Tupfile.
PLAYS = $(ROOT)/text/plays
DATABASE = $(ROOT)/database
IMAGE_RECORDS = $(DATABASE)/images
AUDIO_RECORDS = $(DATABASE)/audio
DOCUMENTS = $(ROOT)/documents
MEDIA = $(ROOT)/media
IMAGES = $(MEDIA)/images
AUDIO = $(MEDIA)/audio
IMAGE_USAGE = $(ROOT)/image-usage
DOWNLOADED = $(ROOT)/downloaded
IMAGE_METADATA = $(DOWNLOADED)/image-metadata
IMAGE_LOCATION = $(DOWNLOADED)/image-location
AUDIO_METADATA_LOCATION = $(DOWNLOADED)/audio-metadata-location
AUDIO_METADATA = $(DOWNLOADED)/audio-metadata
AUDIO_LOCATION = $(DOWNLOADED)/audio-location
PROGRAM = $(ROOT)/program
ROUTES = $(PROGRAM)/routes
TRANSFORMS = $(PROGRAM)/transforms
STYLUS = $(PROGRAM)/stylus
CSS = $(PROGRAM)/css
PUBLISH = $(PROGRAM)/publish
PUBLISHED = $(ROOT)/published
GETFLOW = $(ROOT)/sub/getflow
SITE = $(ROOT)/site
SITE_TRANSFORMS = $(SITE)/static/transforms
SITE_STYLE = $(SITE)/static/style
SITE_IMAGES = $(SITE)/static/images
SITE_SCRIPT = $(SITE)/static/script
SITE_DOCS = $(SITE)/static/doc
SITE_PLAYS = $(SITE_DOCS)/plays
# The quotes are needed to support filenames containing, e.g. apostrophes.
# Using it here means that this macro can't be used for multiple files at once.
!copy_to = |> ^ copy %b to %o^ cp "%f" "%o" |>
# It's tempting to use symlinks in place of copies, since we'll never modify
# generated files after the fact. However, changes to symlinks don't get picked
# up by live reload.
!link_from = |> ^ link %b to %o^ ln --symbolic --relative "%f" "%o" |>
These definitions are not really essential to the system; they are just a convenience. Everything would work the same way if the shorthands were all expanded.
That said, these shorthands must be used consistently, because the trick used to sequence the build rules matches input and output directories textually; the same location spelled two different ways would break the ordering.
In fact, those definitions would work exactly the same way if they were in the Tupfile itself. I only split them into a separate file so that the essence of the actual Tupfile is more apparent.
.hgignore
This file is maintained explicitly as a temporary measure until such time as we can get Tup to generate it based on the build graph (which is currently only supported for Git). See the section “machine and human work shall not mingle” in the file document/program/the-build.org for more.
# See `documents/the_system.org`
# Tup uses its own database to keep track of the state of the build graph. This
# is valuable for enabling fast iterations, but it can always be recreated.
^\.tup/
# The tangled programs are disposable. We'll need to generate a number of
# programs for later use, so let's keep them in one disposable place.
^tangled/
# And so on.
^program/
^change_rules/
^scripts/
^test/
^media/
^image-usage/fullscreen/
^image-projections/
^wrapped/
^collated/
^site/
^published/
^facsimiles/
^downloaded/
^hashed/
^server/
^server-setup/
^app/
^data/
^getflow/
^start-site # a special case
^start-livereload # another special case
3.2 level one: the tangler
When your entire program is based on documents, practically every single change you make is going to run a tangle. So you want it to be as fast as possible.
Starting up a new Emacs session and loading Org Mode every single time you make a change carries an unfortunate penalty.
Following needs cleanup
This section produces a script that tangles a single program document.
This section (an included file) is itself tangled during the “bootstrap” phase of our build, by a special rule whose command is a simpler version of this program. This program is then used for all subsequent tangles. We are thereby still able to use a “literate” program even for the tangle command itself.
However, this program is not quite like the other programs, in that it cannot respond to their output to modify its behavior. In other words, we cannot use additive programming to enhance the behavior of the tangle. Why not? Because those modifications would themselves have to be tangled, which depends on this program having run already. If you consider what it would mean for an addition to the program to change the very way that programs were generated from all pre-existing documents, it’s clear that such an ability would introduce considerable complexity to weigh against whatever benefit it would bring. The biggest downside to the present method is, to wit, that we must here address some language-specific matters, which we might rather address where those languages are under discussion.
In summary, this document is a kind of middle ground. It is the master version of a program (that is, changes to the document will be reflected automatically by the build), but it does contribute to the fixed portion of the program that cannot be modified ex post facto.
Note that the “seed” rule in the Tupfile has to match the outputs of the program explicitly.
Have I mentioned that we’re obsessed with fast builds? Well, when you’re doing literate programming, nothing slows you down more than the tangle. It’s a preprocess to every single thing that you do. If you can’t tangle your document within milliseconds, you have no chance of a responsive development cycle.
So, we do our tangling in-house, and we do it using gawk. Oh my, gawk is fast.
Yes, to do this, we have to give up untold amounts of power that Org Babel offers us. In all seriousness, that’s not much of a bug. Since we’re also writing our build rules from the (tangled) documents, we can do effectively the same thing (in the way of multi-language code generation), and still see all of the intermediate steps (as files) without polluting our documents with generated state.
So here’s our tangle program. It supports a limited set of Org Babel’s features, in a way that is mostly syntax-compatible with it. (Note the hack to deal with the fact that it’s run from the root.) This program is tested against a set of documents to compare its output with Org Babel’s.
#!/usr/bin/gawk -f
BEGIN {
current_file = ""
current_ref = ""
is_header_line = 0
shebangs["awk"] = "#!/usr/bin/gawk -f"
shebangs["python"] = "#!/usr/bin/python3"
shebangs["sh"] = "#!/bin/sh"
prologues["sh"] = "set -e\n"
epilogues["sh"] = "\nexit 0"
prologues["xsl"] = "<?xml version='1.0' encoding='utf-8'?>\n" \
"<xsl:transform version='1.0'" \
" xmlns:xsl='http://www.w3.org/1999/XSL/Transform'" \
" xmlns:ex='http://exslt.org/common'" \
" exclude-result-prefixes='ex'>"
# We put this at the end so you can use imports as the first element.
epilogues["xsl"] = "<xsl:output omit-xml-declaration='yes' />\n" \
"</xsl:transform>"
}
/^#\+BEGIN_SRC/ {
is_header_line = 1
if (match($0, /:noweb-ref (\S+)/, this_ref))
current_ref = this_ref[1]
if (match($0, /:tangle ((\S+\/)?\S+)/, tangle)) {
current_file = ROOT "/" tangle[1]
out_dir = tangle[2]
if (match($0, /:shebang "([^\"]+)"/, sheb))
file_shebangs[current_file] = sheb[1]
type_of[current_file] = $2 # the language follows BEGIN_SRC
# Create the output directory if needed (like :mkdirp "yes" in Org Babel).
system("mkdir -p " ROOT "/" out_dir)
}
}
/^#\+END_SRC/ {
current_file = ""
current_ref = ""
}
! is_header_line {
if (current_ref) {
# Avoid trailing newline
if (current_ref in refs)
refs[current_ref] = refs[current_ref] "\n" $0
else
refs[current_ref] = $0
}
if (current_file) {
n = current_file in files ? length(files[current_file]) : 0
files[current_file][n] = $0
}
}
is_header_line {
is_header_line = 0
}
function resolve(array, i) {
# Preserving the prefix of the placeholder complicates things rather
# substantially here. Keep in mind that for nested references, you'll no
# longer be matching line-by-line, so the placeholder will probably occur
# somewhere in the middle of a multi-line string.
while (match(array[i], \
/(^|\n)([^\n]*)<<([^>]+)>>/, \
part) > 0) {
newline = part[1]
prefix = part[2]
ref = part[3]
line_start = RSTART
line_length = RLENGTH
if (! (ref in refs))
printf "warning: placeholder '%s' not found.\n", ref > "/dev/stderr"
# This is still a pointer -- no copy of the string has been made.
body = refs[ref]
# Preserve placeholder prefix. Because of this prefix, the replacement
# text can be different each time the substitution is used. So you
# can't short-circuit future work by resolving the refs themselves.
if (prefix != "")
# You're not supposed to pass a non-lvalue as the target here, but
# this is the only way I could find to do a non-destructive
# substitution.
body = gensub(/^|\n/, "&" prefix, "g", substr(body, 1))
array[i] = \
substr(array[i], 1, line_start - 1) \
newline body \
substr(array[i], line_start + line_length)
}
}
END {
for (file in files) {
language = type_of[file]
shebang = ( \
file in file_shebangs ? file_shebangs[file] :
(language in shebangs ? shebangs[language] :
""))
if (shebang) {
print shebang > file
system("chmod +x " file)
}
if (language in prologues)
print prologues[language] > file
line_count = length(files[file])
for (i = 0; i < line_count; i++) {
resolve(files[file], i)
print files[file][i] > file
}
if (language in epilogues)
print epilogues[language] > file
}
}
One consequence of this approach is that you can no longer tangle without this script and expect the same result. In other words, the required settings are no longer in the documents themselves (nor in setup files referenced by them). So just running org-babel-tangle (interactively or otherwise) won’t produce the same program as the build, which uses these settings. This is acceptable as a consequence of our view that build operations should always be automated.
We rely on tangling to create the document structure from scratch as needed. The mkdir -p means that this is equivalent to setting :mkdirp "yes" on all code blocks.
3.3 language-specific settings
These should be written in reference to the awk versions, which should be extracted to a separate file.
3.3.1 shell scripts
We always use bash with fail-on-error. The shebang header makes the tangled script executable.
(set-babel-header :shebang "#!/bin/bash" "sh")
(set-babel-header :prologue "set -e\n" "sh")
(set-babel-header :epilogue "\nexit 0" "sh")
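For illustration (a hypothetical block), a sh source block like:

#+BEGIN_SRC sh :tangle scripts/example
echo "brevity is the soul of wit"
#+END_SRC

tangles to an executable scripts/example that reads, roughly:

#!/bin/bash
set -e
echo "brevity is the soul of wit"
exit 0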
3.3.2 XSLT
Most XSL transforms can be wrapped this way. If you don’t want it, just set the language to xml instead.
(set-babel-header :padline "no" "xsl")
(set-babel-header
:prologue
(concat
"<?xml version='1.0' encoding='utf-8'?>\n"
"<xsl:transform version='1.0'"
" xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>\n"
"<xsl:output omit-xml-declaration='yes' />")
"xsl")
(set-babel-header :epilogue "</xsl:transform>" "xsl")
Note that if the transform uses an xsl:import, this will technically be incorrect, since the import is supposed to come first. Some XSLT processors (such as the xsltproc that shipped with my system) will ignore such errors with a warning. I’m not saying you should rely on this behavior, but I’m not saying you shouldn’t, either.
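So a minimal transform body (a hypothetical identity template) like:

<xsl:template match="@*|node()">
  <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>

comes out wrapped roughly as:

<?xml version='1.0' encoding='utf-8'?>
<xsl:transform version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output omit-xml-declaration='yes' />
<xsl:template match="@*|node()">
  <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>
</xsl:transform>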
4 subsystems
TBD write about subsystems.
Which means talking about Tup groups. Thanks to Freddie Chopin for tipping me off to this undocumented feature.5 See also the issue about detection of new inputs6, although I think that’s largely mitigated by the more systematic use of groups now in place. Still, using a group is (currently) the only way to make Tup respond to a new generated input (or maybe any new input) that matches a glob.
4.1 lateral
4.2 vertical
5 unfiled
5.1 files
Most of the time, the system is dealing with files.
File systems are probably the world’s most successful data structure. Every computer system that people encounter directly has some kind of hierarchical file system. Programs may take it for granted.
In the computer world, file systems are so ubiquitous, they are like water to a fish. (Insert Kay/McLuhan thing.) They are invisible.
5.2 the repository
Kind of goes under “files.”
But how is it
That this lives in thy mind? What seest thou else
In the dark backward and abysm of time?
If thou remember’st aught ere thou camest here,
How thou camest here thou mayst.
The Tempest
I’ll let you in on a little secret. Programmers have time machines. Yep, programmers have been using time machines for many years now, and those who started using them more recently, don’t know how they ever lived without them. (And they’ll never know, because the funny thing about programmers’ time machines is that you can’t go back to before you started using them.)
These “time machines” are also known as version control systems. Version control is kind of like a file system with lots of “undo.” And a revision history. And branching and merging and tagging and patching and a whole mess of other madness.
Anyway, I’m not here to talk about version control systems, except insofar as they concern willshake.
Of course, willshake uses a version control system. But not quite the same way that software projects usually do. The main difference is that willshake’s repository is always disposable.
By “disposable,” I mean that the revision history—either the messages or the changes themselves—MUST NOT be used to carry any information about the program.7
Unless you’re just fixing a typo, there’s always going to be some explanation of the changes being made. That’s what’s at issue here. Of course, if you work on a team where someone else reviews your changes, then including that explanation in the commit message is a courtesy, if not a requirement. But the explanation should also be included with the code itself, even when you’re explaining why something isn’t done a certain way, or is no longer done at all. In other words, the program itself should contain everything relevant to the choices that were made—“how thou camest here,” how things came to be this way.8
Literate programming doesn’t make this more important, it just makes it more idiomatic. As commentary goes out of date, it can be relegated to footnotes, or an appendix. If comments ever become truly obsolete, and of no possible value whatsoever, then they can be deleted, meaning lost forever. If you keep relevant information in your revision history, there’s no way to distinguish it from dust.
The point is that willshake keeps “bootstrapping the next version of itself” (to paraphrase Alan Kay). It’s already outlived several repositories, each one lasting a few years. I’ve re-made the whole thing several times, nearly from scratch, and at some points I just wanted a clean start. The point is that those old repositories have no possible relevance anymore. I’m confident (and you reading this should be) that if it matters, it’s in here now.
The only exception I can think of is that a repository could be used to provide permalinks to specific versions of documents. But such a thing would be a long way off, and may be obviated by the Internet Archive, anyway.
5.3 the build
First Clown: What is he that builds stronger than either the
mason, the shipwright, or the carpenter?
Second Clown: The gallows-maker: for that frame outlives a thousand tenants.
—Hamlet, 5.1
The build is a process. It means what it sounds like, which is, how we get from the blueprints to the building.
No one cares about the build. It’s not a feature. Like many topics in software, it’s just the residual complexity of someone else’s bad decisions. So we’ll first talk about why software needs to be “built” at all. We’ll argue that a build is actually called for in our case. Then we’ll talk about build systems and why we’ve chosen the one that we’re using.
This document describes a process for building willshake. We won’t get into too many specifics; instead, we’ll just establish a way of adding things, and other documents will do the adding.
Our build process is somewhat unusual. It’s more of a philosophy than a process, really.
The good news is that, since “the build” is not an actual program feature, this document doesn’t contain any program code at all. Feel free not to read it. It does explain some things that may seem odd about the actual program documents.
Building is a necessary evil. That’s how much we want to think about it. We don’t want to think about the building process any more than is absolutely necessary. That’s our objective.
So—given that a build system is necessary—we will now describe the “perfect” build system. Then we’ll do it.
5.3.1 some preliminaries
Before we get into the build itself, let’s address a few questions.
what does the build do?
On a computer system, what does it mean to “do” something, anyway?
Usually, when we talk about “doing” something on a computer, we are just talking about changing the “state” of the machine in some way. Some of those state changes are invisible to us: changing values in memory, writing to a storage device, sending and receiving messages on a network, broadcasting or receiving infrared signals, capturing video, recording sound, taking biometric or fingerprint scans, and so forth. Other state changes have human-discernible effects: coloring pixels on the screen, producing sounds, vibrating, administering shocks, remote controlling a drone, printing a tax form or a heart, and such like. In computing, just as in criminal law, we may speak about “intent.” Those actions that do not directly fulfill a program’s intended requirement are called “side effects.”
the build produces files
The build’s intent—what it promises—is to put your machine into a state in which it contains the “final product.” When it succeeds, the build produces (or updates) files, along with visible messages indicating its status. When it fails, it should gently inform you why.
Everything else is a side-effect, and should be benign.
what does the build not do?
The build may not modify its input files (the human-authored ones), or any other file not designated as a build output (or one of its temporary files).
but where are those files going to go?
On most systems, there are understood conventions about which locations may be modified when building software. Such conventions exist partly to answer our present question, where should a person expect to find the thing that was just built?
we build somewhere within this copy of the project
As a rule, we would prefer not to assume anything. We don’t want to know what planet we’re on, let alone how the person’s files are organized. (Yes, we have to assume a huge amount for this to work at all. But we try not to make any avoidable or undocumented assumptions.)
We will only assume that the folder containing the source documents has a certain layout, the top of which we call the “root” of the project, and that we may reserve the right to create new locations within it and use them. This is a safe assumption to the extent that the source documents are distributed as a complete package (usually as the clone of a “repository”). We can even use license terms to require those conditions on unofficial distributions. It’s safe to assume those will be honored!
Also, as we will see later, the build outputs need to share an ancestor location with their inputs in order to efficiently track their state.
what we build can be washed away at any time
Everything produced by the build is disposable. Also known as “build artifacts,” or “generated files” or “derived state,” it is the reproducible output of a deterministic transform on a discrete, static input (viz, these human-authored documents).
In other words, you can always make the machine do the work again and expect the same result. No big deal. It is always safe to obliterate the building, and sometimes it even feels good. After all, we’re talking about sand castles here (also known as “software”). Years ago, it was enough to rebuild once a day9. Nowadays you’re a slug if you don’t rebuild “continuously.”
machine and human work shall not mingle
So now we have human-authored work and machine-derived work living together in one place, our “project directory.”
If this makes you uncomfortable, don’t worry. We are going to officially segregate them. The machine-generated work is not part of the project, as far as history is concerned. We don’t commit it to the repository, and thus we don’t track changes to it. Why would we? The whole point of the build is to make sure that it’s replaceable.
The following two paragraphs are provisional. While Tup does have a feature for generating an ignore list, it is only supported for the Git revision control system, and not for Mercurial, which is used by this project. However, since the existence of the .gitignore feature means that the Tup project is not opposed in principle to such a behavior, we’ll assume that we can get this feature provided for Mercurial upstream, and as such we’ll maintain the ignore list manually in the meantime. An alternative (that still allowed us to avoid a non-literate file) would be to tangle together the ignores as needed and build the .hgignore file like anything else. This would be perfectly acceptable, but Tup doesn’t allow you to include “hidden” files in the build graph.
Fortunately, we can effectively “ignore” this problem, since we’re using a build system that knows exactly which files are generated. With that knowledge, it can maintain a list of the disposable files. To take advantage of this, we just have to use a special directive in our Tupfile.
Now anywhere that we “break ground,” so to speak, Tup will add the path to this “ignore” list. And after we build, we will not see the generated files appear as “new” or “modified.”
5.3.2 reminder of what a build system is for
When the source documents are written or edited, the human part of the development cycle is done. Yet the source documents are not directly executable. So more work is necessary to carry out the person’s intent.
The goal of “the build” is that anyone with a copy of the source documents should be able to execute a single command that will produce willshake’s deliverables (“final product”) on the local system (i.e., the person’s computer).
In development of willshake, the build system is our go-to place for the formalization of any mechanical task.
5.4 prerequisites
For now, we are aiming somewhat short of that lofty goal, in that certain up-front preparation will be assumed on the part of the builder, in particular, the installation of the external tools used to carry out the build. Because of the great diversity of development tools and package management systems, it usually falls to the developer to install at least some part of the required “stack,” despite efforts to standardize distribution of the most commonly-used tools. In any case, such preparation is a one-time step, and once it is done the build can in fact be automated.
At this time, the tools required for the build are:
- emacs - an emacs with org-mode is required to tangle the source documents.
- tup - a copy of the Tup build system
- node
I said at the beginning that the installation of the required stack was outside the scope of this program, and it is. However, I do need to at least document the dependencies somewhere. In “publish the documents” there are some notes about installing LaTeX. The following bits probably belong under “the web platform.”
5.4.1 node
You need to first have “node,” from Joyent. I recommend using nvm to install it so that you avoid permissions issues. As such, the following commands assume that you can install packages globally without root privileges.
npm install -g stylus
npm install -g autoprefixer
npm install -g postcss-cli
You need the autoprefixer package to use postcss with the --use autoprefixer option.
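For reference, the resulting invocations look something like this (the file names are hypothetical):

stylus --out program/css program/stylus/site.styl
postcss --use autoprefixer --output program/css/site.prefixed.css program/css/site.css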
5.5 continuous building
Can you make a build in one step?
—Joel Spolsky, “The Joel Test : 12 Steps to Better Code”
Yes, Joel, we can:
tup
That’s it. Our one-step build is tup. Someone who gets the program can just run that command, and voila. To be even more explicit, a real one-step build means that someone could type the following in a shell:
$ hg-or-git clone https://vc.willshake.net/blessed-repo willshake
$ cd willshake
$ tup
This works from anywhere in the project directory. The benefits of a single build command are so great that we will sometimes bend over backwards to support it. If we do something that looks rather out-of-the-way, the answer is usually “because Tup.” But though this be madness, yet there’s method in it.
When tup’s work is complete, there will be the “final product.” Everything that is willshake will be on the local system: the edition compiled, all documents published, all media downloaded and processed, the web site running, etc. You still have time to get to the theater.
Let’s just take for granted that tup is the fastest and most correct build tool available. Tup can be a tough master, but as long as we serve it, it will serve us. As long as we make the program so that it can be built by Tup, we’ll have the most reliable possible build.
So we have the simplest possible one-step build. But we’re going to do better than that:
tup monitor -a
Now we don’t have to do anything at all. The program will simply get rebuilt as we change the documents. This is why it’s important to have the fastest possible incremental builds. The faster our incremental builds, the more closely we can approximate “live” programming.
5.6 tangle the code
As we have said, “the program” comprising willshake exists as a set of documents. In order to turn that into code, we have to “tangle” the documents, in the parlance of literate programming.
Some of willshake’s documents don’t contain programs. It doesn’t matter. All of the documents are together in one folder, and if a document doesn’t contain any code or build rules… then it doesn’t.
5.6.1 deterministically derived state
Beautiful code looks like a directed, acyclic graph
https://news.ycombinator.com/item?id=9439626
This is the real beautiful part. We want to get to this stage as quickly as possible.
Given the need for some derived state in the file system, we would ideally have just two things:
- A declaration of the transformations to be made
- A command that says “update everything”
If we didn’t care how long step two took, we would just say “build everything,” i.e. “build everything from scratch.” And it is important that we can always build everything from scratch. But as the size of the project grows (irrespective of its complexity), the time needed to build everything will increase. So in practice, we care very much how long this stage takes.
So what we really want is a command that says “update everything,” also known as an “incremental build.” That way, as we make changes to the program, the system will rebuild only what is necessary to reflect those changes.
This need to balance speed and correctness is the reason for much of the complexity of build systems.
But what is the “declaration of transforms”? Of course, we wish to express our intent in the most natural way. The ideal language would give us just enough power to do that and no more (principle of least power), minimizing the difficulty of declaring and reasoning about it. The “directed acyclic graph,” commonly used in build systems, is such a construct.
The other consideration is whether the graph is static or dynamic, that is, can the graph change while it is being evaluated?
The short answer is, we use a (basically) static dependency graph.
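In Tup terms (these rules and scripts are hypothetical), the graph reads directly off the rules, since the outputs of one rule may be the inputs of another:

: plays/hamlet.xml |> ./extract-scenes "%f" > "%o" |> scenes/hamlet.txt
: scenes/hamlet.txt |> ./count-lines "%f" > "%o" |> stats/hamlet.txt

Touch plays/hamlet.xml and both outputs are rebuilt; touch nothing, and “update everything” has nothing to do.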
5.7 organization of the program documents
The whole point of literate programming is to write for humans foremost, so the arrangement of the material will naturally be governed by that aim. However, since the documents are thereby the source of all source code, some incidentals must be considered, with respect to the build process.
Currently, willshake uses Org Babel as its literate programming platform. Org Babel is an Org Mode extension written by Eric Schulte that extends the capabilities of source code blocks with many features, including evaluation and tangling. It is included with Org Mode, which is included with Emacs, which is (usually) included with GNU/Linux. As such, it is as close as anything to a “commonplace” literate programming platform.10 If nothing else, this increases the likelihood of its being well-maintained.
But I would stress that the choice of Org Babel is not essential to willshake. I do not rule out the possibility of implementing a custom system at some point. While Org Mode is both powerful and convenient (no small feat), it does have some hard limits, not least of which is a dependency on Emacs. It is also not clear how Org documents could be extended with novel structures for special situations, such as inline blocks for marginal comments, or side-by-side listings. There are irregularities with its formatting syntax, and some combinations of bold, italic, and link text, for example, are impossible. Finally, it’s kind of slow—at least tangling. We’ll come to that in a moment. For now, though, there are a million upsides to Org Babel, chiefly that it is an outstanding authoring format (at least for anyone who already lives in Emacs).
5.7.1 special considerations to improve tangling speed
Fast builds are extremely important. Indeed, it’s kind of a bug that we have builds at all. But that’s a story for another time. As long as our program is based on source blocks in document files, we will have a build process, and we need it to be as fast as possible. Still, we’d rather not make a mess of things on that account. Ideally, any design choices—even when they are motivated by a desire for fast builds—should be worth doing in their own right.
Here we will describe a few principles that will inform our arrangement of program documents, especially with respect to their tangle targets and internal references.
In short, the program documents should be constructed as modularly as possible, where “modularly” means that smaller lexical scopes are preferred in the use of “noweb” references, and where “as possible” means, without alterations that would not otherwise improve the readability of the document.
In practice, the purpose of this modularity is to support composition of documents by the use of #+INCLUDE directives, which is in turn designed to support faster tangling. Since Org Babel does not resolve includes when tangling, we can achieve much more granular (and hence faster) tangling by reducing the number of files that are output by a given document. The #+INCLUDE’s allow us to make the “woven” documents as long as necessary. Since the “weave” process is not necessary for ordinary development, this is a perfectly workable solution.
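A parent document pulls in a part (the file name here is hypothetical) with a line like:

#+INCLUDE: "./parts/tangler.org"

Org resolves this when exporting (weaving), but not when tangling, so each part tangles independently.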
The “downside” is that we cannot make as extensive a use of noweb-references. That is, we cannot build a single output file from references in different documents. This is already the case, of course, without INCLUDE’s, which calls for some techniques that might be called workarounds (such as writing multiple files and then concatenating them with a build rule). Applying the same restriction “internally” to documents will presumably call for more such workarounds.
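Such a workaround might look like this (hypothetical names): two documents tangle fragments to separate files, and a scraped rule concatenates them.

: fragments/head.js fragments/tail.js |> cat %f > "%o" |> site/static/script/combined.js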
However, these notes will probably be turned on their head once we have such a practice in place. It’s likely that such a structuring (with fewer placeholders and more granular scopes) will actually improve the construction of the program in the long run. Then, we can restate the above as a matter of fact, rather than as a proposal.
5.8 the dependency graph
Tup allows you to graph the dependencies. This can be useful. Here, I’m keeping some notes related to that.
First thing you notice with Tup is that it puts everything in the graph, like every file in the project. You can ask it to graph just one target, but we’re looking for an overview. Something you can hang on your wall.
Using the --combine option substantially cleans up the graph. In effect, it depicts foreach-rules as single entities. But on inspection, it cleans up the graph a bit too much: you don’t see a lot of the tangled output files. If you limit the command to a particular directory (particularly, this one, documents/program), you get the most interesting results.
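So the most useful incantation is something like this (assuming Graphviz’s dot is installed):

tup graph --combine documents/program | dot -Tpdf > build-graph.pdf

tup graph writes the dependency graph in dot format, which can then be rendered however you like.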
6 the future
6.1 separate Tup projects for faster builds
This system doesn’t deal optimally with large file sets. For example, I have thousands of images in the catalog. I use multiple foreach rules to process them. This means that, once the file lists are expanded, there are tons of rules. In theory, Tup is designed to handle this efficiently. However, my rules are all coming from the “same” Tupfile (via the run script). So in practice, every change that I make “requires” the entire vector of image-related rules to be re-checked. I don’t understand exactly why this is, but I know that it adds a huge amount of work to incremental builds, even when the file monitor is running. Looking at Tup’s debug-sql log after a minimal change to one document, there are e.g. four SQL queries related just to one command for one image file. And there are three or four rules that will apply to each image. I’ll never have satisfactory build times as long as this is the case.
I’ve tried breaking the image-related rules out of the documents (even though that’s cheating), and it doesn’t help, since the documents contain rules that take the output images as input (the image projections). Not to mention rules that ship the images. So the line between the “catalog” and the “documents” is not quite sharp.
I want the system to be as simple as possible, but I also want it to be as fast as possible. I really want “normal builds” to be under 100ms. Where “normal build” means some small change to a single document. Right now, having shelved thousands of unused images, I’m back down to sub-second builds (not counting document exports, which are a separate matter). I’d have to study Tup more to know whether any improvement can be made in this case.
6.2 “live” programming: better than the alternative?
How many people here still use a language that essentially forces you in the development system, forces you to develop outside of the language: compile and reload—even if it’s fast… so if you think about that, that cannot possibly be other than a dead end for building complex systems.
(Yes, if the build takes more than one second, it’s too slow.)
Let’s blue-sky for a minute.
Forget everything you know about programs.
What would programs be like, ideally? Alan Kay—maybe because of his background in microbiology—often talks about the most interesting systems as being like living things: adaptability, the drive to survive and evolve. Can we expect such traits from programs that are compiled from source? Can we imagine living things that are produced from code, as from a template? Or are living systems—to the extent that they have anything to do with code—already bootstrapped? That is, it seems they’d have required a working system to get started in the first place, but that working system is the product of the life process. It’s the chicken and egg problem.
It’s worth stopping to think about this because not all software has to be “built” as such. There are systems where a program looks exactly the same to its creators as it does to its users at all times. The most notable example is Smalltalk, created in the 1970’s by Alan Kay and Dan Ingalls. Smalltalk programs are always running and are always modified in-place. This is sometimes called “live” programming. Smalltalk programs are truly live systems in the sense that there is no other way to modify them except by direct interaction. The program is a full binary image that includes its running state and all of the runtime mechanisms needed to keep it alive. This is radically different from the alternative, which we might be forced to call “dead” programming if it were not the overwhelmingly most common practice. As such, we can get away with just calling it “programming.”
Still, we will not appeal to common practice. We will choose what we choose on the merits. Just because live systems are not widely used today does not mean that they may not be better for some applications. The same could be said for literate programming, which we use. Of course, nothing is perfect. By the 1980’s, even Alan Kay felt that Smalltalk was ready to be superseded by something else.
Ultimately, we’d like something that balances the benefits of live programming with other benefits. See, of course, Alan Kay. https://www.youtube.com/watch?v=ubaX1Smg6pY&t=1h6m25s
This concerns the building process in the sense that we’d rather keep things as simple as possible. Live systems don’t have a separate build step—they just are. Whereas in this system, we only preserve the documents.
In live programming, process and product are a little more than kin.
6.3 a bad analogy
We might ask, why do we need to “build” anything?
Why shouldn’t I just give you something that is usable as-is?
Let’s use a food analogy. If you come into my restaurant, you can order prepared food. The preparation goes on behind closed doors. So it is with software. Your software is prepared behind closed doors.
“Users” don’t need to build anything—they just use what we give them.
But our focus here is teaching how to build. If you want to learn how to make the software, how to work on it, how to prepare it, that is a different matter.
Programming is writing. But we make the distinction between a program that runs and a program that builds the program. A build program is like a recipe.
6.4 an in-house literate system
Traditionally, “literate programming” involves basically two processes: tangle and weave. Org Mode provides both of those features, but, as discussed earlier, willshake does its own tangling for the sake of speed.
At some point I expect that I’ll implement the export of documents as well. Speed is definitely an issue. But this would also give me freedom to add and remove features as I see fit. It would also drop the rather hefty dependencies of Emacs and Org Mode. Of course, it would behoove me to make something that runs inside of the browser (i.e., in javascript).
“Quaint” looks interesting.11 It’s an “extensible markup language” written in Javascript, apparently just to support the documentation of the “Earl Grey” language.12 Its author also created an “incremental build system” (called “Engage”) that “tracks all reads and file changes”—like Tup.
Footnotes:
“system,” Online Etymology Dictionary. http://etymonline.com/index.php?term=system
To paraphrase Sonnet 1.
The Tup Manual. http://gittup.org/tup/manual.html
“tup doesn’t recognize new files” GitHub (2014) https://github.com/gittup/tup/issues/214
If you enjoyed reading the words “MUST NOT” in all caps, you’d really love reading RFC’s. http://ietf.org/rfc.html
See the note “Forget Me Not” about a case of this in Emacs, which was recently moved to a new repository system (preserving its history). See also the discussion on Hacker News.
Joel Spolsky, “Daily Builds Are Your Friend”
That said, I note that there are programs called weave and tangle already installed on my system, which appear to be an implementation of Knuth’s original WEB system. To wit, WEB supports only Pascal as a target language and as such is, despite its inclusion with a mainstream Linux distribution, presumably in little use today.