text of Shakespeare’s works

In the beginning were the words. And the actors said the words, but lo, they said the words wrong, and so the playwright was frustrated, and finally the solution was to write the words down, and give each person a “roll” of paper from which they could read their words. That’s literally where we get the word “role.”1

Beyond performing this necessary function, it’s not clear that Shakespeare cared much about the longevity of his written or printed words, notwithstanding his apparent obsession with the immortality afforded by poetry.

O heavens! die two
months ago, and not forgotten yet? Then there’s
hope a great man’s memory may outlive his life half
a year: but, by’r lady, he must build churches,
then; or else shall he suffer not thinking on, with
the hobby-horse, whose epitaph is “For, O, for, O,
the hobby-horse is forgot.”

Hamlet 3.2

And then there’s this bit in JC, in which “this our lofty scene” is a weird kind of self-reference:

                             How many ages hence
Shall this our lofty scene be acted over
In states unborn and accents yet unknown!

Sonnet 18:

So long as men can breathe or eyes can see,
So long lives this and this gives life to thee.

1 we need a text

Clearly, in order to bring people the beauty and pleasure of Shakespeare, we need a text of the works.

So, which one?

What are our options?

1.1 the MOBY

The MOBY is what everybody uses, and we could go on and on about that.

http://www.opensourceshakespeare.org/info/moby_shakespeare.php

We’d rather not use the MOBY, since it’s bad. Its faults are many:

  • lots of errors. Some of these are explainable by bad OCR, some are just uncaught errors.
  • old-style spelling. Yes, it’s “modernized,” but for readers of a century ago.
  • the punctuation is sheer nonsense. It is positively lousy with colons (and semicolons). It takes source texts that are already cancerous with colons, and then adds more – yes, the MOBY uses colons in places where F1 had a perfectly good period or comma. It’s insane.
  • the part names are in ALL CAPS in the s.d.’s. Why? Because italicizing part names in s.d.’s was a convention, and since you can’t get italics in plaintext, it was a convention to do this with uppercase instead.

1.2 the Folger

The Folger text is publicly available for non-commercial use, in XML form. Since it’s the result of many person-years of scholarship, this would seem like a perfect option.

But it’s a black box – they only release the product, not the process. We don’t get any explanation of what is what. We have no reason to suspect that it’ll ever be really open.

So we’ll use it as an “input” (explained later), but not as a “source.”

1.3 Internet Shakespeare Editions

Indeed, the only thing that passes for a “source” text in Shakespeare is one of the folios or quartos. After a certain point, all published versions of the plays were considered “derivative,” so only these originals are considered to have any authoritative value.

The University of Victoria provides these original publications in both facsimile and transcribed editions. It’s not clear that all of them are proofread, but they are all digitized.

So lacking anything else, we may start with these editions as our source texts.

2 but people don’t use source texts anymore

But source texts are no longer used for a reason. Actually a lot of reasons:

  • spelling has changed
  • punctuation has changed
  • some of the layout (and hence line breaks) was an artifact of the printing process
  • many (implied) speech prefixes are missing
  • many (implied) stage directions are missing
  • they contain few scene change markings
  • they contain many apparent errors
    • which in many cases can be rectified by collating an alternate printing (e.g. a “quarto”)

3 the willshake edition

Hence we propose to use our own edition, which we will edit as we see fit.

The way that it works is basically this:

how-editing-works.svg
Figure 1: How editing works

3.1 editorial changes

What kinds of changes can the editor make?

  • “modernize” spelling
  • “modernize” punctuation
  • choose variants from alternate versions where you think there’s good reason
  • add scene headings where you believe the scene changes
  • add stage directions that you believe are implied
  • regularize speech prefixes
  • format special sections (e.g. letters, songs)

3.1.1 emendations

Only in very rare cases would you change a word to something which had no basis anywhere. This is called an “emendation,” and thousands have been suggested over the years. Some have become more or less accepted, so accepting these is mainly a matter of demarking the deviation from the copy text and crediting whom you believe to be its first proposer.
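To make “demarking the deviation” a little more concrete, here is a minimal sketch of what a record of one editorial change might look like. Everything in it (the class, the field names, the category strings) is hypothetical; willshake does not currently define such a structure. The sample values are taken from the Ham 4.1 note further down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EditorialChange:
    """One deviation from the copy text. All field names are hypothetical."""
    play: str                       # play key, e.g. "Ham"
    scene: str                      # scene number, e.g. "4.1"
    kind: str                       # "spelling", "punctuation", "variant", "emendation", ...
    copy_text: str                  # what the copy text reads
    reading: str                    # what the edition adopts or inserts
    basis: Optional[str] = None     # source text supporting the reading, e.g. "Q2"
    proposer: Optional[str] = None  # whom we credit as first proposer

    def is_emendation(self) -> bool:
        # An emendation is a reading with no basis in any source text.
        return self.basis is None


change = EditorialChange(
    play="Ham",
    scene="4.1",
    kind="emendation",
    copy_text="And whats vntimely doone, / Whose whisper",
    reading="so, haply, slander",
    proposer="Capell",
)
```

A variant chosen from an alternate printing would carry a `basis` (say, "Q1") and no `proposer`; only emendations need the credit.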

4 storage format

Again, the long-term plan is to create a lower-level language for collating source texts with editorial changes. But for now, we just maintain a text “directly.”

The format of the text is governed entirely by convenience. Given that it will eventually be the output from a prior process, it is seen as derivative in that respect.

What, then, is the most convenient way to store the text? On the web site, we can view the text of one section at a time (for reasons discussed elsewhere, link TBD). In situations where transmission time matters (such as when using a web server), it makes sense to deliver the text by scene. If we delivered the whole play at once, much time would be spent transmitting text that is not for immediate use and may not be used at all.

play-text-delivery.svg
Figure 2: We deliver the plays one scene at a time

We’ll also want the full-text documents, for other reasons (such as indexing). So we have two choices: we could store the full play documents as the “master” and split them up during the build, in which case the individual scene files would be “throwaway” files. Or we could store each section in a separate document as the “master” and combine them during the build, in which case the full-text files would be “throwaway” files. The latter is more consistent with our general practice, and (or because) it allows us to write the build rules in the documents, as we will do presently.

5 collate the plays

I confess that this seems a little backwards, to have the plays split up, only to put them back together again. Again, this is to be considered temporary.

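# Tup rule: build one collated file per play from its master document
# and the section files that belong to it.
: foreach $(PLAYS)/*.xml \
  | $(PROGRAM)/* \
    $(PLAYS)/sections/%B-* \
|> ^ collate %B^ \
   $(PROGRAM)/collate-play %f > %o \
|> $(ROOT)/collated/plays/%B.xml

# gawk program: BEGINFILE, three-argument match() and \s are gawk
# extensions, so plain awk will not do.
BEGINFILE {
	skip = 0
	# Split "dir/play.xml" into the directory and the play's key.
	match(FILENAME, /(.*)\/([^/]*)\.xml$/, path)
	dir = path[1]
	play = path[2]
}
# A line holding a <section key="..."/> placeholder is replaced by
# the contents of the matching section file.
match($0, /<section key="([^"]+)"\s*\/>/, key) {
	section = key[1]
	file = dir "/sections/" play "-" section ".xml"
	system("cat " file)
	skip = 1
}
# Every other line passes through unchanged.
! skip { print }
{ skip = 0 }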
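For readers who don’t speak gawk, here is roughly the same substitution as a Python sketch, with a small input to show the effect. This is not part of the build. The `<section key="..."/>` placeholder comes from the script above; the `<play>` and `<scene>` elements and the `load_section` callback are assumptions made up for the example.

```python
import re
from typing import Callable

# Same placeholder pattern that the gawk script looks for.
SECTION = re.compile(r'<section key="([^"]+)"\s*/>')


def collate(play_xml: str, load_section: Callable[[str], str]) -> str:
    """Replace each line holding a section placeholder with that section's text."""
    out = []
    for line in play_xml.splitlines():
        found = SECTION.search(line)
        if found:
            # The whole line is dropped in favor of the section's contents.
            out.append(load_section(found.group(1)).rstrip("\n"))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical master document and section store.
sections = {"1.1": "<scene>first scene</scene>\n"}
master = '<play>\n  <section key="1.1"/>\n</play>\n'
collated = collate(master, sections.__getitem__)
```

As in the gawk version, anything else on the placeholder’s line is discarded, so placeholders are best kept on lines of their own.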

6 notes on the schema

6.1 sparse prose indicators would be more efficient

Currently, a line or a speech can be marked as prose. (Also currently, this information is not used.)

In most cases, it would be more practical to demark the boundaries of prose and verse, so that all speeches within those boundaries would enjoy that status. The question is how to do this in a way that still admits of efficient processing, and without an awkward structure. Most of the multi-line prose speeches are marked. But I will shortly enable the wrapping of prose speeches for the sake of inline speech prefixes, and this calls for single-line prose speeches — especially long lines — to be marked as such.
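One way the boundary idea could work without hurting processing is a single pass that carries the current status forward. The sketch below assumes a flat stream of items in which a hypothetical ("mode", ...) marker switches between prose and verse; none of these names exist in the current schema.

```python
def resolve_modes(items, default="verse"):
    """Give every speech the status set by the nearest preceding marker.

    items is a sequence of hypothetical tuples:
      ("mode", "prose") or ("mode", "verse")   a sparse boundary marker
      ("speech", speaker, text)                a speech
    """
    mode = default
    resolved = []
    for item in items:
        if item[0] == "mode":
            mode = item[1]  # the boundary applies to everything that follows
        else:
            _, speaker, text = item
            resolved.append((speaker, text, mode))
    return resolved


scene = [
    ("speech", "A", "first speech"),
    ("mode", "prose"),
    ("speech", "B", "second speech"),
    ("speech", "A", "third speech"),
    ("mode", "verse"),
    ("speech", "B", "fourth speech"),
]
statuses = [mode for _, _, mode in resolve_modes(scene)]
```

The catch is that a scene delivered on its own has to begin with an explicit marker (or a known default), since the previous scene’s status is not available.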

7 issues

7.1 BUG MM 4.1 missing song?!?

See the F1 SCETI facsimile, p. 0093.

7.2 BUG errata

Reviewing Rom 3.5 LEFT OFF at “O Fortune”

Notes indicate that I was also in the middle of reviewing Ant 2.6: Catches, some s.d.’s.

[ ] Shr 4.2
“and as for as Rome;” (s/b as “far”, right?)
[ ] Ado 2.1
“pierce” of valiant dust
[ ] AYL 1.1
“nearer to his reverence,” Living edition says “revenue,” doesn’t it (which makes more sense), which is it?
[ ] TN 1.5
Lady, you are the cruell’st she alive, “cruellest”, unless it’s contracted in source
[ ] LLL 5.2
Fair as a text B in a copy-book. “text B”?
[ ] R2 4.1
The shadow or your face. s/b “of”, right?
[ ] Ham 3.4
Gertrude disappears in timeline
[ ] (no term)
http://ws/plays/Ado/5.2#fed_and “fed and gone” s/b “fled” right?
[ ] (no term)
http://ws/plays/TN/3.1#So_thou “lies” by a beggar I guess should be “lives”
[ ] R3 3.7
“But penetrable to your. kind entreats,” stray period
[ ] (no term)
Thou art pinch’d fort (Tmp)
[ ] (no term)
Right now all the links appear to point to the previous section in resources/Complete_Essays_(Warner)/Contents.
[ ] (no term)
“a story of slaver” http://ws/resources/My_Literary_Passions_(Howells)/XI-Uncle_Tom's_Cabin
[ ] MND 1.1
“momentany”. Q1 has it this (and so does Folger); F1 reads “momentarie”
[ ] (no term)
“if Peste is as witty as Touchstone…” http://ws/resources/An_Introduction_to_Shakespeare_(Durham)/XI-Second_Period_Comedy_and_History#ws-TN (note this is also copied in point)
[ ] Tmp 2.2
Caliban’s speech starting at 2.2 is screwed up. There are missing lines after “cloven tongues”.
[ ] (no term)
how came it that “Claudio as executed” at an unusual hour, http://ws/resources/Beautiful_Stories_from_Shakespeare/MM
[ ] 1H6
many speeches have role Richard_Plantagenet and s.p. “York” (I think this is correct)
[ ] Cor 5.3
check Riverside re question marks (or not… this is now part of review process)
[ ] (no term)
hieroglypnical? see Hazlitt lectures, re Tro, it looks like
[ ] (no term)
1628 s/b 1623, see archive version: http://ws/resources/An_Introduction_to_Shakespeare_(Durham)/(title_page)
[ ] Rom 2.2
crux at “Nor arm”. Folger reads as F1 and Q1, but see MOBY takes a “part” line from Q2. Riverside?
[ ] Rom
uncertain catches. Not sure about catches in nunnery scene, don’t have Riverside handy
[ ] WT 2.1
“Who taught you this”. ISE F1 has “Who taught ‘this”; Folger has “Who taught this”.
[ ] Ham 3.2
“I will the king” -> “Will the King”. The “I” is not in ISE F1 and Q2.
[ ] Ham 3.1
did you assay him? is one line and sentence in both sources
[ ] Tmp 4.1
The dropsy drown this fool I -> The dropsy drown this fool!; strong’st suggestion. -> strong’st suggestion
[ ] Mac 4.3
But not a niggard -> be not
[ ] Cym 1.6
“And yet of moment to”, s/b “too”, right?
[ ] Err 5.1
and maybe throughout, Aemelia’s speeches aren’t linked to her
[ ] Tim
has one role called “Lucilius” and one called “Lucullus”

7.3 BUG commas touching apostrophes

Not sure if this is still true, but I’d noted at some point that if you search for

,'

there are a bunch of cases where a contracted word follows a comma, with no space.

7.4 BUG other notes

anchor for this line isn’t right The point!—envenom’d too!

“Thus didest thou.”

TGV PROTEUS Than men their minds! ‘tis true. O heaven! were man s/b one line, right?

Shr, IND Your honour’s players, heating your amendment s/b “hearing”, eh?

Johnson’s note on “Enough to press a royal merchant down” goes to wrong line

Regarding Johnson’s note to TGV 5.4 “The private”: of course someone cursing time has no regard for meter, duh.

R3 4.5 And Rice ap Thomas with a valiant crew;

III.I.45 Miranda’s line appears to include her sp: Miranda.—O my father,

AWW Not my virginity yet: ff

Something may be missing here. It is not clear what Helena means by “There” in the next line.

LLL 5.2 And what to me … beds of people sick.

Riverside notes that these lines represent an earlier draft of the speeches that begin “Studies, my lady?” further down. They appear in both Q1 and F1.

LLL 4.3 And where that you … learning there?

These lines (which appear in both Q1 and F1) are apparently an earlier draft of the lines that immediately follow, starting with “O, we have made a vow to study, lords.”

Cor 2.3 And Censorinus … chosen censor

These lines are emended to fill an apparent gap in F1. Like the Riverside (which follows a different emendation by Delius), the inserted text “rests upon the passage in North’s Plutarch which Shakespeare was following closely at this point.”

Ham 3.4 And either … the devil

There appears to be a word missing in Q2, which reads, “And either the deuill, or throwe him out”. “Tame” is a conjecture here, which I prefer to any others I have seen.

Ham 4.1 And what’s untimely done, ff

This line is apparently incomplete. Q2 reads “And whats vntimely doone, / Whose whisper”. Capell’s conjecture, “so, haply, slander,” is provided here to bridge the sentence.

MM 1.1 But that to your sufficiency … And let them work.

When the Duke says “let them work,” it is not clear what he means by “them”; between that and the awkward rhythm of the previous line, it would appear that some words are missing after “sufficiency” or “able”.

R2 2.1 Thomas … Arundel

This line was added by Hudson based on the passage in Holinshed that Shakespeare was following here, primarily so that the attribution in the following line will have the correct antecedent.

TN 3.3 And thanks … good turns

Some editors (including those of the original MOBY text) believe that this line is incomplete. Although there are many ways to add a beat (such as “And thanks, and [ever] thanks”, as Theobald has it), it can be understood as it is.

7.5 TODO review NO-ENTRY markers

At some point, I wrote a script to mark all instances of a speaking part without an explicit entry. Most of these are nameless “First Lord” types, but some look like errors. There are about 740 of them in all.
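The original script is not reproduced here. The following is only a sketch of the check it performs, assuming a scene has been reduced to an ordered list of hypothetical events (enter, exit, speech).

```python
def speakers_without_entry(events):
    """Flag speeches by roles that have not entered (or have already left).

    events is an ordered list of hypothetical tuples:
      ("enter", [roles])   ("exit", [roles])   ("speech", role)
    """
    on_stage = set()
    flagged = []
    for index, event in enumerate(events):
        kind = event[0]
        if kind == "enter":
            on_stage.update(event[1])
        elif kind == "exit":
            on_stage.difference_update(event[1])
        elif kind == "speech":
            role = event[1]
            if role not in on_stage:
                flagged.append((index, role))
                on_stage.add(role)  # report each missing entry only once
    return flagged


scene = [
    ("enter", ["Hamlet"]),
    ("speech", "Hamlet"),
    ("speech", "First Lord"),
    ("speech", "First Lord"),
    ("exit", ["Hamlet"]),
    ("speech", "Hamlet"),
]
missing = speakers_without_entry(scene)
```

Group entries such as “Enter Lords” would need to be expanded into individual roles first, which may be why so many “First Lord” types show up.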

8 roadmap

8.1 PROPOSAL un-apostrophize -ed words that would be the same now

In other words, change “banish’d” to “banished.” There’s no need for an apostrophe to indicate that the “ed” is not a separate syllable, because it wouldn’t be anyway. This is the converse of adding an accent for cases where it’s needed.
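A sketch of the transformation, under two assumptions: that a list of modern words is available to confirm that the expanded form “would be the same now,” and that pronoun contractions must be excluded by hand, since “he’d” and “we’d” would otherwise turn into “heed” and “weed.” The names here are made up for the example.

```python
import re

# A word ending in an apostrophe (straight or curly) plus "d".
ELIDED_ED = re.compile(r"\b([A-Za-z]+)[’']d\b")

# Stems whose 'd is "would" or "had", not an elided -ed.
NOT_ELIDED = {"i", "he", "she", "we", "you", "they", "who"}


def unapostrophize(text, modern_words):
    """Expand e.g. banish’d to banished when the result is a known modern word."""
    def expand(found):
        stem = found.group(1)
        candidate = stem + "ed"
        if stem.lower() in NOT_ELIDED or candidate.lower() not in modern_words:
            return found.group(0)  # leave anything doubtful alone
        return candidate
    return ELIDED_ED.sub(expand, text)


vocabulary = {"banished", "pinched", "weed"}
sample = unapostrophize("Thou art pinch’d; we’d be banish’d, I’d say", vocabulary)
```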

8.2 PROPOSAL capitalize words in the plays after terminal punctuation

In many cases, a capital is not used after question marks and exclamation points. It just looks wrong, and I think there’s no special textual basis for it.

This will mess up a lot of references, though.
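The mechanical part is easy enough. A sketch, assuming plain text in and plain text out:

```python
import re

# A question mark or exclamation point, an optional closing quote,
# whitespace, and then a lowercase letter.
AFTER_TERMINAL = re.compile(r"([?!][”’\"']?\s+)([a-z])")


def capitalize_after_terminal(text):
    """Capitalize the first letter after a question mark or exclamation point."""
    return AFTER_TERMINAL.sub(
        lambda found: found.group(1) + found.group(2).upper(), text
    )


fixed = capitalize_after_terminal(
    "O heavens! die two months ago, and not forgotten yet? Then there’s hope"
)
```

This is a blanket rule: it cannot tell a new sentence from a mid-sentence exclamation, so the results would still need review.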

8.3 TODO add TNK (The Two Noble Kinsmen)

It wasn’t in the MOBY, and I’ve never added it.

8.4 IDEA formalize asides

Even though they’re basically never “authoritative,” as long as you’re going to include editorial stage directions indicating “Aside” and “Aside to so-and-so,” you might as well do something with that information: demark it in some way. It’s borderline, for sure.

8.5 TODO modernize spelling of to-morrow and related words
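A sketch of the substitution, assuming “related words” means to-day and to-night (the list is a guess and may need to grow):

```python
import re

# "to-" followed by one of the assumed related words, in either case.
TO_COMPOUND = re.compile(r"\b([Tt]o)-(morrow|day|night)\b")


def modernize_to_compounds(text):
    """Drop the hyphen in to-morrow, to-day and to-night."""
    return TO_COMPOUND.sub(r"\1\2", text)


line = modernize_to_compounds("To-morrow, and to-morrow, and to-morrow")
```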

8.6 TODO demark songs

This would be useful in a number of ways:

  • formatting as verse
  • mark where songs occur on map or timeline (especially where you have audio)
  • index of songs

8.7 TODO need a process for validating external line references

This doesn’t really belong here, but it’s about having a way to “cascade” changes to line keys, which can come from anywhere. At least for willshake’s own internal references, these should all be good.
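As a starting point, here is a sketch of the check itself. It assumes that references look like the ones in the errata list above (.../plays/PLAY/SCENE#key) and that the set of valid line keys per scene can be produced from the text; the function and the shape of `line_keys` are made up for the example.

```python
import re

# Reference shape assumed from the errata links, e.g. http://ws/plays/Ado/5.2#fed_and
REFERENCE = re.compile(r"/plays/(?P<play>[^/]+)/(?P<scene>[^/#]+)#(?P<key>.+)$")


def dangling_references(references, line_keys):
    """Return the references that don't resolve to a known line key.

    line_keys maps (play, scene) to the set of keys valid in that scene.
    """
    dangling = []
    for reference in references:
        found = REFERENCE.search(reference)
        if found is None:
            dangling.append(reference)  # not a line reference in the assumed shape
            continue
        known = line_keys.get((found["play"], found["scene"]), set())
        if found["key"] not in known:
            dangling.append(reference)
    return dangling


keys = {("Ado", "5.2"): {"fed_and"}}
broken = dangling_references(
    ["http://ws/plays/Ado/5.2#fed_and", "http://ws/plays/TN/3.1#So_thou"],
    keys,
)
```

Run during the build, a non-empty result could fail the build, which is one way to catch references left behind by a change to the text.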

8.8 stage directions still don’t have an anchoring scheme

That guarantees uniqueness as with lines. It’s a little thornier in that the text is largely editorial.

8.9 enable generic linking to roles in stage directions

Whereas right now you can only do so in an enter or exit.

Of course, the links are unusable as long as the roles section is not implemented, but that’s another matter.

about willshake

Project “willshake” is an ongoing effort to bring the beauty and pleasure of Shakespeare to new media.

Please report problems on the issue tracker. For anything else, public@gavinpc.com

Willshake is an experiment in literate programming—not because it’s about literature, but because the program is written for a human audience.

Following is a visualization of the system. Each circle represents a document that is responsible for some part of the system. You can open the documents by touching the circles.

Starting with the project philosophy as a foundation, the layers are built up (or down, as it were): the programming system, the platform, the framework, the features, and so on. Everything that you see in the site is put there by these documents—even this message.

Again, this is an experiment. The documents contain a lot of “thinking out loud” and a lot of old thinking. The goal is not to make it perfect, but to maintain a reflective process that supports its own evolution.

graph of the program
