Jane Austen (a TEI project)

March 27, 2012 in Afternoon, Meeting, Office

I’ve just finished a meeting with another PhD student who agreed to help us out with the Jane Austen’s Fiction Manuscripts project. The project was completed last year, but we maintain the site and occasionally correct errors when they show up. I walked the student through what the workflow used to be and pointed him to the right resources to understand how the project works. He has an advanced understanding of TEI and XSLT, so it was a relatively quick meeting.

The project presents high-resolution images and diplomatic transcriptions of the manuscripts (including the private one that was recently sold for $1.6M – you can read the transcription and look at the facsimile on the site).

The transcriptions are encoded in TEI, following an encoding model prepared by Dr. Elena Pierazzo. The model is structured to record a great deal of detail about the manuscript text, including spelling and punctuation subtleties, pasted patches and authorial revisions (this preliminary work likely contributed to the making of the new TEI Genetic Module). Many of the encoding problems (mainly: where to stop?) are discussed in

  • K. Sutherland, E. Pierazzo. ‘The author’s hand: from page to screen’. Collaborative Research in the Digital Humanities. M. Deegan and W. McCarty (eds.) Ashgate: Aldershot, 2011.

My work on the project focused on writing a group of XSLT transformations to generate the HTML view. Because of the complexity of the encoding, this was harder than it sounds, with the XSLT code reaching about 8000 lines in total. Unfortunately, to my utter disappointment, the TEI for this project is not exposed or accessible on the website, so this complexity cannot be fully appreciated.

Unsurprisingly, the main obstacle was overlapping hierarchies. I’ll describe one example – this will assume some understanding of XML and XHTML+CSS. Let’s consider p. 67 (as numbered by Austen) of the Juvenilia, Volume the Second.

Each line is encoded in TEI as being followed by a line break (<lb/>). You can see that occasionally there are interlinear insertions, for example:

To achieve this, the line is wrapped in an XHTML <span> element with a CSS display: block; property. This is essential to be able to pile up the interlinear text.

TEI’s <lb/> element, like XHTML’s <br/> element, is designed to be unobtrusive, so that it doesn’t break other hierarchies (for example a paragraph). In this case, however, I was forced to turn an empty element into a full one, and so had to “invert” quite a few hierarchies.
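To make the idea of turning the empty <lb/> into a container more concrete, here is a minimal sketch in Python rather than the project's XSLT. It is not the project's code: the function name lines_to_block_spans, the un-namespaced <p> input and the wrapping <div> are all assumptions made for illustration. It also covers only the easy case, where no other element spans a line break; anything nested is flattened to its text, which is exactly the part that needed the hierarchy "inversion" in the real stylesheets.

```python
import xml.etree.ElementTree as ET


def lines_to_block_spans(p_xml: str) -> str:
    """Group the text between <lb/> milestones into <span class="block"> containers.

    Simplification (assumption): other child elements are flattened to their
    text, so nothing here has to be split across two lines.
    """
    p = ET.fromstring(p_xml)
    lines = []
    current = p.text or ""
    for child in p:
        if child.tag == "lb":
            # Each line is followed by its <lb/>, so the milestone closes the line.
            lines.append(current)
            current = child.tail or ""
        else:
            current += "".join(child.itertext()) + (child.tail or "")
    if current.strip():
        # Trailing text that is not terminated by an <lb/>.
        lines.append(current)

    out = ET.Element("div")
    for line in lines:
        span = ET.SubElement(out, "span", {"class": "block"})
        span.text = line.strip()
    return ET.tostring(out, encoding="unicode")


print(lines_to_block_spans("<p>first line<lb/>second line<lb/></p>"))
```

Each resulting span can then be given display: block; in CSS, which is what makes it possible to stack interlinear text above a line.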

For example, scrolling down on the same page, there is this line:

The quill icon indicates a “change of hand” (TEI <handShift/>), that is when a new scribe starts writing on the page (in this case Henry Austen; hover on the quill on Austen’s site to see a tooltip).

These changes of hand can happen anywhere in the text, at any level. These, too, are therefore recorded with an empty element (a “milestone” in TEI jargon). For this project, we also decided to represent the colour of the medium used to write. For pencil, for example, the text will be grey. To do this in XHTML+CSS, I need to know where the colour starts and where it ends. Now, what happens when I need to expand both <lb/> elements and <handShift/> elements to deal with this? It gets tricky.

Eventually I decided to store information about hands at the lowest level possible in the tree: around text nodes. This is the result for the example above:

<span class="block">
    <span class="ha">Meſs</span> <sup><span class="ha">rs</span></sup>
    <span class="ha"> Demand &amp; Co — please to pay Jane Austen </span>
    <span class="ha">Spinster</span>
    <span class="erased_only"> <span class="ha">and</span> </span>
    <span class="ha"> the sum of one hundred guineas on account </span>
</span>

The <span> elements with class="ha" mark the text nodes written by Henry Austen. This is arguably messy XHTML, but it is perfectly valid, and it allowed me to deal with all sorts of overlapping problems that I had to face in this project.
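The approach of storing the hand around text nodes can be sketched as follows. Again this is a simplified Python illustration under stated assumptions, not the project's XSLT: the helper names (hand_span, wrap_text_by_hand, render), the un-namespaced input, the starting hand "ja" and the sample line are all hypothetical. The sketch walks the tree in document order, remembers the hand currently in force, drops each <handShift/> milestone, and wraps every non-blank text node in a span carrying that hand as its class. Whitespace-only text nodes are simply dropped here.

```python
import xml.etree.ElementTree as ET


def hand_span(text, hand):
    """Build <span class="HAND">text</span>."""
    span = ET.Element("span", {"class": hand})
    span.text = text
    return span


def wrap_text_by_hand(elem, hand):
    """Wrap every non-blank text node under elem in a span for the current hand.

    Returns the hand in force at the end of elem, so a change of hand
    carries over into whatever follows, at any level of the tree.
    """
    children = list(elem)
    rebuilt = []
    if elem.text and elem.text.strip():
        rebuilt.append(hand_span(elem.text, hand))
    for child in children:
        tail, child.tail = child.tail, None
        if child.tag == "handShift":
            # Milestone: switch hand, emit nothing. "new" points at the hand id.
            hand = child.get("new", hand).lstrip("#")
        else:
            hand = wrap_text_by_hand(child, hand)
            rebuilt.append(child)
        if tail and tail.strip():
            rebuilt.append(hand_span(tail, hand))
    # Replace the old children with the rebuilt sequence.
    elem.text = None
    for child in children:
        elem.remove(child)
    elem.extend(rebuilt)
    return hand


def render(xml, start_hand):
    root = ET.fromstring(xml)
    wrap_text_by_hand(root, start_hand)
    return ET.tostring(root, encoding="unicode")


sample = '<line>Dear Sir <handShift new="#ha"/>Mess<sup>rs</sup> Demand<del>and</del> the sum</line>'
print(render(sample, "ja"))
```

Because the hand is attached at the lowest level, no span ever has to cross an element boundary, so the result stays well-formed however the other hierarchies are rearranged. The cost is the verbosity visible in the XHTML above.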

For the rest of the afternoon, before I go to a seminar at 6pm (more on this later), I will be working on another TEI project: Ancient Inscriptions of the Northern Black Sea (IOSPE). This is an epigraphy project led at DDH by Dr. Gabriel Bodard, and it uses a flavour of TEI called EpiDoc. More on this later (hopefully I’ll have the time to put something together before rushing to the seminar! Busy, busy).

Examples in this post from:

  • Jane Austen’s Fiction Manuscripts: A Digital Edition, edited by Kathryn Sutherland (2010). Available at http://www.janeausten.ac.uk. ISBN: 978-0-9565793-1-7
