Enhanced Scholarly Publications… and Wrap-Up

March 27, 2012 in Conference, Evening, Home

I went to a seminar, part of the Centre for e-Research seminar series, then cycled back home, had some dinner, put on comfy pants, and am now ready to relax after a long DH day. Here are a few notes about the seminar.

Enhanced Publications in the Social Sciences and Humanities: tensions, opportunities and problems - Andrea Scharnhorst, Nick Jankowski, Clifford Tatum, Sally Wyatt, Royal Netherlands Academy of Arts and Sciences, Netherlands (abstract)

Each of the three speakers gave a short talk. They ranged from a general discussion of enhanced publication to an actual project they’ve been working on. Enhanced publication typically involves the inclusion of collateral information around a central text, such as research data, and links and hooks to and from other resources (if web-based, this was presented as implying Linked Data / Semantic Web).

The project presented is an enhanced publication of four books that features a WordPress-based structure with Linked Data features, extra content (multimedia), author pages with a growing bibliography, etc. Quite fascinating really, and this brief summary doesn’t do it justice. More information is available on the project website.

I am interested in the topic of enhanced publication, but mainly for editions of pre-existing text rather than new text. It’s a subtle but important distinction. So, what I gathered from the talk was mainly a list of interesting resources to look at and consider more carefully in the future. An extract:

And that’s all folks. It’s been a good experience writing summaries of what I did during the day and letting my fingers run over the keyboard with some quick reflections. I’m looking forward to next year’s Day of DH, but the computer goes off now.

IOSPE, Solr and Kiln

March 27, 2012 in Afternoon, Office, Programming, Project Work

So I’ve been working for the past three hours on Ancient Inscriptions of the Northern Black Sea (IOSPE). The project will eventually publish editions of inscriptions (mainly in Greek) found around the Northern Black Sea region; here’s a map. The project is led in DDH by Dr. Gabriel Bodard.

The inscriptions are being encoded in EpiDoc, a flavour of TEI for epigraphists and papyrologists. I’ve had a few opportunities to work with the people behind EpiDoc, and they are a great group to work with, with a strong work and academic ethic. Their projects are always fully open source and available to the whole community (see Papyri.info for a remarkable example of work from this community).

Today, I’ve been working on putting together indices for IOSPE. This is part of development work that has been going on for a few working days already (remember that I only work 12 hours a week).

There are indices of fragmentary words, different categories of people, symbols, abbreviations, numerals (this one is particularly grinding my gears), etc.

Most of the algorithms for obtaining these indices were already written in XQuery (partly by Zaneta, a former colleague, partly by me) for another project. In IOSPE, though, we are using Apache Solr to index fields from the TEI files, so I need to break down the XQuery into Solr fields. As a consequence, the display XSLT also needs to be substantially changed, as most of the grouping, sorting, etc. is now done with much faster Solr queries. Believe it or not, this is quite fun to do :)
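To give an idea of what moving the grouping and sorting into Solr can look like, here is a minimal Python sketch that builds a facet query for one index. The core URL and the field name `abbreviation` are made up for illustration and are not the actual IOSPE configuration; the facet parameters themselves are standard Solr ones.

```python
from urllib.parse import urlencode


def index_query_url(base_url, field, prefix=None):
    """Build a Solr select URL listing every distinct value of `field`.

    Solr does the grouping (one facet entry per value, with a count)
    and the sorting (facet.sort=index orders the values, not the counts).
    """
    params = [
        ("q", "*:*"),            # match every indexed document
        ("rows", "0"),           # we only want the facets, not the documents
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.sort", "index"),
        ("facet.mincount", "1"),  # skip values that never occur
        ("facet.limit", "-1"),    # no cap on the number of values returned
    ]
    if prefix:
        # Optional: restrict the index to values starting with `prefix`
        params.append(("facet.prefix", prefix))
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical core and field names, for illustration only
url = index_query_url("http://localhost:8983/solr/iospe", "abbreviation")
print(url)
```

With something like this in place, the display XSLT only has to render a list that arrives already grouped and ordered.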

The web application of the project is built with Kiln, an open source Cocoon-based web publishing framework developed here at DDH mainly by Miguel Vieira and Jamie Norrish. The framework comes integrated with TEI support, Solr and Sesame. I quite like this newest version of Kiln and I’ve also started using it for a prototype webapp for the digital edition of Der Freischütz that I’m working on for my PhD. I managed to integrate my XSLTs for MEI in seconds and built the website from it. More fun!

Off to the seminar (for which I’ll be late): “Enhanced Publications in the Social Sciences and Humanities: tensions, opportunities and problems” by Andrea Scharnhorst, Nick Jankowski, Clifford Tatum, and Sally Wyatt (Royal Netherlands Academy of Arts and Sciences, Netherlands).

Possibly I’ll write a quick post on it later – then it’s time for either the pub, or going home.

Jane Austen (a TEI project)

March 27, 2012 in Afternoon, Meeting, Office

I’ve just ended a meeting with another PhD student who agreed to help us out with the Jane Austen’s Fiction Manuscripts project. The project was completed last year, but we maintain the site and occasionally correct errors when they show up. I introduced the student to what the workflow used to be and pointed him to the right resources to understand how the project works. He has advanced understanding of TEI and XSLT, so it was a relatively quick meeting.

The project presents high resolution images and diplomatic transcriptions of the manuscripts (including the private one that has been recently sold for $1.6M – you can read the transcription and look at the facsimile on the site).

The transcriptions are encoded in TEI, following an encoding model prepared by Dr. Elena Pierazzo. The model is structured to record a large amount of detail about the manuscript text, including spelling and punctuation subtleties, pasted patches and authorial revisions (this preliminary work likely contributed to the making of the new TEI Genetic Module). Many of the encoding problems (mainly: where to stop?) are discussed in:

  • K. Sutherland, E. Pierazzo. ‘The author’s hand: from page to screen’. Collaborative Research in the Digital Humanities. M. Deegan and W. McCarty (eds.) Ashgate: Aldershot, 2011.

My work on the project focused on writing a group of XSLT transformations for generating the HTML view. Because of the complexity of the encoding, this was harder than it sounds, with the XSLT code reaching about 8000 lines in total. Unfortunately, to my utter disappointment, the TEI for this project is not exposed or accessible on the website, so this complexity cannot be fully appreciated.

Unsurprisingly, the main obstacle was overlapping hierarchies. I’ll describe one example – this will assume some understanding of XML and XHTML+CSS. Let’s consider p. 67 (as numbered by Austen) of the Juvenilia, Volume the Second.

Each line is encoded in TEI as being followed by a line break (<lb/>). You can see that occasionally there are interlinear insertions, for example:

To achieve this, the line is wrapped in an XHTML <span> element with a CSS display: block; property. This is essential to be able to pile up the interlinear text.

TEI’s <lb/> element, like XHTML’s <br/> element, is designed to be unobtrusive so that it doesn’t break other hierarchies (for example a paragraph). In this case, however, I have been forced to turn an empty element into a full one, thus needing to “invert” quite a few hierarchies.

For example, scrolling down on the same page, there is this line:

The quill icon indicates a “change of hand” (TEI <handShift/>), that is when a new scribe starts writing on the page (in this case Henry Austen; hover on the quill on Austen’s site to see a tooltip).

These changes of hand can happen anywhere in the text, at any level. Therefore, these too are recorded with an empty element (a “milestone” in TEI jargon). For this project, we also decided to represent the colour of the medium used to write. For pencil, for example, the text will be grey. To do this in XHTML+CSS, I need to know where the colour starts and where it ends. Now, what happens when I need to expand both <lb/> elements and <handShift/> elements to deal with this? It gets tricky.

Eventually I decided to store information about hands at the lowest level possible in the tree: around text nodes. This is the result for the example above:

<span class="block">
    <span class="ha">Meſs</span> <sup><span class="ha">rs</span></sup>
    <span class="ha"> Demand &amp; Co — please to pay Jane Austen </span>
    <span class="ha">Spinster</span>
    <span class="erased_only"> <span class="ha">and</span> </span>
    <span class="ha"> the sum of one hundred guineas on account </span>
</span>

The <span> elements with class="ha" mark the text nodes written by Henry Austen. This is arguably messy XHTML, though it’s perfectly valid and allowed me to deal with all sorts of overlapping problems that I had to face in this project.
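The idea of pushing hand information down to the text nodes can be sketched in a few lines of Python. This is not the project’s XSLT, just an illustration under simplifying assumptions: the TEI fragment is invented, the TEI namespace is left out, wrapping elements such as <del> are flattened away, and whitespace is normalised crudely.

```python
import xml.etree.ElementTree as ET

MILESTONES = ("lb", "handShift")


def events(elem):
    """Yield ('text', str) and ('milestone', element) pairs in document order."""
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        if child.tag in MILESTONES:
            yield ("milestone", child)
        else:
            # Descend into ordinary elements; their text keeps the current hand
            yield from events(child)
        if child.tail:
            yield ("text", child.tail)


def lines_with_hands(root, default_hand="ja"):
    """Turn milestones into containers: a list of lines, each a list of
    (hand, text) segments, one segment per non-empty text node."""
    hand = default_hand
    lines, current = [], []
    for kind, value in events(root):
        if kind == "text":
            text = " ".join(value.split())  # crude whitespace normalisation
            if text:
                current.append((hand, text))
        elif value.tag == "handShift":
            # The new hand applies to all following text, at any depth
            hand = value.get("new", default_hand).lstrip("#")
        else:
            # <lb/> follows the line it closes
            lines.append(current)
            current = []
    if current:
        lines.append(current)
    return lines


# Invented fragment: the hand changes between two lines
sample = """<p>written by Jane <del>here</del><lb/>
<handShift new="#ha"/>Messrs Demand <del>and</del> Co<lb/></p>"""

result = lines_with_hands(ET.fromstring(sample))
for line in result:
    print(line)
```

Each line becomes a block and each text segment carries its own hand, which is the same shape as the nested <span> elements above.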

For the rest of the afternoon, before I go to a seminar at 6pm (more on this later), I will be working on another TEI project: Ancient Inscriptions of the Northern Black Sea (IOSPE). This is an epigraphy project led in DDH by Dr. Gabriel Bodard that uses a flavour of TEI called EpiDoc. More on this later (hopefully I’ll have the time to put something together before rushing to the seminar! Busy, busy).

Examples in this post from:

  • Jane Austen’s Fiction Manuscripts: A Digital Edition, edited by Kathryn Sutherland (2010). Available at http://www.janeausten.ac.uk. ISBN: 978-0-9565793-1-7

Teaching Programming in DH

March 27, 2012 in Class, Morning, Programming, Reflecting, Teaching

The students have started their assessed exercise! This leaves me some time to write an update.

I’ve been a Teaching Assistant in the Tools and Resources course for the past 10 weeks (and this is the third year that I’ve helped run the course). The course’s tutor is John Bradley, Senior Lecturer at DDH.

The course focuses on introducing our MA Digital Humanities students to programming and it’s one of the two core courses of the MA programme. We teach rudiments of Python mainly with two aims:

  1. introduce the students to what happens behind the scenes of the tools they use, so as to improve their understanding of the Digital Humanities as a discipline;
  2. encourage the students to apply their newly-developed programming skills to the area of Digital Humanities they’re interested in.

Particularly to address the latter point, we also briefly introduce some Python modules such as the Python Imaging Library and the Natural Language Toolkit.

Now, the fact that this is a core module clearly shows that King’s MA in Digital Humanities puts forward programming as a key skill, even if only at a basic level. Other courses in the MA also teach programming, for example Advanced Text Technologies. However, it’s still debated whether programming is a required skill in the Digital Humanities and a large number of our students continue their professional and/or academic career without programming again.

In my opinion, some level of exposure to programming is essential for our students and for anyone involved in the Digital Humanities. This is different, mind, from being someone who programs every day or being a fully-skilled programmer. I think that the key point of the discussion around programming in DH lies in the uncertainty in the definition of the field. Many think that Digital Humanists “do” and “make” and “build” (see for example Stephen Ramsay’s opinion post), while others focus on the impact of technology on our cultural environment, the arts, our heritage, etc. These two approaches thrive together and are not exclusive, though people will tend to do more of one thing than the other. While the “makers” will necessarily need programming skills, the analytical work of the others will still benefit from a good understanding of the skills involved in the technologies whose impact they evaluate.

Here are some web readings on the topic. There are many more a Google search away; these are the ones that I either refer to often or came across recently.

Between writing the post and occasionally checking on the students (John is also in the computer room with me), time has been flying. Will post again in mid-afternoon, after a meeting about the Jane Austen’s Fiction Manuscripts project (now completed, but occasionally undergoing maintenance of the rather complex XML markup and diplomatic display – more on the topic later).

Getting started

March 27, 2012 in Biography, Home, Location, Morning, Time

The day starts nice and sunny in London! I will soon be cycling East to West across London to get to King’s College London Strand campus, but first, here’s a bit more information about who I am and how I am a Digital Humanist.

About me

I am enrolled in the second year of the Digital Humanities PhD under the supervision of Dr. Elena Pierazzo at the Department of Digital Humanities (DDH) and Prof. Roger Parker in the Music department.

My research focuses on the production and publication of digital scholarly editions of music. The thesis is built around its case study: a critical edition of Carl Maria von Weber’s opera Der Freischütz, for which I am collaborating with the Weber-Gesamtausgabe at the University of Paderborn (Germany). I am now writing the case study chapter and preparing an encoding model for the opera using the Music Encoding Initiative (MEI) XML format.

I have grown more and more interested in the impact of digital scores on music performance, a topic on which I will focus in the final phase of my PhD. I occasionally post some of my thoughts on my blog “It is not sound” or on the department’s blog “DH Work In Progress”.

Today

For 12 hours a week, however, I work as a Research Assistant at DDH. I mainly work as a Teaching Assistant during term, or I do project work mostly in XSLT and Python. Occasionally I also get involved in the initial phases of analysis for some projects. Today is a “work day” so I will be busy with teaching and programming rather than PhD research.

This is the last week of term and today we’re having an in-class assessed exercise for the Tools and Resources course at 11am. During the course we have introduced the students to Python programming. My next post will explain more what the course is about and address the hot topic of learning how to program in the Digital Humanities.
