The Text Encoding Initiative: Allowing Preservation and Access to our Textual Heritage through Digital Means

Scott Paul McGinnis
January 22, 2013

By developing and maintaining an encoding standard for the digitization of text, the Text Encoding Initiative (TEI) is helping to deliver on the internet's promise to democratize access to the world’s cultural (in this case, textual) heritage.

Viewed from this author’s lay perspective, libraries, museums, archives, and other custodians of the world’s textual heritage exist to serve at least two crucial functions when it comes to their rare collections: access and preservation. But there is a tension between the two, because institutions often must limit access to their special collections in order to preserve them. Each time a researcher visits the Bancroft Library at the University of California, Berkeley, to view the Tebtunis Papyri collection, for example, the millennia-old papyrus pith is susceptible to damage. What’s more, the Folger Shakespeare Library in Washington, D.C., doesn’t want kids on field trips handling a 1604 edition of Hamlet. And no adolescent could match the perniciousness of well-informed thieves such as Farhad Hakimzadeh, the businessman and antiquarian who from 2003 to 2009 used a scalpel to extract pages from some 150 rare books at the British Library and Oxford's Bodleian Library (see this BBC article for more). But even in the face of these dangers, preservation must not occlude access. After all, one of the goals of the American Library Association is “to ensure access to information by all” (ALA “Mission and History”).

Digital technology promises to relieve this tension by granting access to a very wide audience with only minimal risk to the materials. So the Bancroft Library works with the Advanced Papyrological Information System (APIS) project to make their papyri available; the Folger Library makes its collections available and offers virtual tours through its website; and the Fihrist Islamic Manuscripts Catalogue Online offers access to materials (minus, perhaps, a few pages) from the British Library, the Bodleian, and several other UK institutions. All of these projects use TEI standards.

What is TEI? The TEI is an institution: a consortium of universities, libraries, archives, and others dedicated to the development of an encoding standard for the preservation, description, and publication of digital editions. As such, it holds an annual meeting, maintains discussions and working groups, and, most importantly, publishes and maintains guidelines for the encoding of text, also called TEI. Confused? Put another way, the TEI is both the name of an organization and the name of an encoding language produced by that organization.

In the 1980s, incompatible systems for encoding and representing texts were multiplying, a situation which “was inhibiting the development of the full potential of computers to support humanistic inquiry” (TEI “Origins”). In other words, under such circumstances, computers could not increase access in the way that people had hoped. This proliferation of incompatible systems also aggravated the problem of preservation: what would happen when one of them became obsolete? Who would ensure the long-term preservation of these new documents in the years, decades, and even centuries to come? These important questions were not being answered. So, in 1987, an international group of researchers seeking to remedy these problems created the foundations of the Text Encoding Initiative. A first draft of its guidelines was published in 1990, and the TEI Consortium was formed in 1999 (ibid.).

The point of a standard like the TEI language is to allow for interoperability among projects, and between projects and users, which should increase access to the encoded texts. And the TEI Consortium exists to make sure that the guidelines continue to be developed and maintained into the future, thus helping to preserve the new, digital editions of these important human creations. This important work will help the custodians of human culture better achieve the dual aims of preservation and access.
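For readers curious about what this shared encoding looks like in practice, here is a minimal sketch, not drawn from any of the projects named above. The current TEI Guidelines are expressed in XML, so one small program can, in principle, read the same kind of information out of files produced by different projects. The sample document, its title, and the helper names `read_title` and `read_paragraphs` are invented for illustration; real TEI files are usually far richer than this.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A tiny, hypothetical TEI document invented for this example.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hamlet (hypothetical sample)</title></titleStmt>
      <publicationStmt><p>Illustrative only.</p></publicationStmt>
      <sourceDesc><p>Invented for this example.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>To be, or not to be, that is the question.</p>
    </body>
  </text>
</TEI>"""


def read_title(tei_xml: str) -> str:
    """Return the title recorded in the TEI header, or an empty string."""
    root = ET.fromstring(tei_xml)
    title = root.find(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS
    )
    return title.text if title is not None and title.text else ""


def read_paragraphs(tei_xml: str) -> list[str]:
    """Return the text of each top-level paragraph in the document body."""
    root = ET.fromstring(tei_xml)
    return [p.text or "" for p in root.findall("tei:text/tei:body/tei:p", TEI_NS)]


if __name__ == "__main__":
    print(read_title(SAMPLE))
    for paragraph in read_paragraphs(SAMPLE):
        print(paragraph)
```

Because the title always sits in the same place in the header, the same two functions would work on any file that follows this basic structure, whichever institution encoded it. That is the practical meaning of interoperability here.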

Nevertheless, I would be remiss if I did not end this piece with a note of caution. As we continue through this transition to the digital, we must never accept the misguided notion that a digital edition is a perfect and adequate substitute for the material artifact it describes. It is not. The human and scholarly value of the physical carriers of our textual heritage cannot be overestimated. This is well known by those who work most closely with them, including the members of the TEI, but might too easily be forgotten (or ignored) by those who control their purses. It is important that those who provide the funding for the conservation of our textual heritage—governments, institutions, and individual donors—not be tricked by seeing Shakespeare on the web into thinking that the folios themselves are no longer necessary, lest these irreplaceable relics of the human past fade into The Nothing before our very eyes. The TEI exists to prevent such loss and should never be used as an excuse to be negligent in our responsibilities to preserve the material objects of the human past.


Scott Paul McGinnis is a Graduate Student Researcher at the Townsend Center for the Humanities.