Thursday, 12 April 2012

Digital Shakespeares: Features of a Database 4


This is the fifth post in a series working out a possible methodology for assessing and accounting for databases containing Shakespearean texts. After an introductory post, four posts (this one included) are dedicated to listing, explaining and contextualizing questions that may come in handy when thinking about these databases. The first three of them covered basic facts, transparency and flexibility; now, as promised, I turn to questions pertaining to what I would like to term “interdisciplinary openness.”

Most of the databases reduce texts to their linguistic aspect. Queries focus on words, strings of words, linguistic units, grammatical units and verbal statistics. The databases can also visualize tendencies and create diagrams in a variety of formats about the linguistic construction of the text. All this is fine, as most of the time a reader of a Shakespearean play will be interested in the ways the text communicates its layers of meaning through verbal means. There has been, however, a tendency in scholarly circles to argue, in a variety of ways, that a text reveals layers of meaning not only via its linguistic construction: meaning is also a social construct embedded in the material ways a text functions in the world. Scholars thus claim that bibliographical data, from the date of publication to the publisher, from the typeface to the type of paper, from decoration to page size, play their part in the process of constituting meaning. Here a long list of authors, theoretical and pragmatic, could be presented: from David Scott Kastan to John N. King, from Woudhuysen to McGann, from Shillingsburg to Hayles, from Marshall McLuhan to Andrew Murphy, to mention only a few authorities in the field. It is therefore beneficial if a database allows for research into aspects other than the linguistic one. The next three questions explore ways in which a database may cater for such interests.


  1. Format of the digital text (txt, xml, jpg, tiff etc.)

Interdisciplinary research presupposes that a complex range of questions can be asked, and this complexity can only be supported by presenting the texts in a variety of formats. Sometimes the best choice is a largely unmarked sequence of words, e.g. in a txt file; this is sufficient, and even more fruitful, for some queries, especially when it is not clear how a text analysis tool reads the file. For another set of questions encoding is needed, say for tokenised or lemmatised queries; at other times it is best to have images only, which may be analyzed in ways unimaginable before. It is the format of the file that enables these differing approaches, so it is an advantage if the same text is accessible in a variety of formats.
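The difference between an unmarked txt file and an encoded one can be illustrated with a minimal sketch. It assumes a tiny invented encoding in which every word is wrapped in a `w` element carrying a `lemma` attribute (loosely in the spirit of TEI, but not taken from any particular database); the function names `count_word` and `count_lemma` are hypothetical as well.

```python
import re
import xml.etree.ElementTree as ET

# The same line in two formats: plain text, and a hand-made encoded version.
PLAIN_TEXT = "To be, or not to be, that is the question"

# Assumed, simplified encoding: <w lemma="..."> around every word.
ENCODED_TEXT = """
<l>
  <w lemma="to">To</w> <w lemma="be">be</w>,
  <w lemma="or">or</w> <w lemma="not">not</w>
  <w lemma="to">to</w> <w lemma="be">be</w>,
  <w lemma="that">that</w> <w lemma="be">is</w>
  <w lemma="the">the</w> <w lemma="question">question</w>
</l>
"""


def count_word(text, word):
    """Count exact word forms in plain text (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return tokens.count(word.lower())


def count_lemma(xml_text, lemma):
    """Count words whose lemma attribute matches, whatever their surface form."""
    root = ET.fromstring(xml_text.strip())
    return sum(1 for w in root.iter("w") if w.get("lemma") == lemma)


print(count_word(PLAIN_TEXT, "be"))     # word-form query on the txt version
print(count_lemma(ENCODED_TEXT, "be"))  # lemmatised query on the encoded version
```

The plain-text query finds only the literal form "be", while the lemmatised query also catches "is"; neither format is better in itself, they simply answer different questions.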

  2. Is it the linguistic, digital or bibliographical aspect that is emphasized?
The linguistic aspect refers to the language, the linguistic elements of the digital text. The bibliographical aspect refers to the material aspect; in this case, however, it characterizes the digital text not as digital but as a rendering of the visual appearance of some original printed material. The digital aspect refers to the computational coding of a text, which enables both the visual presentation and the searchability of these texts. Clearly, builders of databases have to decide what they intend to achieve. Unfortunately, no database lays, or perhaps could lay, equal emphasis on every aspect of a digital text. Databases vary: some pay special attention to the text as a linguistic unit, some to the text as a deeply encoded entity that allows for complex and intelligent queries, and some to aspects that are relevant for the historian of the book.

  3. Which aspect of the text is open to queries?
If the text can be presented in a variety of formats, a variety of disciplinary approaches becomes possible within the database. In that case it also matters which aspect of the text is open to queries, as it is the query that makes computer-enabled research fruitful, faster and more accurate. It is great if an image file is there to enable research related to the history of the book, but if this aspect of the text is not open to queries, computation is like a disabled giant: it is there, but the scholar cannot make use of the power of computer technology. The Text Encoding Initiative enables marking up a text for queries about the visual aspect of a work, and there are even free image mark-up tools, so it is technologically possible to prepare a database in which the bibliographical code is open to queries.
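What it might mean for the bibliographical code to be open to queries can be sketched as follows. The fragment is invented for illustration and is not a transcription of any real edition; it is stripped of namespaces and only borrows a few TEI-style names (`pb` for page breaks, `fw` for running titles and catchwords, `hi` with `rend` for typographic rendering). The function `bibliographic_features` is a hypothetical helper, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Invented sample: two pages with running titles, a catchword and
# typographically highlighted words.
SAMPLE = """
<div>
  <pb n="1"/>
  <fw type="head">A Sample Running Title</fw>
  <p>Enter <hi rend="italic">First Speaker</hi>.</p>
  <fw type="catch">Then</fw>
  <pb n="2"/>
  <fw type="head">A Sample Running Title</fw>
  <p>Then speaks the <hi rend="blackletter">Second Speaker</hi>.</p>
</div>
"""


def bibliographic_features(xml_text):
    """Return (page, feature, text) triples for material features of the page."""
    root = ET.fromstring(xml_text.strip())
    features = []
    page = None
    # root.iter() walks the elements in document order, so the most recent
    # <pb> tells us which page a feature belongs to.
    for el in root.iter():
        if el.tag == "pb":
            page = el.get("n")
        elif el.tag == "fw":
            features.append((page, el.get("type"), el.text))
        elif el.tag == "hi":
            features.append((page, el.get("rend"), el.text))
    return features


features = bibliographic_features(SAMPLE)

# Query 1: everything set in italic, with its page.
italics = [f for f in features if f[1] == "italic"]
# Query 2: all catchwords.
catchwords = [f for f in features if f[1] == "catch"]

print(italics)
print(catchwords)
```

Questions of this kind (where italics occur, which pages carry catchwords) can only be asked if the material features have been encoded in the first place; an image file alone, however faithful, does not answer them.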

* * *

This time, then, we have seen the remaining three criteria for assessing a database. These questions covered an area that I have labeled “interdisciplinary openness.” The interdisciplinarity of a database manifests itself in the variety of file formats and in the types of queries that a user may conduct. Naturally, these criteria may or may not apply to each and every database and can only be used as a means of orientation. So neither these three criteria nor the other thirteen should be thought of as complete and compelling, but rather as a means of discussing a database or databases critically. What follows from this is that a positive assessment does not necessarily mean that one can give the highest possible score for each and every criterion: a database can easily be used fruitfully even though a review based on the above sixteen criteria would suggest that it is less good. Assessment at its best relies on criteria relevant to the individual database. Having thus finished the meditation on the criteria of assessment, next time I shall start a new series of posts exploring databases one by one.
