Thursday, 19 April 2018

Talk to the Scholar (Book)

I have worked on several fascinating projects this term (besides teaching and administrative duties), each of which may deserve a post of its own. We worked, with more downs than ups, on re-establishing Digital Humanities MA programmes in Hungary. At the moment, though, I do not have a clue about the outcome of these efforts: the documents are with the ministry, waiting to be decided upon. I am working on the Hungarian Shakespeare Archive, which fills me with joy, though sometimes I am not sure whether the time and energy I invest in this project are useful to anybody. I served on the boards of two Digital Humanities journals, a Hungarian one (Digitális Bölcsészet) and a more international one (Digital Scholar), and reviewed articles for both. Also, I had the opportunity to take part in the Text Analysis Across Disciplines Boot Camp at CEU and to teach four classes there. For the sake of advertising Digital Humanities in Hungary, I wrote up a longish Wikipedia entry about Digital Humanities. Furthermore, I am working on an online course focusing on Digital cultural memory, to be finished by September at the latest.

All these are projects that I just enjoy immensely, but they all line up behind a more ambitious project, i.e. improving academic life. This terribly, horribly, frighteningly ambitious project consists at the moment of two distinct subprojects. One of them is automating everything that can be automated in a literary scholar's job, while the other is understanding education at an English department and thus making it more meaningful. To put these more bluntly, I am lazy enough to let the machine do what it is better at than I am, and I want to ease my job in such a way that I can tell my students why it is beneficial for them to attend my (or for that matter anybody's) classes.

When daydreaming about this ambitious project, I keep looking around to see what other people and teams are doing in this area. Keeping an eye on these is pretty easy with Twitter and RSS feeds. The next book on my reading list, for example, is the new book by the most inspiring Cathy N. Davidson, The New Education: How to Revolutionize the University to Prepare Students for a World in Flux (New York: Basic Books, 2017), which I came across via my Feedly. The other finding is the Talk to Books project, announced five days ago on the Google research blog (THX, Feedly, again). And it is this project that I would like to write about now, as it nicely fits the scholarly aspect of the dream project.

Talk to Books may well give a hand in research if it keeps improving steadily, and there seems to be every chance of this. Talk to Books is a project within Google Books, and it promises a semantic search engine. What Talk to Books does is rather fancy: you ask a question, and the machine makes sense of the question and searches 100,000 books (at the moment). It then tries to answer the question by leading the researcher to books wherein the answer lies, highlighting the sentence in each book which seems to answer the question. This is similar to WolframAlpha insomuch as semantic search is concerned, and similar to Understanding Shakespeare, the collaboration between the Folger Shakespeare Library and JSTOR, in that both are meant to help scholars gather secondary sources. What differentiates the Talk to Books project from WolframAlpha is that the latter provides information, while the former provides information that is documented. And Talk to Books is more sophisticated than the Folger and JSTOR collaboration to the extent that there is an element of a communicative situation in it. Of course, the communicative situation is in a way a fake one: the machine does not understand the question as a human being would, and the answers are sometimes completely off track, but the method of faking communication works pretty well.

The model underlying Talk to Books relies on word vectors, a statistically trained model that relates meaning to strings of letters by analysing the contexts in which words occur. The model, in this case, is trained on the contexts provided by the natural language used, and the training involves a highly complicated process of testing, curating verbal contexts, filtering out mistaken contexts (noise) and reducing the examples to the relevant verbal contexts. The code under the hood of Talk to Books is built with Google's machine learning toolkit, TensorFlow. More about this can be found in the TensorFlow tutorials.
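To make the idea a little more tangible, here is a minimal sketch in Python of how meaning can be approximated from context. It is emphatically not Google's model and does not use TensorFlow: the four sentences, the stopword list and all the function names are made up for illustration, and plain co-occurrence counts stand in for properly trained word vectors. Each word is described by the words it appears with, a sentence is described by adding up the vectors of its words, and the sentences are then ranked by how closely they match the question (cosine similarity).

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for illustration; each line stands in for a sentence in a book.
SENTENCES = [
    "the spectator watches the actors on the stage",
    "the audience watches the play in the theatre",
    "the viewer watches the film in the cinema",
    "the farmer plants wheat in the field",
]

# A tiny, hypothetical stopword list; a real system would handle this differently.
STOPWORDS = {"the", "in", "on", "a", "is", "what"}


def tokens(text):
    """Lowercase the text, strip basic punctuation and drop stopwords."""
    words = (w.strip("?.,!\"'") for w in text.lower().split())
    return [w for w in words if w and w not in STOPWORDS]


def train_word_vectors(sentences):
    """Describe each word by counting the words that share a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = tokens(sentence)
        for i, word in enumerate(words):
            for j, context in enumerate(words):
                if i != j:
                    vectors[word][context] += 1
    return vectors


def sentence_vector(text, vectors):
    """Add up the context vectors of the words; unknown words are ignored."""
    total = Counter()
    for word in tokens(text):
        total.update(vectors.get(word, {}))
    return total


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(question, sentences, vectors):
    """Return (score, sentence) pairs, best match first."""
    q = sentence_vector(question, vectors)
    scored = [(cosine(q, sentence_vector(s, vectors)), s) for s in sentences]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    word_vectors = train_word_vectors(SENTENCES)
    for score, sentence in rank("spectator in a theatre", SENTENCES, word_vectors):
        print(f"{score:.2f}  {sentence}")
```

In this toy setting the stage and theatre sentences come out on top, the cinema sentence follows because it shares the context of watching, and the farming sentence scores zero. A real system replaces the raw counts with dense vectors learned from a vastly larger corpus, but the principle of ranking by closeness in a vector space is the same.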

Now, tasting is the ultimate test of this pudding, so let us see how Talk to Books works. I am writing a paper about spectatorship, so it might be interesting to check whether Talk to Books may come in handy here. After some trial and refinement (the user should also adapt to the abilities of the machine), I came up with this question: "what is spectatorship in a theatre?" After I hit the search icon, approximately 15 books showed up on the screen, with quotations from these books and links to them in Google Books. Out of these books, seven were closely related to theatre studies, and I found rather relevant quotations in them. Three of the books centred on spectatorship in the cinema, which is understandable, as spectatorship studies are closely linked to film, and even these books referred directly or indirectly to the theatre, so these would not be irrelevant either. The rest of the books referred to spectatorship in diverse contexts, such as social research, folklore studies, discourse analysis and rhetoric.

The results of this simple search are telling on three accounts. First, the results seem to be rather relevant, so the word vector technology lying at the backend of deep learning technologies in general, and of TensorFlow in particular, seems to be promising. Second, even the irrelevant hits may well prove beneficial, because they help one look out of the box and bump into scholarly findings that are semantically but not discipline-wise related to one's research. Thus, if I intend to be really generous, I should admit that the search engine facilitates interdisciplinary studies as well. Third, Talk to Books may well ease and help the scholar's tasks: it is easy to copy and paste the relevant quotation, and one can check the context of the quotation via the link to the entire book, or rather to a page in Google Books, and get hold of the bibliographical data of the book.

Although Talk to Books is promising, I can see room for development in three respects. First, Talk to Books, as it is now, does not have a specific target audience. Judging by these first impressions, it seems to me that the target audience is the educated, English-speaking community of intellectuals. This wide user set is understandable from the perspective of the developers, since they need statistically relevant results to test the application. From the scholarly user's point of view, though, the target audience should be the scholarly community, and thus the linguistic behaviour of the scholarly community should carry more weight in the textual corpus used for training the model. Second, again from the scholarly community's perspective, harvesting metadata is still more laborious at the moment than it could be. If one intends to use, say, Zotero, one has to click many times, i.e. one has to go to the page in Google Books, find the "information about this book" link, search for the ISBN and paste it into Zotero. Instead of these numerous clicks, one click would be better... Third, some filtering methodology would help the scholarly user, and a wider corpus including journals would come in handy, provided the application is to serve scholars. OK, I understand that Talk to BOOKS is about books, but scholars use journals as often as edited volumes or monographs.

In conclusion, I am just overwhelmed by the Talk to Books project. I am overwhelmed because I can see my dream project, i.e. automating whatever can be automated in scholarly work, come true with this project, or at least one significant aspect of it. I am overwhelmed because I find in this project a promising use of deep learning technologies in ways that are already beneficial. And whatever misgivings I have concerning Google, there are amazing people in their ranks who can and do shape our digital futures.