Bronagh and I recently participated in the Moore Institute’s “Collections as Data Hackathon,” organized by David Kelly. The event brought together nineteen humanities researchers, software developers, and designers. We had two days to create innovative, collaborative projects using digital archives and datasets, and every team took a different, exciting approach. I believe David will also be blogging about the event on the Moore Institute’s blog, so hopefully you’ll be able to read about the full range of projects soon.
Part of the challenge of the event — and what, in the end, made it so exciting — was that we did not start with preassigned projects or teams. Instead, we were asked to present our ideas to the entire group and see what emerged. I had a vague sense that I wanted to work with Folger Digital Texts, a site featuring well-edited, encoded texts of all of Shakespeare’s works. Orlagh O’Brien, a professional graphic designer, wondered if we might be able to visualize the emotions in the plays. This was an exciting idea, since so much interesting work is being done on emotion and affect in the early modern period, but I wasn’t sure how we’d manage it in two days. Thankfully, John McCrae, a lecturer at NUIG’s Insight Centre for Data Analytics, and Omnia Zayed, a PhD student at Insight, had been working with the Mixed Emotions Project and were able to help us use the project’s online “Toolbox” to extract and analyze emotions.
Extracting and analyzing emotions from the Folger Digital Texts
Our study of emotions was based on sixteen plays: four comedies, four histories, four tragedies, and four romances. As the lone Shakespearean on the team, I tried to create some balance between early and late plays, but I’ll admit that some of my choices were based on plays I liked, plays I teach often, or plays that I thought might produce interesting results. Using the Folger Digital Texts API, we downloaded full texts to run through the MixedEmotions Text Emotion Analysis tool. This tool tested for the presence of each of the six emotions identified by Paul Ekman: anger, disgust, enjoyment, fear, sadness, and surprise. We also measured the dimensions of valence (happiness/sadness), arousal (excitement/boredom), and dominance (control/submissiveness) in each scene.
We quickly found that because the tool only told us whether or not a given emotion was present in a scene, the results weren’t that interesting. Moreover, because the tool was trained on Twitter, it didn’t always handle Shakespeare’s English as well as we might have liked. I was shocked — shocked! — that it did not identify sadness in the first scene of The Merchant of Venice.
It turned out that the tool’s lexicon identified “sadness” and “sadly” (and “saddle”) as sad words, but it did not include “sad.” (Very biased. Sad!)
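For the curious, here is a minimal Python sketch of how a presence-only, exact-match lexicon lookup can miss an obvious word. The word list and the function name `has_emotion` are made up for illustration; this is an assumption about how such a lookup might behave, not the MixedEmotions tool's actual lexicon or code.

```python
import re

# Hypothetical toy lexicon for illustration, NOT the real MixedEmotions word list.
SADNESS_WORDS = {"sadness", "sadly", "saddle"}


def has_emotion(text, lexicon):
    """Return True if any word in the text exactly matches a lexicon entry."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in lexicon for token in tokens)


# Antonio's opening line in The Merchant of Venice.
opening_line = "In sooth, I know not why I am so sad."

# Exact matching finds nothing, because "sad" itself is not in the list.
print(has_emotion(opening_line, SADNESS_WORDS))

# Adding the missing word to the lexicon changes the answer.
print(has_emotion(opening_line, SADNESS_WORDS | {"sad"}))
```

A lookup like this only answers yes or no for a whole passage, which is part of why the presence-only results felt thin.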
The valence, arousal, and dominance measures were more revealing, though, and we decided to create a series of visualizations inspired by Edward Tufte’s small multiple sparklines.
You can zoom in on each in the slideshow below, but I think they’re most striking when they’re arranged side-by-side. Orlagh cleverly represented valence with a thick blue-to-orange gradient line. Because dominance tended to track valence, it appears as a fainter white line. Arousal recedes into the background because it was the least interesting measure in our study, probably because Shakespeare rarely lets his plays get too boring.
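The small-multiples idea can be roughed out in a few lines of Python. This is only a text-based sketch, nothing like Orlagh's actual graphics: the function name `sparkline` is hypothetical, the play titles are stand-ins, and the numbers are placeholders invented for illustration, not results from our analysis. The one design point it tries to show is that every play is drawn on the same scale, so the rows can be compared at a glance.

```python
# Eight block characters, from lowest to highest.
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values, lo=0.0, hi=1.0):
    """Map each value in [lo, hi] to a block character; out-of-range values are clipped."""
    span = hi - lo
    chars = []
    for value in values:
        clipped = min(max(value, lo), hi)
        index = int((clipped - lo) / span * (len(BARS) - 1))
        chars.append(BARS[index])
    return "".join(chars)


# Placeholder per-scene valence scores, invented for illustration only.
scene_valence = {
    "Play A": [0.2, 0.4, 0.7, 0.5, 0.3, 0.1],
    "Play B": [0.5, 0.6, 0.4, 0.7, 0.8, 0.9],
}

# One row per play, all on the same 0-to-1 scale (the "small multiples" part).
for title, values in scene_valence.items():
    print(f"{title:<8}{sparkline(values)}")
```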
I was pleased to find that this quantitative analysis and the resulting visualizations were consistent with my sense of the plays. For instance, the first act of Othello reads like a miniature comedy, and the seemingly happy ending of Act I is readily identifiable in our VAD graph.
These results are, of course, preliminary, but they’re suggestive. A similar approach, with a tool trained on early modern texts, could be powerful. We experimented with some other approaches, including measuring the metaphoricity of language related to animals and the presence of each emotion at the level of genre, but these did not have the same immediate, visual impact as the VAD analysis. (We did find that disgust was more prevalent in romance than other genres, but that kind of makes intuitive sense.)
Because of the limitations I’ve described above, and because this is an entirely new approach for me, I wouldn’t want to stand over these results in print. But the context of the hackathon gave me the freedom to try something out and see what I could learn without feeling compelled to produce something final. I could see using similar tools to help students find new ways into early modern texts.
Of course, our project was only possible because each of us brought different skills and expertise to it. John and Omnia were able to write scripts to accomplish quickly what would have taken me days in Excel. Orlagh made graphs that communicated more effectively and were more visually striking than anything anyone else could produce. I talked about Shakespeare more than I have since I started my postdoc, and I think it was mostly helpful.
At the same time, collaborating revealed some interesting quirks of our individual disciplines. For example, when John introduced the concept of “inexplicability,” or the idea that some machine learning algorithms work in ways that even their creators can’t understand, I wondered whether I could extend this to my own research.
While I probably won’t pursue this particular project any further, I’m glad I had the opportunity to work on it for a few days. And I’ll certainly look forward to participating in a similar event next time I have the chance!