One way in is through data visualizations. Because RECIRC is a collaborative project, there’s more in the database than any one of us could remember. And since we’ll eventually be sharing our data more widely, we’re starting to think about interesting and informative ways to help people get started. In my post today, I’ll share some of the things I’ve tried out lately. They’re not very polished yet, and I certainly wouldn’t quote the data at this point, but I think they’re a pretty good starting point.
Visualizing the reception and circulation of women’s writing in miscellanies
Let’s say I want to know about the reception and circulation of women’s writing in manuscript miscellanies. Specifically, I might ask, “What are all of the transcriptions of works by women that we’ve found in miscellanies?” But our database isn’t Ask Jeeves, at least not yet.
In fact, we’re still working with NUIG’s digital projects manager, David Kelly (@davkell), who designed the database, to develop our search interface. For now, I’m using Sequel Pro, a Mac application for managing MySQL databases, to work with a copy of everything in our live web database. So the question I’ve asked above has to be reworded a bit and made into a query in Structured Query Language (SQL):
But it does the trick! I get a handy list of all of the transcriptions, including the reference for the miscellanies they appear in, the titles of the works, and the various ID numbers assigned to each piece of information in the database.
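The query itself isn't reproduced in this text, so what follows is only a rough sketch of the kind of join that could answer the question. Every table name, column name, and row here is hypothetical (placeholder values such as "Miscellany 1" and "Work X"), not the real RECIRC schema or data. It uses Python's built-in sqlite3 module as a stand-in for a MySQL copy, then writes the result out as .csv text.

```python
import csv
import io
import sqlite3

# Hypothetical schema and placeholder rows: NOT the real RECIRC database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
CREATE TABLE works (work_id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
CREATE TABLE sources (source_id INTEGER PRIMARY KEY, reference TEXT, source_type TEXT);
CREATE TABLE receptions (reception_id INTEGER PRIMARY KEY, source_id INTEGER,
                         work_id INTEGER, reception_type TEXT);
""")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)",
                 [(1, "Author A", "female"), (2, "Author B", "male")])
conn.executemany("INSERT INTO works VALUES (?, ?, ?)",
                 [(1, "Work X", 1), (2, "Work Y", 1), (3, "Work Z", 2)])
conn.executemany("INSERT INTO sources VALUES (?, ?, ?)",
                 [(1, "Miscellany 1", "miscellany"),
                  (2, "Miscellany 2", "miscellany"),
                  (3, "Letter 1", "letter")])
conn.executemany("INSERT INTO receptions VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "transcription"), (2, 2, 1, "transcription"),
                  (3, 1, 2, "transcription"), (4, 1, 3, "transcription"),
                  (5, 3, 1, "transcription"), (6, 2, 2, "translation")])

# "All transcriptions of works by women found in miscellanies", as a join.
QUERY = """
SELECT s.source_id, s.reference, w.work_id, w.title
FROM receptions AS r
JOIN sources AS s ON s.source_id = r.source_id
JOIN works AS w ON w.work_id = r.work_id
JOIN people AS p ON p.person_id = w.author_id
WHERE r.reception_type = 'transcription'
  AND s.source_type = 'miscellany'
  AND p.gender = 'female'
ORDER BY s.reference, w.title
"""
rows = conn.execute(QUERY).fetchall()

# Write the result as CSV text (a real export would write to a file instead).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["source_id", "reference", "work_id", "title"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The point of the sketch is the shape of the question: each condition in the plain-English version ("transcriptions", "by women", "in miscellanies") becomes one line of the WHERE clause, and each kind of record involved becomes one JOIN.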
I can’t do much with it here in Sequel Pro, but once it’s exported as a .csv file, I have more options. I could just print it and mark it up by hand, or I could import it into Excel where I could sort it, create a PivotTable, or make a graph.
But we’re trying to think a bit more creatively, right? I wondered what might happen if I tried to visualize the relationships between miscellanies and the texts they contain as networks. After splitting the .csv file I exported from Sequel Pro into two, I was able to import the two files into Gephi and create this.
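The post doesn't say how the file was split, so the following is an assumption rather than the method actually used: one common approach is to produce a node table and an edge table, since Gephi's spreadsheet import recognises Id and Label columns for nodes and Source and Target columns for edges. The input column names and sample rows below are hypothetical placeholders.

```python
import csv
import io

# Placeholder rows standing in for the exported query result (not real data).
exported_rows = [
    {"source_id": "1", "reference": "Miscellany 1", "work_id": "1", "title": "Work X"},
    {"source_id": "1", "reference": "Miscellany 1", "work_id": "2", "title": "Work Y"},
    {"source_id": "2", "reference": "Miscellany 2", "work_id": "1", "title": "Work X"},
]

def split_for_gephi(rows):
    """Turn manuscript/work rows into a node table and an edge table."""
    nodes = {}   # node id -> (label, kind); the dict removes duplicates
    edges = []   # one (manuscript, work) pair per transcription
    for row in rows:
        # Prefix the ids so manuscript 1 and work 1 don't collide.
        ms_id = "ms-" + row["source_id"]
        work_id = "work-" + row["work_id"]
        nodes[ms_id] = (row["reference"], "miscellany")
        nodes[work_id] = (row["title"], "work")
        edges.append((ms_id, work_id))
    return nodes, edges

def to_csv(header, records):
    """Serialise records as CSV text (a real script would write files)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(records)
    return buffer.getvalue()

nodes, edges = split_for_gephi(exported_rows)
nodes_csv = to_csv(["Id", "Label", "Kind"],
                   [(node_id, label, kind) for node_id, (label, kind) in nodes.items()])
edges_csv = to_csv(["Source", "Target", "Type"],
                   [(ms, work, "Undirected") for ms, work in edges])
print(nodes_csv)
print(edges_csv)
```

Keeping a "Kind" column on the nodes makes it possible to colour or filter manuscripts and works separately once the graph is loaded.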
Pretty cool, huh? But now what? Well, I might notice that Bodleian MS Add. A. 119 seems to contain more transcriptions than any other reception source and write up a case study about that manuscript. Of course, we’re supposed to be creating a large-scale quantitative analysis of the reception and circulation of women’s writing, so a case study doesn’t really get us too far. Besides, what really interests me about this particular graph is that Gephi’s built-in modularity algorithm seems to have clustered several manuscripts together that are linked by one or more texts. (See the nodes in the lovely seafoam green?) So I might dig a little deeper to see what those manuscripts and the texts they contain have in common.
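Both observations above (which source holds the most transcriptions, and which manuscripts are linked by a shared text) can also be checked directly against the edge list, without a graph. The sketch below is not Gephi's modularity algorithm, just a much simpler tally offered as a starting point, and the edge list is made up of placeholder names rather than real RECIRC records.

```python
from collections import defaultdict

# Placeholder (manuscript, work) pairs; not real data.
sample_edges = [
    ("Miscellany 1", "Work X"),
    ("Miscellany 1", "Work Y"),
    ("Miscellany 2", "Work X"),
]

def summarize(edges):
    """Count works per manuscript and list works found in 2+ manuscripts."""
    works_by_ms = defaultdict(set)
    ms_by_work = defaultdict(set)
    for manuscript, work in edges:
        works_by_ms[manuscript].add(work)
        ms_by_work[work].add(manuscript)
    # Largest collections first.
    counts = sorted(((len(works), ms) for ms, works in works_by_ms.items()),
                    reverse=True)
    # Works that link two or more manuscripts together.
    shared = {work: sorted(mss) for work, mss in ms_by_work.items()
              if len(mss) > 1}
    return counts, shared

counts, shared = summarize(sample_edges)
print(counts)
print(shared)
```

A tally like this only finds manuscripts that share a text directly; a community-detection algorithm can also group manuscripts that are connected through longer chains of shared texts.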
In this way, data visualizations are not necessarily the end product of research (though they can be). They can also be a way to formulate new questions about the evidence you’ve already collected and to make connections you might not notice otherwise.
I’ve never met a graph I didn’t like
Okay, that subheading might be an overstatement. But the point is that network graphs are better at representing some kinds of data than others, and there’s a whole range of cool things you can do and make when you have this much data. The @infobeautiful Twitter feed, the Data Visualisation Catalogue, and Edward Tufte’s books and Twitter feed (@EdwardTufte) are all inspiring (if addictive!). In the slideshow below, you’ll find some of the things I’ve tried out. Some are kind of silly and some are boring things you might have seen before, but I think a few of them might be worth developing further. These aren’t all based on the same SQL query, but the steps I took are pretty similar.
The problem was a @#$%&! missing hash. I’ve wasted [redacted] hours because of a missing hash, and I still don’t even know why I need it. 😖
— Erin A. McCarthy (@erinannmcc) June 2, 2017
I’ll be working on some new and improved visualizations to share in time for dataAche, the 2017 International Conference on Digital Research in the Humanities and Arts (DRHA) in Plymouth. Stay tuned!
In the meantime, I have some questions for you, our readers. What questions would you ask of this data? What kinds of visualizations would you be interested in seeing? How might you use visualizations of RECIRC data in your own research or teaching?
P.S. Did you know that there’s a term for using a string of characters to represent swearing typographically? It’s called a grawlix. Now I can be sure that you have learned at least one new thing.