GRAND DigHum has six starting sub-projects.
BigLit: Big Data and the Literature of the Humanities
The humanities have pushed the boundaries of the scale and types of data that computing can handle ever since Father Busa’s Index Thomisticus project challenged IBM in the 1950s to imagine the possibilities of “language engineering”. More recently, projects like Google Books and the Internet Archive have been making large-scale literary, historical and linguistic corpora available to humanists. The challenge lies in imagining the new types of questions we can ask of corpora so large that we couldn’t possibly read them closely. Franco Moretti has proposed “distant reading” as an alternative to “close reading”. Distant reading steps back from individual texts and asks diachronic questions about the history of literature. In so doing, it establishes general methods for the pattern analysis and visualization of very complex textual data. We propose a sub-project that will bring together Canadian and other researchers looking into the questions and techniques that can be applied to big data in the literary humanities.
BigViz: Visualization in the Humanities
Text visualization is not new, but only now is visualization gaining acceptance in the humanities. However, most humanities scholars still feel uncomfortable presenting arguments in visual form. How can we support visual argumentation and exploration for content experts? BigViz will develop research prototypes of visualization tools of two sorts:
- Visual argumentation tools that give visual form to the processes that map onto research arguments. The idea is to show complex data analysis processes in a way that lets humanists read and manipulate them without needing to program;
- Unstructured data visualization tools that allow humanists to explore large-scale cultural databases, be they collections of text, images or both. The idea is to develop scholarly inquiry tools that fit into the research practices of the humanities.
This sub-project will build on a set of high-quality text databases for use in experiments based on use cases. It will develop prototypes of question formation and exploration tools and test them with content communities. It will also adapt visualization techniques to work with our experimental text databases and tool prototypes.
TroFish: Troll Fishing
Troll Fishing will examine patterns in the viciously misogynistic troll behaviour that assaults game-related initiatives that support women. This big data sub-project will apply data gathering and analysis techniques to the discourse around games, adapting scraping and spidering tools to gather a corpus of discourse around gender and games. This corpus will allow us to examine troll behaviour (comments, blogs, etc.), find commonalities and patterns, and discern who is making these comments and how such activities are organized.
A longer-term objective of TroFish is to conceptualize and implement a prototype analysis engine and visualization interface that would enable researchers, gamers and other stakeholders to track relevant shifts in discourse in real time. Imagine a dashboard that tracks multiple online forums and provides synchronous feedback on misogynistic language. It could be accompanied by audio indicators that alert users to unusual spikes, so that the dashboard could run in the background. Alerted by one of the forum sentinels, a community of users could coordinate through social media to respond to inappropriate and abusive language.
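As a rough illustration of how such a dashboard might raise an alert on an unusual spike, the sketch below flags any interval whose count of flagged posts rises well above a trailing baseline. Everything in it is an assumption made for illustration: the function name `detect_spikes`, the hourly counts, the six-interval window and the threshold of three standard deviations are all hypothetical, and the question of how posts are identified as misogynistic in the first place is left entirely aside.

```python
from collections import deque
from statistics import mean, pstdev


def detect_spikes(counts, window=6, threshold=3.0):
    """Return the indices of intervals whose count exceeds the trailing
    mean by more than `threshold` standard deviations.

    `counts` is a sequence of flagged-post counts per interval
    (hypothetical input; the flagging step is not modelled here).
    """
    history = deque(maxlen=window)  # the most recent `window` counts
    spikes = []
    for i, count in enumerate(counts):
        if len(history) == window:
            baseline = mean(history)
            # Floor the spread at 1.0 so a flat baseline neither divides
            # by zero nor triggers alerts on tiny fluctuations.
            spread = max(pstdev(history), 1.0)
            if (count - baseline) / spread > threshold:
                spikes.append(i)
        history.append(count)
    return spikes


# Illustrative (made-up) hourly counts for one forum.
hourly_flagged = [2, 3, 2, 4, 3, 2, 3, 19, 4, 3]
print(detect_spikes(hourly_flagged))  # → [7]
```

A trailing window is only one possible design choice. It keeps the baseline local to recent activity, which suits a dashboard running in the background, but a real system would likely need per-forum tuning and a more robust baseline.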
ScholEd: Distributed and Digital Scholarly Editing
The humanities are in the process of redefinition: with every passing moment, previously unimaginable amounts of cultural material are added to the cultural record. The Distributed and Digital Scholarly Editing (ScholEd) sub-project will explore the challenges of using this big data effectively to produce new, relevant scholarly work. Operating within the framework of the GRAND Digital Humanities (DigHum) Project, ScholEd will build on demonstrated Canadian successes in addressing issues of knowledge representation in digital environments. Indeed, the Canadian digital humanities research community has organized a number of successful projects centred on understanding scholarly editing, communication, and production in the face of the digital turn.
DigiCultH: Engaging with Digital Cultural Heritage Objects
Scholarly projects in the humanities commonly involve the public in the transcription of historical materials (Transcribe Bentham) and the OCR cleaning of scanned documents (eMOP). They have made much less progress in the area of material culture. This sub-project will study the potential for scholarly and public engagement with a full spectrum of digitized cultural heritage objects. It will investigate the skills and knowledge Digital Humanities scholars can bring to the study of material culture. It will also investigate what Digital Humanities scholars can learn from the often very public dissemination practices of professionals working in museums and other public-facing institutions.
InfraDH: Infrastructure for the Digital Humanities
The infrastructure of the humanities has traditionally been the library and the books on its shelves. Now that humanities scholarship increasingly makes use of digital resources (textual and multi-media) and tools, and its methods rely more heavily on communication and collaboration, digital infrastructure is also required. Such infrastructure poses significant challenges since, as with much else in the digital environment, there are not yet established models and best practices for it. This sub-project works towards evaluating models and establishing best practices by forming a network of researchers who are engaged in building such infrastructure, innovating in response to the rapidly changing technical environment, and reflecting on its implications. The questions raised by this sub-project thus arise from the intersection of research and infrastructure, and although they relate most immediately to humanities research, they have implications for a wide range of receptor communities with which we will seek relationships during the proposal development stage. The challenges are at least as cultural or social as they are technological, involving the intersections among user literacies, interface design and affordances, and technological capacity.