SC Colloquium: "Small Data: Ancient Greek Text Mining and the Quest for Method in Digital Humanities"
Allen J. Romano
Coordinator, MA Major in Digital Humanities
Program in Interdisciplinary Humanities
Florida State University
Scholars of ancient Greek and Latin were among the first to deploy computers to study humanities content, beginning in the 1940s and continuing today. Classics (the discipline in which these ancient languages tend to be studied) has a long history of quantitative analysis, beginning in antiquity and amplified especially in the 19th century; digital computing was thus a logical extension of such activities. While this remains true for many areas of "digital classics" (a subfield of the so-called "digital humanities"), machine learning, by contrast, does not sit so easily in the classicist's toolkit. Text mining and machine learning remain under-explored and ripe for further research. In this presentation, I will report on some experiments using text mining to study ancient Greek tragedy and Homeric poetry. The initial research focuses on a series of questions about how to identify and understand different kinds of speakers in these two types of ancient Greek poetry. The first tests approach the corpus in light of familiar sociolinguistic classification tasks: what distinguishes speakers in terms of gender, status, age, or ethnicity? The second stage tests genre-based classification, for example by comparing across the highly formalized genres of tragedy and Homeric poetry. Finally, to return to the questions of most interest to humanists, what can the results of classification tell us about the texts themselves? Can the markers identified through text mining effectively classify speakers or genres in unknown pieces of ancient literature? Can such classification at the corpus level inform the "close reading" that philology traditionally seeks?
Though the material is specifically that of ancient Greek literature, this particular example of "small data" is presented both for its own methods and results and for what it shows about the pressing issues facing humanists who seek to apply quantitative methods to their data but who may do so through idiosyncratically humanistic lenses and assumptions.