Poetry in Silico: Bringing digital tools to the study of poetry


ONCE UPON A MIDNIGHT DREARY, an English professor at Princeton sat in her office, musing over many volumes of forgotten lore about the right way to read a poem.

There were handbooks, essays, letters from one poet to another, and even newspaper articles dedicated to arguments over what rhythms should be used, which syllables should be stressed, and where the reader should pause — all elements of prosody, the study of poetic form.

Meredith Martin, an associate professor and expert on English poetry of the 19th and 20th centuries, had assembled these sources to explore how thinking about these "rules" for reading poems changed during the Victorian and early Modernist periods. The rules, found in versification manuals and grammar-school textbooks of the period, sometimes appeared as markings on the poem itself: typically accents on stressed syllables, little u-shaped marks called breves atop unstressed syllables, and vertical lines to indicate pauses.

The problem Martin faced was how to search across her assembled sources. Although initiatives such as Google Books had already digitized many of the works she had collected, others were scattered across different databases and, most important, were unsearchable: letters bearing prosodic marks are not recognized by typical computer search techniques.

Enlisting the help of computer scientists and librarians, Martin began in 2011 to build the Princeton Prosody Archive, a full-text searchable database of more than 10,000 digitized records published between 1750 and 1923. Currently in beta testing, the Prosody Archive will open to the public in 2015, with full access to the archive by 2017.

Meredith Martin (photo by Denise Applewhite)

“We are making these texts available in one place for the first time,” Martin said, “and enabling scholars to explore new analytical questions in the study of poetry.”

Bringing searchable books, maps and other historical texts online is a growing movement in the humanities at universities around the world, including Princeton. This fall, Princeton opened the Center for Digital Humanities, directed by Martin, to help faculty and student researchers harness the power of computing for research that once was possible only through laborious visits to musty archives and libraries.

A number of motivations drive the growing trend in “digital humanities,” defined broadly as the intersection of technology and the humanities. A generation raised on Google has come to expect instant online access to everything. Research budgets for physical trips to far-flung libraries have shrunk. But perhaps the biggest driver of the digital humanities movement is the potential to search widely, deeply and in new ways. From a desktop computer, a researcher can scan large numbers of tomes and search for trends using statistical tools, or select one for a close reading.

“The field of digital humanities has evolved rapidly and there are a lot of different opinions about what the term means,” said Clifford Wulfman, digital initiatives coordinator at the Princeton University Library. “I think of it as the field where the humanities and the more algorithmic and mathematical approaches meet, intersect and intermingle, and sometimes produce practical outcomes like tools that someone can use, but also give rise to new questions and deeper understanding.”

What is meter?

In the case of Martin's research, the Prosody Archive helps her explore how a seemingly objective system for reading a poem can embody issues of national identity, class and patriotism. In her book, The Rise and Fall of Meter: Poetry and English National Culture 1860-1930 (Princeton University Press, 2012), Martin describes how 19th-century English scholars fixated on finding a distinctly English meter (a rhythmic pattern created by stressed and unstressed syllables) as part of a Victorian-era glorification of English culture.

Martin became interested in meter during college. “I was fascinated by the idea that the way in which you read a poem could completely change your relationship to the poetic text,” Martin said.


The Princeton Prosody Archive began as a quest by English professor Meredith Martin to bring order and search capability to a large collection of books and manuscripts that she had assembled while writing her book, The Rise and Fall of Meter (Princeton University Press, 2012).

In graduate school at the University of Michigan and then at Princeton, Martin explored how the study of meter had evolved in English poetry. What she found surprised her. Rather than discovering that poems were read in standard and unchanging ways, she found that the “right way to read a poem” changed over time and was the subject of contentious debate. “I found that prosody is incredibly culturally determined,” Martin said. “It has a lot to do with the reader and with his or her sense of language, and the relationship to his or her country.”

The Prosody Archive will allow Martin to make available to other scholars the sources that she used for her book. It also will enable the exploration of new questions, such as why certain poets emerged as the exemplars of their eras and how the emerging science of linguistics influenced debates about meter.

The Prosody Archive is housed at Princeton, and its computer architecture was developed by Travis Brown of the University of Maryland Institute for Technology in the Humanities, a leading digital humanities center. Support for the archive was provided by the Andrew W. Mellon Foundation. Eventually, in addition to printed books, the Prosody Archive will include newspaper and journal articles, which will be scanned at Princeton. An agreement with Google Books and the nonprofit book repository HathiTrust allowed Princeton to access many of the digitized books in the collection.

Assembling digitized texts and digitizing those that are not in HathiTrust is only the first step, however. When books and papers are scanned, many of the nuances that make physical books so rich are lost, from the dog-eared corners that mark a well-loved passage to scribbles of inspiration in the margins. "These flourishes are not captured by today's optical character recognition software, so they don't show up in the digitized texts," said Meagan Wilson, the Prosody Archive's manager and a graduate student in English. To address this issue, the Princeton Prosody Archive will include page images along with each text.

Prosodic marks are also lost. Most prosodic texts use a notation known as scansion — a word that derives from the fact that the reader scans rather than reads the poem, as shown in the opening line of Edgar Allan Poe’s The Raven:

[Image: scansion marks over the opening line of Poe's The Raven]

The digitized version contains the words but not the scansion marks, such as the accents and breves. Other prosodic marks are even more difficult to capture. Occasionally, musical notation is used as a method of scansion:

[Image: a line of verse scanned with musical notation]

Optical character recognition software doesn’t understand these symbols, and returns:

[Image: indecipherable text returned by the optical character recognition software]

For now, the Prosody Archive team will annotate the entries by hand, said Ben Johnston, a founding member of the digital humanities initiative and manager of Princeton’s Humanities Resource Center, which develops technology resources for teaching and research. “We have to go through the entries and indicate the notation — for example a musical note or an accent mark.”

Over the next three years, the team hopes to develop a computer model for encoding scansion as well as tools that apply natural-language-processing techniques, said Martin. One such technique is topic modeling, which statistically groups words that tend to appear together into "topics" and could be used to find prosody-related information. The addition of data visualization software will make the collection more useful to researchers.
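To give a sense of what "encoding scansion" could mean in practice, here is a minimal, hypothetical sketch in Python. It is not the Prosody Archive's model. It simply assumes that each syllable of a line is stored alongside its mark ("/" for an accented, stressed syllable and "u" for a breve over an unstressed one), then checks which classical foot repeats. The syllable division of Poe's opening line and the names used here (RAVEN_LINE_1, stress_pattern, dominant_foot) are illustrative assumptions.

```python
from collections import Counter

# Hypothetical encoding (an assumption, not the archive's format): each
# syllable is paired with a mark, "/" = stressed (accent), "u" = unstressed (breve).
RAVEN_LINE_1 = [
    ("Once", "/"), ("up", "u"), ("on", "/"), ("a", "u"),
    ("mid", "/"), ("night", "u"), ("drear", "/"), ("y,", "u"),
    ("while", "/"), ("I", "u"), ("pon", "/"), ("dered,", "u"),
    ("weak", "/"), ("and", "u"), ("wear", "/"), ("y,", "u"),
]

# A few classical feet, keyed by their stress patterns.
FEET = {
    "/u": "trochee",
    "u/": "iamb",
    "/uu": "dactyl",
    "uu/": "anapest",
}


def stress_pattern(line):
    """Join the line's marks into one searchable string, e.g. '/u/u...'."""
    return "".join(mark for _, mark in line)


def dominant_foot(line, size):
    """Cut the pattern into feet of `size` syllables; return (commonest foot, count)."""
    pattern = stress_pattern(line)
    feet = [pattern[i:i + size] for i in range(0, len(pattern) - size + 1, size)]
    counts = Counter(FEET.get(foot, "irregular") for foot in feet)
    return counts.most_common(1)[0]


if __name__ == "__main__":
    print(stress_pattern(RAVEN_LINE_1))    # /u/u/u/u/u/u/u/u
    print(dominant_foot(RAVEN_LINE_1, 2))  # ('trochee', 8)
```

Storing the marks as a plain string is one reason an encoding along these lines could make scansion searchable: a query for a line of eight trochees, the meter of The Raven, becomes an ordinary text match.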

As the archive develops, Martin is excited about its prospects, not only to broaden the study of poetics at Princeton and beyond, but also to make possible new ways to study digital texts. The opening of the new Center for Digital Humanities will enable students and faculty members to ponder — more quickly and in greater depth — over many a quaint and curious volume of forgotten lore.

-By Catherine Zandonella