Building Multilingual BookNLP

From 10am to 12pm
The next seminar, given by David Bamman, will take place on Wednesday, June 21, from 10am to 12pm at the Ecole normale supérieure (rue d’Ulm, Paris). For security reasons, pre-registration is required. Conditions permitting, the session will also be broadcast on Zoom for participants who cannot attend in person; a link will be emailed on the day of the event.

Abstract:

BookNLP (Bamman et al. 2014) is a natural language processing pipeline for reasoning about the linguistic structure of text in books, designed specifically for works of fiction. In addition to part-of-speech tagging, named entity recognition, and coreference resolution, BookNLP identifies the characters in a literary text and represents them through the actions they participate in, the objects they possess, their attributes, and their dialogue. The availability of this tool has driven much work in the computational humanities, especially work on character (Underwood et al. 2018; Kraicer and Piper 2018; Cheng 2020). At the same time, BookNLP has had one major limitation: it supports only texts written in English. In this talk, I will describe our efforts to expand BookNLP to literature in languages beyond English and to create a blueprint that others can follow to develop it for additional languages in the future.

The talk will be followed by a Q&A session on BookNLP.

Bio:

David Bamman is an associate professor in the School of Information at UC Berkeley, where he works in natural language processing and cultural analytics, applying NLP and machine learning to empirical questions in the humanities and social sciences. His research focuses on improving the performance of NLP for underserved domains such as literature (including LitBank and BookNLP) and on exploring the affordances of empirical methods for the study of literature and culture. Before joining Berkeley, he received his PhD from the School of Computer Science at Carnegie Mellon University and was a senior researcher at the Perseus Project of Tufts University. Bamman’s work is supported by the National Endowment for the Humanities, the National Science Foundation, the Mellon Foundation, and an NSF CAREER award.

Funded by Translitterae.

