Building Multilingual BookNLP

Wednesday 21 June 2023
10am to 12pm
ENS, Salle de conférence du centre de science des données

45 rue d'Ulm
75005 Paris

The next seminar, given by David Bamman, will take place on Wednesday, June 21 from 10am to 12pm at the Ecole normale supérieure (rue d’Ulm, Paris). For security reasons, pre-registration is required. The session will be broadcast on Zoom (if conditions allow) for participants who cannot attend in person. A link will be emailed on the day of the event.

Abstract:

BookNLP (Bamman et al. 2014) is a natural language processing pipeline for reasoning about the linguistic structure of text in books, specifically designed for works of fiction.  In addition to its pipeline of part-of-speech tagging, named entity recognition, and coreference resolution, BookNLP identifies the characters in a literary text, and represents them through the actions they participate in, the objects they possess, their attributes, and dialogue.  The availability of this tool has driven much work in the computational humanities, especially surrounding character (Underwood et al. 2018; Kraicer and Piper 2018; Cheng 2020).  At the same time, however, BookNLP has had one major limitation: it currently only supports texts written in English.  In this talk, I will describe our efforts to expand BookNLP to support literature in languages beyond English, and create a blueprint for others to develop it for additional languages in the future.

The talk will be followed by a Q&A session on BookNLP.

Bio:

David Bamman is an associate professor in the School of Information at UC Berkeley, where he works in the areas of natural language processing and cultural analytics, applying NLP and machine learning to empirical questions in the humanities and social sciences. His research focuses on improving the performance of NLP for underserved domains like literature (including LitBank and BookNLP) and exploring the affordances of empirical methods for the study of literature and culture. Before Berkeley, he received his PhD in the School of Computer Science at Carnegie Mellon University and was a senior researcher at the Perseus Project of Tufts University. Bamman’s work is supported by the National Endowment for the Humanities, National Science Foundation, the Mellon Foundation and an NSF CAREER award.

Wednesday, 21 June 2023, 10am-12pm (Paris time)

Salle de conférence du centre de science des données
(3rd floor, corridor between staircase B and staircase C)
Ecole normale supérieure 
45 rue d'Ulm 75005 Paris

Pre-registration is mandatory and closes on Tuesday, 20 June at noon. If you have any trouble with the Google Form, you can also register by emailing the organizer.

Because of the security protocol currently in force, pre-registration is mandatory (see the end of this email). The session should in principle be broadcast on Zoom (if conditions allow) for participants who cannot attend on site. A link will be sent by email on the day of the event.

Funded by Translitterae.
