Redefining the cultural history of newspapers with artificial intelligence: the experiments of the Numapresse project

DHAI Seminar 2020-2021
Tuesday 15 December 2020
12:00 PM to 2:00 PM

Conference by Pierre-Carl Langlais (Paris-IV Sorbonne)

Over the last twenty years, libraries have developed massive digitization programs. While this shift has significantly enhanced the accessibility of digital cultural archives, it has also opened up unprecedented research opportunities. Innovative projects have recently attempted to apply large-scale quantitative methods borrowed from computer science to tackle ambitious historical issues. The Numapresse project proposes a new cultural history of French newspapers since 1800, notably through the distant reading of detailed digitization outputs from the French National Library and other partners. It has recently become a pilot project of the future data labs of the French National Library.

This presentation features a series of 'operationalizations' of core concepts of the cultural history of the news, in the context of a continuous methodological dialog with statistics, data science, and machine learning. Classic methods of text mining have been supplemented with spatial analysis of pages to deal with the complex and polyphonic editorial structures of newspapers and to retrieve specific formats like signatures or news dispatches. The project has created a library of 'genre models' which made it possible to retrieve large collections of texts belonging to leading newspaper genres in different historical settings. This approach has been extended to large collections of newspaper images through the retraining of deep learning models. The automated identification of text and image reprints also makes it possible to map the transforming ecosystem of French networks and its connection to other publication formats. The experimental work of Numapresse aims to foster a modeling ecosystem among research and library communities working on cultural heritage archives.

DHAI Organizing Team

Fractal - Pixabay

DHAI Seminar 2020-2021

October 9, 2020 - June 8, 2021

When Digital Humanities and Artificial Intelligence Meet.

Organization: Ségolène Albouy, Mathieu Aubry, Jean-Baptiste Camps, Matthieu Husson, Béatrice Joyeux-Prunel, Gabriel Peyré, Thierry Poibeau and Léa Saint-Raymond