Paris Time Machine consortium, working group on addresses and directories

By Carmen BRANDO (Research/ EHESS) and Frederique MELANIE (Research/ CNRS), updated on 17 July 2021
Plan de Truschet et Hoyau, c. 1552

The Paris Time Machine consortium is building geo-historical reference data. It comprises several working groups, including the addresses and directories group, which aims to collect, list, use and visualize the addresses found in Parisian directories.

The addresses and directories working group of the Paris Time Machine consortium is particularly interested in the Directory of owners and properties of Paris and the department of the Seine, an annual publication issued between 1894 and 1937 that lists Parisian addresses and their owners (see Fig. 1). The French National Library (BnF) holds thirty-eight volumes of it. Until now, no scans were available.

The working group aims to gather, digitize, transcribe, structure, publish, spatialize and analyze this source, which has a strong spatio-temporal dimension, in order to better represent a specific moment in the geography of Paris. This requires a processing pipeline that turns a digitized document into a machine-readable format better suited to quantitative processing.
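Such a pipeline can be pictured as a chain of stages, each consuming the previous stage's output. The Python sketch below is a hypothetical illustration, not the working group's code: the stage names `transcribe` and `structure`, their placeholder bodies and the assumed "owner, address" line layout are all invented for the example. Its one design point is that every stage also returns warnings, tagged with the stage name, so that errors in the final data can be traced back to the step that introduced them.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Result:
    """Output of one stage plus any warnings it raised."""
    data: Any
    warnings: list[str] = field(default_factory=list)


def run_pipeline(data: Any, stages: list[tuple[str, Callable[[Any], Result]]]) -> Result:
    """Apply stages in order; each warning is prefixed with the name of the stage that raised it."""
    warnings: list[str] = []
    for name, stage in stages:
        result = stage(data)
        warnings.extend(f"{name}: {w}" for w in result.warnings)
        data = result.data
    return Result(data, warnings)


# Hypothetical placeholder stages (illustration only).
def transcribe(page_text: str) -> Result:
    """Stand-in for HTR: the 'page' is already text, split into lines."""
    return Result(page_text.splitlines())


def structure(lines: list[str]) -> Result:
    """Split each line into owner and address at the first comma (assumed layout)."""
    records, warnings = [], []
    for i, line in enumerate(lines):
        if "," not in line:
            warnings.append(f"line {i}: no comma, left unstructured")
            continue
        owner, address = line.split(",", 1)
        records.append({"owner": owner.strip(), "address": address.strip()})
    return Result(records, warnings)
```

For example, `run_pipeline("Dupont, 12 rue de Rivoli\n???", [("transcribe", transcribe), ("structure", structure)])` yields one structured record and the warning `"structure: line 1: no comma, left unstructured"`.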

Fig. 1. A page from the Annuaire des propriétaires et des propriétés et listes alphabétiques de l’année 1898 (copyright BNF)

Using the Transkribus platform, we automatically produced transcriptions of four volumes of the directory (1898, 1903, 1913, 1923) with two neural-network models of the HTR+ type (Handwritten Text Recognition), here used to recognize printed characters: the first model processes the first three editions, the second processes the last volume, whose typeface is very different from that of the earlier volumes. The models' performance in terms of Character Error Rate (CER) is promising, below 1%.
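To make the CER figure concrete: it is conventionally the edit (Levenshtein) distance between the model's output and a reference transcription, divided by the number of characters in the reference, so a CER below 1% means fewer than one wrong character per hundred. Transkribus reports this value itself; the sketch below only illustrates the usual definition, and the function names are our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

For instance, `cer("rue de Rivoli", "rue de Rivol1")` is 1/13 (one substitution over 13 reference characters), about 7.7%, far above the sub-1% level reached by the models.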

In addition, a geocoding test was carried out on approximately 12,000 Parisian addresses, of properties and of domiciles, listed in a 149-page sample of the 1898 edition. It used the historical geocoder of the GeohistoricalData team at EHESS and the national geocoding service of France's National Address Database (Base Adresse Nationale, BAN), both accessible via a REST API. The test showed how much the wording of street names matters. For example, with GeohistoricalData, property addresses, which use the official street names, were geocoded in 92.96% of cases (the spatial accuracy of these locations remains to be verified). This figure drops to 78.94% for Parisian domiciles: the directory's editors often wrote these addresses in abbreviated form, which makes them hard for the geocoder to identify. The remaining 13.91% could not be geocoded. Fig. 2 shows the spatial coverage relative to today's street network; the locations therefore remain approximate.
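As an illustration of the BAN side of such a test, here is a minimal Python sketch that expands a few abbreviations before querying, builds a request URL for the BAN search endpoint (`https://api-adresse.data.gouv.fr/search/`, parameters `q` and `limit`) and reads the top result from the GeoJSON response. The abbreviation table, the 0.5 score threshold and all function names are assumptions for illustration. The HTTP call itself (e.g. with `urllib.request.urlopen`) is left out so the sketch runs offline, and the GeohistoricalData geocoder, which has its own API, is not shown.

```python
import json
from urllib.parse import urlencode

# Hypothetical abbreviation table; the abbreviations actually used by the
# directory's editors would have to be collected from the transcriptions.
ABBREVIATIONS = {
    "r.": "rue",
    "bd": "boulevard",
    "bd.": "boulevard",
    "av.": "avenue",
    "pl.": "place",
    "faub.": "faubourg",
}

BAN_SEARCH_URL = "https://api-adresse.data.gouv.fr/search/"


def expand_abbreviations(address: str) -> str:
    """Replace abbreviated street-type words with their full form."""
    return " ".join(ABBREVIATIONS.get(word.lower(), word) for word in address.split())


def ban_query_url(address: str, limit: int = 1) -> str:
    """Build a BAN search URL; ' Paris' is appended to steer results to the city."""
    params = {"q": expand_abbreviations(address) + " Paris", "limit": limit}
    return BAN_SEARCH_URL + "?" + urlencode(params)


def best_match(response_text: str, min_score: float = 0.5):
    """Return (label, lon, lat, score) for the top result of a BAN GeoJSON
    response, or None if there is no result or its score is below the
    (assumed) threshold."""
    features = json.loads(response_text).get("features", [])
    if not features:
        return None
    top = features[0]
    score = top["properties"].get("score", 0.0)
    if score < min_score:
        return None
    lon, lat = top["geometry"]["coordinates"]  # GeoJSON order: lon, lat
    return top["properties"].get("label"), lon, lat, score


def geocoding_rate(results: list) -> float:
    """Percentage of addresses that received a match."""
    return 100 * sum(r is not None for r in results) / len(results)
```

For example, `ban_query_url("12 r. de Rivoli")` gives `https://api-adresse.data.gouv.fr/search/?q=12+rue+de+Rivoli+Paris&limit=1`. A rate such as those reported above corresponds to `geocoding_rate` applied to the results for one category of addresses.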

Fig. 2. Spatial coverage (with respect to today’s street network) of 149 pages of the 1898 directory; coordinates come from BAN. (Map made with QGIS; data: PTM, BAN, BNF)

Processing this type of historical source matters to historians specializing in the social and economic history of cities, and of Paris in particular. Several processing operations will be implemented so that these data can be openly disseminated in several formats (tables, GIS, ALTO) for further analysis.
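The tabular and GIS outputs could, for example, take the form of CSV and GeoJSON, a format QGIS reads directly. The sketch below uses only Python's standard library; the record fields (`owner`, `address`, `lon`, `lat`) are hypothetical, and ALTO export is not covered.

```python
import csv
import io
import json


def to_csv(records: list[dict]) -> str:
    """Serialize records (dicts sharing the same keys) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)  # None values are written as empty fields
    return buffer.getvalue()


def to_geojson(records: list[dict]) -> str:
    """Serialize records as a GeoJSON FeatureCollection of points.
    Records without coordinates (not geocoded) are skipped."""
    features = []
    for r in records:
        if r.get("lon") is None or r.get("lat") is None:
            continue
        properties = {k: v for k, v in r.items() if k not in ("lon", "lat")}
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) puts longitude before latitude.
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": properties,
        })
    collection = {"type": "FeatureCollection", "features": features}
    return json.dumps(collection, ensure_ascii=False)
```

Keeping non-geocoded records in the CSV while dropping them from the GeoJSON is one way to keep the table complete while the map shows only located addresses; the proportion of dropped rows is itself a useful error indicator to publish alongside the data.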

Several avenues for improvement remain to be explored, in particular automatic image segmentation, automatic recognition of addresses and person names, and automatic geocoding of addresses against geo-historical data on the Parisian street network for the period covered by the four volumes. More generally, automating and chaining processing steps inevitably affects the results obtained, and therefore the analyses; it is thus necessary to communicate clearly about the errors these processing operations introduce into the data.
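Geocoding against historical street data could, for instance, start with fuzzy matching between transcribed street names and a list of historical street names. The sketch below uses Python's standard `difflib`; the gazetteer, the 0.8 cutoff and the function names are illustrative assumptions, not the group's method. Returning the similarity score with each match is one simple way to expose the uncertainty that, as noted above, must be communicated.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Lowercase and strip accents, so 'Honoré' and 'honore' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def match_street(transcribed: str, gazetteer: list[str], cutoff: float = 0.8):
    """Return (historical street name, similarity in [0, 1]) for the closest
    gazetteer entry, or None if no entry reaches the cutoff."""
    by_normalized = {normalize(s): s for s in gazetteer}
    query = normalize(transcribed)
    candidates = difflib.get_close_matches(query, list(by_normalized), n=1, cutoff=cutoff)
    if not candidates:
        return None
    best = candidates[0]
    score = difflib.SequenceMatcher(None, best, query).ratio()
    return by_normalized[best], score


# Hypothetical three-entry gazetteer for illustration.
GAZETTEER = ["rue Saint-Honoré", "rue de Rivoli", "rue du Faubourg Saint-Antoine"]
```

With this gazetteer, the abbreviated form `"rue St-Honore"` is matched to `"rue Saint-Honoré"` with a similarity of about 0.90, while `"avenue Foch"` returns `None` and would be flagged for manual checking.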


Members of the addresses and directories group

  • Frédérique Mélanie-Becquet (LATTICE)
  • Gabriela Elgarrista (Plateforme Géomatique EHESS)
  • Carmen Brando (CRH UMR 8558/ Plateforme Géomatique EHESS)
  • Eric Mermet (CNRS/Plateforme Géomatique EHESS)
  • Alix Chagué (Inria-ALMAnaCH)
  • Mohamed Khemakhem (Inria-ALMAnaCH)
  • Laurent Romary (Inria-ALMAnaCH)
  • Jean-Luc Pinol (ENS Lyon)

Institutions

  • Lattice (UMR8094, CNRS & ENS/PSL & Université Sorbonne nouvelle)
  • EHESS CRH UMR 8558 / Plateforme Géomatique EHESS
  • INRIA - Equipe ALMAnaCH
  • TGIR Huma-num CNRS

Publications

  • Gabriela Elgarrista, Frédérique Mélanie-Becquet, Carmen Brando. Annuaires de propriétaires de Paris : Vers une analyse socio-économique et spatiale de la population parisienne en 1898. Assises de l’AP en Humanités numériques spatialisées du GDR MAGIS, 23 June 2020.
  • Gabriela Elgarrista, Frédérique Mélanie-Becquet, Carmen Brando, Mohamed Khemakhem, Laurent Romary, Jean-Luc Pinol. Pipeline to process and analyze Paris’s old property address directories (XIXe - XXe). Poster, CLARIN Bazaar, CLARIN Conference 2020, 7 October 2020.
  • Mohamed Khemakhem, Carmen Brando, Laurent Romary, Frédérique Mélanie-Becquet, Jean-Luc Pinol. Fueling Time Machine: Information Extraction from Retro-Digitised Address Directories. JADH2018 "Leveraging Open Data", Tokyo, Japan, September 2018. hal-01814189.