Paris Time Machine consortium, working group on addresses and directories
The Paris Time Machine consortium builds geo-historical reference data. It comprises several working groups, including the addresses and directories group, which aims to collect, inventory, use, and visualize the addresses found in Parisian directories.
The addresses and directories working group is particularly interested in the Directory of owners and properties of Paris and the department of the Seine, an annual publication issued between 1894 and 1937 that lists Parisian addresses and their owners (see fig. 1). The French National Library holds thirty-eight units in its collection. Until now, no scans were available.
The working group aims to gather, digitize, transcribe, structure, publish, spatialize, and analyze this source, which has a strong spatio-temporal dimension, in order to better represent a specific moment in the geography of the city of Paris. This requires a processing pipeline that turns a digitized document into a machine-readable format better suited to quantitative processing.
Using the Transkribus platform, the group automatically transcribed four volumes of the directory (1898, 1903, 1913, 1923) with two neural network models (Handwritten Text Recognition, HTR+) that recognize printed characters: the first handles the first three editions, and the second handles the last volume, whose typeface differs markedly from that of the earlier ones. The models' Character Error Rate (CER) is promising, at below 1%.
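To make the CER figure concrete, here is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between a manually corrected reference line and the automatic transcription, divided by the length of the reference. The function names and the sample strings are illustrative assumptions, and this is not necessarily how Transkribus computes its own reported score.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a 13-character line.
sample_cer = character_error_rate("rue de Rivoli", "rue de Rivol1")
```

Under this definition, a CER below 1% corresponds to fewer than one erroneous character per hundred characters of reference text.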
In addition, a geocoding test was run on approximately 12,000 Parisian addresses (properties and homes) listed in a 149-page sample of the 1898 edition, using the historical geocoder of the GeohistoricalData team at EHESS and the national geocoder of France's National Address Database (Base Adresse Nationale, BAN), both accessible via a REST API. The test showed how much the wording of street names matters. With GeohistoricalData, for example, property addresses, which use the official street names, were geocoded in 92.96% of cases (the spatial accuracy of these locations remains to be verified). The figure drops to 78.94% for Parisian domiciles: the directory's editors often abbreviated these addresses, which makes them harder for the geocoder to identify. The remaining 13.91% could not be geocoded. Fig. 2 shows the spatial coverage plotted on today's street network; the locations therefore remain approximate.
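Since abbreviated street names lower the geocoding rate, one possible mitigation is to expand abbreviations before querying a geocoder. The sketch below illustrates the idea under stated assumptions: the abbreviation table is hypothetical (the directory's actual conventions would have to be collected from the source), and the geocoder is passed in as a plain function, so no specific REST endpoint or response format is assumed.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
    "r.": "rue",
    "boul.": "boulevard",
    "av.": "avenue",
    "pl.": "place",
    "faub.": "faubourg",
}

Coordinates = Tuple[float, float]
# A geocoder takes an address string and returns coordinates, or None on failure.
Geocoder = Callable[[str], Optional[Coordinates]]


def expand_abbreviations(address: str) -> str:
    """Replace known abbreviated street types with their full form."""
    tokens = address.split()
    return " ".join(ABBREVIATIONS.get(token.lower(), token) for token in tokens)


def geocoding_rate(addresses: Iterable[str], geocode: Geocoder) -> float:
    """Share of addresses for which the geocoder returns a location."""
    address_list = list(addresses)
    if not address_list:
        return 0.0
    hits = sum(
        1 for address in address_list
        if geocode(expand_abbreviations(address)) is not None
    )
    return hits / len(address_list)
```

In practice, `geocode` would wrap a call to one of the two REST services; a dictionary lookup is enough to try the functions offline.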
Processing this type of historical source matters to historians who specialize in the social and economic history of cities, and of Paris in particular. Several processing operations will be implemented so that the data can be openly disseminated in several formats (tables, GIS, ALTO) and support further analysis.
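As an illustration of how an ALTO file could feed a tabular export, the following sketch reads the text lines of an ALTO document with Python's standard library. It relies on the standard ALTO layout, in which each `TextLine` element contains `String` elements whose `CONTENT` attribute holds a word. The sample document and the function names are assumptions made for the example, not the group's actual tooling.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip the XML namespace so that any ALTO version can be read."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(alto_xml: str) -> List[str]:
    """Return one string per TextLine, joining the CONTENT of its String children."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        words = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(word for word in words if word))
    return lines


# Hypothetical, heavily reduced ALTO document containing a single line.
SAMPLE_ALTO = (
    '<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#"><Layout><Page>'
    "<PrintSpace><TextBlock><TextLine>"
    '<String CONTENT="Rivoli"/><SP/><String CONTENT="(rue"/><SP/><String CONTENT="de)"/>'
    "</TextLine></TextBlock></PrintSpace></Page></Layout></alto>"
)

sample_lines = alto_text_lines(SAMPLE_ALTO)
```

Each extracted line could then be parsed into fields and written as a table row or a GIS feature.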
Several avenues for improvement remain to be explored, in particular the automatic segmentation of images, the automatic recognition of addresses and personal names, and the automatic geocoding of addresses against geo-historical data on the Parisian street network for the period covered by the four volumes. More generally, automating and chaining processing steps inevitably affects the results obtained and therefore the analyses; it is thus necessary to communicate clearly about the errors these processing operations introduce into the data.
Members of the addresses and directories working group
- Frédérique Mélanie-Becquet (LATTICE)
- Gabriela Elgarrista (Plateforme Géomatique EHESS)
- Carmen Brando (CRH UMR 8558/ Plateforme Géomatique EHESS)
- Eric Mermet (CNRS/Plateforme Géomatique EHESS)
- Alix Chagué (Inria-ALMAnaCH)
- Mohamed Khemakhem (Inria-ALMAnaCH)
- Laurent Romary (Inria-ALMAnaCH)
- Jean-Luc Pinol (ENS Lyon)