The CATMuS initiative: building large and diverse corpora for handwritten text recognition

DHAI Seminar 2023-2024
Tuesday 14 May 2024
10:00 to 12:00
ENS-PSL, conference room of the Centre Sciences des Données / online

45 rue d'Ulm
75005 Paris

DHAI seminar session with Thibault Clérice (INRIA) and Malametenia Vlachou-Efstathiou (IRHT/IMAGINE-ENPC)

The CATMuS (Consistent Approaches to Transcribing ManuScripts) initiative is a set of datasets and guidelines for training large HTR models that generalize well. In this presentation, we discuss the challenges of handwritten text recognition for historical documents spanning a long period and many languages, the choices we faced, and how we addressed them. We then introduce the resulting dataset for the Middle Ages, the first to be published by the CATMuS initiative, and report initial results from models trained on it.
