The CATMuS initiative: building large and diverse corpora for handwritten text recognition

DHAI Seminar 2023-2024
Tuesday 14 May 2024
From 10 AM to 12 PM
ENS-PSL, conference room of the Centre Sciences des Données / online

45 rue d'Ulm
75005 Paris

Session of the DHAI Seminar with Thibault Clérice (INRIA) and Malametenia Vlachou-Efstathiou (IRHT/IMAGINE-ENPC)

The CATMuS (Consistent Approaches to Transcribing ManuScripts) initiative is a set of datasets and guidelines for training large, generalizable HTR models. In this presentation, we outline the challenges of handwritten text recognition for historical documents spanning a long period of time and many languages, the choices we faced, and how we addressed them. We will then present the resulting dataset for the Middle Ages, the first to be published by the CATMuS initiative, along with initial results from several models.

Tuesday 14 May 2024