From the automatic transcription of medieval Hebrew manuscripts via scientific editing to the analysis of intertextuality: tools and praxis around eScriptorium

Lecture by Daniel Stoekl (École Pratique des Hautes Études)

Following a brief introduction to our open-source handwritten text recognition (HTR) infrastructure, eScriptorium cum kraken, I will demonstrate its application to the automatic layout segmentation, handwritten text segmentation and paleography of Hebrew manuscripts. Using its rich (but still growing) internal functionalities and API, as well as a number of external tools (Dekker and Middell 2011, Shmidman et al. 2018, and my own), I will deal with automatic text identification, alignment and crowdsourcing (Kuflik et al. 2019, Wecker et al. 2019) and show how these procedures can be used to create different types of generic models for segmentation and transcription. I will present first ideas for automatically passing from the document hierarchy resulting from HTR to a text-oriented model with integrated interlinear and marginal additions that can be displayed in tools like TEI-Publisher. While the methods presented are generic and applicable to most languages and scripts, special attention will be given to problems arising from dealing with non-Latin scripts, right-to-left (RTL) writing and morphologically rich languages.

Bibliography:

  • Dekker, R. H., Middell, G.: Computer-Supported Collation with CollateX: Managing Textual Variance in an Environment with Varying Requirements. Supporting Digital Humanities 2011. University of Copenhagen, Denmark (2011).
  • Kuflik, T., M. Lavee, A. Ohali, V. Raziel-Kretzmer, U. Schor, A. Wecker, E. Lolli, P. Signoret, D. Stökl Ben Ezra (2019) 'Tikkoun Sofrim – Combining HTR and Crowdsourcing for Automated Transcription of Hebrew Medieval Manuscripts', DH2019.
  • Lapin, Hayim and Daniel Stökl Ben Ezra, eRabbinica
  • Meier, Wolfgang and Magdalena Turska, TEI Processing Model Toolbox: Power To The Editor. DH 2016: 936.
  • Meier, Wolfgang and Magdalena Turska, TEI-Publisher.
  • Shmidman, A., Koppel, M., Porat, E.: Identification of parallel passages across a large Hebrew/Aramaic corpus. Journal of Data Mining and Digital Humanities, 2018.
  • Wecker, A., V. Raziel-Kretzmer, U. Schor, T. Kuflik, A. Ohali, D. Elovits, M. Lavee, P. Stevenson, D. Stökl Ben Ezra (2019) 'Tikkoun Sofrim: A WebApp for Personalization and Adaptation of Crowdsourcing Transcriptions', UMAP’19 Adjunct (Larnaca; New York: ACM Press).
