EMAN, publishing of manuscripts and digital archives

Richard WALTER (Research/ CNRS)

Published on 14 July 2021, updated on 17 July 2021

EMAN is a digital publishing tool for the dissemination and exploitation of modern manuscripts and archives.

We chose not to build a tool from scratch, which would have required heavy development and risky maintenance. Instead, we built on the open-source software Omeka, a tool widely appreciated in the humanities community for showcasing corpora and scientific collections. It meets the requirements of data durability and interoperability that the exploitation and promotion of corpora and archives demand.

On our Omeka-based EMAN platform, the editorial framework and the navigation interface have undergone a number of specific developments to adapt them to documentary and scientific practices concerning manuscripts and modern archival collections. Through projects of this kind, we are building a digital publishing model.

Émile Zola, L'Assommoir. Autograph manuscript (BnF)

This model is shared with all the projects on the platform, which adapt it to the specificities of their corpus and/or the aims of their studies. The model consists of a specific theme and an EMAN plugin, which together contain all the modifications and additions we have made on top of the Omeka core, which itself has not been modified. Further developments, in the form of plugins, are then released to the Omeka community through a GitHub page.

The editorial model structures the data in collections and sub-collections, with descriptions in Dublin Core and in custom metadata types. It also includes tools for virtual exhibitions and for transcription in TEI format, as well as tools for importing and exporting all or part of the corpus in CSV and XML. The model can evolve according to the decisions of the platform's steering committee, and it is worked on during workshops and study days. All adaptations and manipulations are recorded in a collaborative space, and a research notebook reports on the editorial and scientific activity produced from the platform.
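To make this structure concrete, here is a minimal sketch in Python of a collection tree whose items carry Dublin Core descriptions and which is flattened to CSV. It is an illustration only: the class names (`Item`, `Collection`), the `export_csv` function and the `dc:` column prefix are assumptions made for this example, not the data model or export format of Omeka or of the EMAN plugin.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical item: Dublin Core element name -> value,
    # e.g. {"title": "...", "creator": "...", "date": "..."}
    dublin_core: dict[str, str]


@dataclass
class Collection:
    # Hypothetical collection that may contain items and sub-collections.
    name: str
    items: list[Item] = field(default_factory=list)
    subcollections: list["Collection"] = field(default_factory=list)

    def walk(self, path=()):
        """Yield (collection path, item) pairs, depth first."""
        here = path + (self.name,)
        for item in self.items:
            yield here, item
        for sub in self.subcollections:
            yield from sub.walk(here)


def export_csv(root: Collection, elements: list[str]) -> str:
    """Flatten a collection tree to CSV text, one row per item.

    The "dc:" column prefix is an assumption of this sketch.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["collection"] + [f"dc:{e}" for e in elements])
    for path, item in root.walk():
        values = [item.dublin_core.get(e, "") for e in elements]
        writer.writerow([" / ".join(path)] + values)
    return buffer.getvalue()
```

Calling `export_csv(root, ["title", "creator", "date"])` on a root collection returns one CSV row per item, with the chain of collection and sub-collection names in the first column. Exporting only part of a corpus amounts to passing a sub-collection as the root.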

To date, the platform hosts nearly fifty projects, ranging from "micro-editions", genetic or not, of very small corpora to the mass publication of a wide variety of documents, with or without transcription. The periods covered range from the Renaissance to the 21st century, many languages are represented in the digitized and published documents, and the range of media is also very broad, from manuscripts to video documents.

The platform is governed by a steering committee made up of one representative per project, who thereby takes part in building the platform. Current work focuses on: a/ developing a new version of our transcription tool, to be distributed within the community and to produce compatible TEI files; b/ visualizing relations between documents as a graph; c/ multilingual editing for corpora in several languages.

Team

Richard Walter, Thalim laboratory (lead)

Steering committee of the platform, with 30 representatives of the different projects hosted by the platform, and a board of 7 members (Céline Bohnert, Emmanuelle Bousquet, Charlotte Dessaint, Marie Dupond, Camille Koskas, Jean-Sebastien Macke, Anne Reach-Ngö)

See the research notebook.

Computer Science

Literature

Digital corpora

Language

Research