EScriptorium

{{Short description|OCR software}}

{{Lowercase title}}

{{Infobox software

| name = eScriptorium

| logo = Logo_escriptorium.png

| logo alt = Logo of eScriptorium (feather with eS on yellow background)

| screenshot = EScriptorium v0.13.8 start page (detail).png

| description =

| maintainer =

| released = {{Start date and age|2018}}

| management =

| latest release version = {{wikidata|property|preferred|references|single|Q111218645|P348|P548=Q2804309}}

| latest release date = {{wikidata|qualifier|preferred|single|Q111218645|P348|P548=Q2804309|P577}}

| latest preview version = {{wikidata|property|preferred|references|single|Q111218645|P348|P548=Q51930650}}

| latest preview date = {{wikidata|qualifier|preferred|single|Q111218645|P348|P548=Q51930650|P577}}

| operating system = platform independent

| programming language =

| category =

| license =

| repo =

| website =

}}

eScriptorium is a platform for manual or automated segmentation and text recognition of historical manuscripts and prints.

Details

[[File:EScriptorium Journal of a Voyage on Board the Resolution 1772-1774 Vol. 1.png|thumb|eScriptorium showing the diary ''Journal of a Voyage on Board the Resolution 1772-1774 Vol. 1'']]

eScriptorium is open-source software developed at Paris Sciences et Lettres University as part of the projects Scripta{{Cite web|url=https://www.psl.eu/en/scripta|title=Scripta-PSL. History and practices of writing|language=en|access-date=2022-03-13}} and RESILIENCE,{{Cite web|url=https://www.resilience-ri.eu/|title=RESILIENCE - The Religious Studies Research Infrastructure|language=en|access-date=2022-03-13}} with contributions from other institutions. Development is partly funded by the EU's Horizon 2020 funding program and a grant from the Andrew W. Mellon Foundation.

Scanned pages of manuscripts and prints can be imported into eScriptorium, and the results can be exported in various formats (plain text, ALTO or PAGE XML, TEI). First, the text regions and the text lines within them are identified in the images, either manually or automatically (segmentation). The text lines are then transcribed manually or automatically.{{Cite web|url=https://escriptorium.readthedocs.io/en/latest/|title=eScriptorium Documentation|access-date=2024-01-21}}
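As an illustration of working with such an export, the following is a minimal Python sketch (not part of eScriptorium) that reads the transcribed lines from an ALTO XML document. The sample document is a hand-written, simplified fragment, and the function name <code>alto_lines</code> is hypothetical. The sketch assumes the ALTO v4 namespace and that the text of each line is stored in the <code>CONTENT</code> attribute of <code>String</code> elements; a real export may be organised differently.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified ALTO v4 fragment standing in for an exported file.
# It is an assumption for illustration, not real eScriptorium output.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="page_1" WIDTH="1000" HEIGHT="1500">
      <PrintSpace>
        <TextBlock ID="region_1">
          <TextLine ID="line_1">
            <String CONTENT="First transcribed line"/>
          </TextLine>
          <TextLine ID="line_2">
            <String CONTENT="Second"/>
            <SP/>
            <String CONTENT="line"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

# Namespace prefix mapping used in the XPath-style queries below.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}


def alto_lines(xml_text):
    """Return the text of each TextLine, in document order."""
    root = ET.fromstring(xml_text)
    lines = []
    for text_line in root.iterfind(".//alto:TextLine", ALTO_NS):
        # A line may hold one String for the whole line or one per word.
        parts = [s.get("CONTENT", "")
                 for s in text_line.iterfind("alto:String", ALTO_NS)]
        lines.append(" ".join(parts))
    return lines


if __name__ == "__main__":
    for line in alto_lines(SAMPLE_ALTO):
        print(line)
```

Joining multiple <code>String</code> elements with a single space is a simplification; it ignores any layout information carried by the separator elements.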

Both automatic segmentation and automatic text recognition can be trained on manually created or corrected examples (ground truth). The resulting models can be shared with others and reused.{{Cite web |url=https://escriptorium.readthedocs.io/en/latest/export/ |title=Export data - eScriptorium Documentation|access-date=2024-01-21}}

eScriptorium is built on the free OCR software Kraken by Benjamin Kiessling, which is derived from the OCR software OCRopus. Kraken is suitable for handwritten and printed texts and also supports scripts that are written from right to left, such as Hebrew and Arabic.{{Cite web|url=https://github.com/mittagessen/kraken/|title=mittagessen/kraken: OCR engine for all the languages|language=en|access-date=2022-03-13}}

Comparable programs with similar functionality include OCR4all{{Cite web |url=https://fortext.net/tools/tools/ocr4all |title=OCR4all {{!}} forTEXT |access-date=2023-06-20}} and Transkribus.

References