Modern technology has taken up the challenge of reading old scripts, the domain of palaeography. One of the best-known tools, Transkribus, is currently used in a project on legal resources held at the university archives in Greifswald. The project Rechtsprechung im Ostseeraum. Digitization & Handwritten Text Recognition focuses on sources for Germany’s legal history in the region bordering the Baltic Sea. The project aims at making accessible 102,000 pages of legal instructions of the Faculty of Law of the Universität Greifswald (Spruchakten der Greifswalder Juristenfakultät, 1580-1800), 130,000 pages of opinions of the judges at the Wismar Tribunal (Relationen der Assessoren am Wismarer Tribunal, 1746-1845), as well as 25,000 pages of opinions of the judges of the Wismar Council Court (Relationen des Wismarer Ratsgericht, 1701-1879). Users will get access to images of these sources and will be able to perform text searches in this corpus. The Transkribus tool is being trained to recognize the Early Modern handwriting of very different scribes. Does it indeed succeed in creating reliable transcriptions? What efforts are necessary to make such sources ready for computerized approaches?
Scribal varieties and the use of computers
At various European universities and archives teams use the Transkribus tool of the READ (Recognition and Enrichment of Archival Documents) project, and even a special portable scanning tent, for projects with many thousands of pages in Early Modern or medieval scripts. Combined with a very active presence on Twitter, this can sometimes make it seem that Transkribus is the only proven tool in this field. So far the number of Transkribus projects for documents specifically dealing with legal history is small. The recent announcement of the Greifswald project on the Transkribus blog offers an opportunity to see the tool at work at last for legal historians.
On the bilingual project website in Greifswald the sources overview quickly makes clear that you can currently find only images of four registers of Spruchakten from Greifswald, shown at the Digitale Bibliothek Mecklenburg-Vorpommern. These four registers were chosen first as a “training set” for the Transkribus tool, which has to digest letter forms and writing patterns in order to become a functional reading tool. The registers contain documents from 1586, 1603, 1607 and 1643. The Universitätsarchiv Greifswald has digitized several series, among them matriculation registers and charters, but the Spruchakten are not mentioned in this overview. On the other hand, the university archive and library are currently present with the largest collections in the Digitale Bibliothek Mecklenburg-Vorpommern. The other institutional partners in this project are the Universitätsbibliothek Greifswald, the Stadtarchiv Wismar and the Landesarchiv Mecklenburg-Vorpommern in Schwerin. The Stadtarchiv Wismar has a web page about the creation of finding aids for the records of the Wismarer Tribunal in its own holdings and in other archives, with some references to relevant literature.
One of the reasons to use digital tools for studying these legal materials is their nature. The series of legal instructions and verdicts are organized in chronological order and only indexed for the names of claimants and defendants. The sheer working power in dealing with a massive set of (textual) data can make a huge difference for starting at all with a project concerning documents linked with a particular legal court in some or all of its dimensions.
Using the Transkribus tool
To use the Transkribus tool you need to create a free account and download the tool. There is a succinct user guide (PDF) and an extensive online guide in wiki format. The tool is the core of a set of accompanying websites and cloud services. OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition) are both possible. You will need to contact the team at the Universität Innsbruck to start the “training” of the tool, the process of recognizing and correctly deciphering various forms of writing. Among the most interesting results with this tool is the high percentage of correctly resolved texts in Early Modern Dutch archival records. The “model” succeeded not only in reading just one kind of script, but dealt equally successfully with several kinds of handwriting. Depending on the number of words fed into the machine, the character error rate (CER) can reach very low levels. A recent post at Rechtsprechung im Ostseeraum discusses the difference between word error rate and CER.
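To make the difference between the two measures concrete, here is a minimal sketch in Python. It uses the textbook definitions: the edit distance (Levenshtein distance) between the machine reading and a correct reference transcription, divided by the length of the reference, counted in characters for the CER and in words for the word error rate. The function names and the sample line are invented for illustration, and Transkribus itself may tokenize or normalize text differently before computing its figures.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Character-level edits divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def word_error_rate(reference, hypothesis):
    """Word-level edits divided by the number of reference words."""
    reference_words = reference.split()
    if not reference_words:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference_words, hypothesis.split()) / len(reference_words)


# Invented sample line: the correct reading and a machine reading
# in which two letters have been misread.
reference = "urtheil der juristen facultet"
hypothesis = "urtheil dcr juristcn facultet"

print(f"CER: {character_error_rate(reference, hypothesis):.1%}")
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

In this sample two misread letters out of 29 characters spoil two of the four words, so the word error rate comes out far higher than the CER. This matters for full-text searching, where a single wrong letter is enough to make a word impossible to find.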
On Tuesday, October 29, 2019, Annemiek Romein (Universiteit Gent) and Jeroen Vandommele gave a webinar at the Dutch Royal Library about using Transkribus. Provided you can follow Dutch, viewing this webinar gives you a very useful introduction to the practical use of this transcription tool, albeit with a focus on optical character recognition for dealing with printed texts, in particular collections of ordinances and the resolutions of the Staten-Generaal. I was particularly impressed by the way you can zoom in on and select text blocks. Aspects such as the costs of using Transkribus and surely the most frequently asked question, its final reading speed, currently one page within a minute, also come into view.
For now the project in Greifswald offers only a set of legal instructions by the law faculty of Greifswald. These will gain in importance when sets from the other two resources, the opinions of the Wismar Ratsgericht and the Tribunal, are added. It will be most interesting to see whether the opinions of the law professors deal with cases heard at one of the two courts. Combining them with the verdicts themselves is a logical sequel. I had hoped to report more here about the ins and outs of this project, but on the other hand it is a realistic example of work in progress, not a finalized and fully dressed product.
Despite looking carefully at the project website I could not readily detect the entrance to transcribed records, but I did reach a password-protected page. You must forgive me my predilection for websites with site maps and clear navigation! However, the project team gives a very good description of the various stages of preparation needed for the workflow of their project. The team is right in approaching these stages as separate but intertwined projects which all need due attention. The blog posts at the project website have touched upon a lot of subjects, and this steady stream will hopefully continue in the coming years. It is certainly useful to get acquainted with this and other tools, and to look at their procedures and terminology in order to carefully consider the chances and risks of using such tools.
It seems wise to look in more detail at the Transkribus website and its subdomains. On the main website the overview of pages for the Transkribus tool is essential. The transcription tool itself is hosted at a subdomain. Perhaps surprisingly there is also a page about the palaeography module offered by Transkribus at another subdomain, Transkribus LEARN. Here you can find hundreds of script examples. It is understandable that Transkribus focuses on its transcription tool, but this palaeographical resource deserves to be known by anyone wanting to learn to read old scripts. This way of learning by doing it yourself has to be distinguished from the “learning” of the “model”, the process by which the transcription tool digests information about scripts from a set of documents for automatic deciphering. As an extra you might want to visit Famous Hands, a site with documents showing the handwriting of famous European persons. It is a bit amusing to see how Transkribus LEARN and Famous Hands can seem almost hidden from direct view, but Transkribus LEARN is duly listed at the services page. Here, too, a sitemap would be helpful.
The datasets of Transkribus have been put on the Zenodo platform with the title ScriptNet – READ. The fleet of deliverables, the newspeak term for finished products from a project, is listed at a separate page of the main website. Components such as the transcription tool, the portable ScanTent which works with Transkribus’ own DocScan app, the link to Famous Hands, the GitHub repository of Transkribus and also the several components of the tools developed by various European teams can be found at this page. The so-called Transkribus KWS interface for keyword spotting brings you to a project for Finnish court records from 1810 to 1870 held at the Kansallisarkisto, the National Archives of Finland (interface in Finnish and English), yet another subject touching upon legal history.
At the end of this brief presentation of the Transkribus tool and its current uses for legal history it is fair to mention at least concisely some other available tools, in no particular order. Transcripto is a tool with a German and English interface created at the Universität Trier. Looking at Scripto I thought for some time it might also be a transcription tool like Transkribus, but it is a transcription interface for crowdsourcing projects, created by the Roy Rosenzweig Center for History and New Media, which can be integrated with several content management systems. The Università Roma Tre works on the project In Codice Ratio with the aim of automatic text recognition and transcription, in particular for the holdings of the Vatican archives. The French Himanis project has at its core a tool for text recognition used for indexing the text of 68,000 charters and documents in the Trésor des chartes of the Archives nationales in Paris.
TranScriptorium was the earlier incarnation of the READ project. Among the five datasets at the old project website are transcriptions of verdicts given by the German Reichsgericht between 1900 and 1914, a project led by Jan Thiessen (Universität Tübingen). This set of documents in the Kurrent script has been transferred to the document sets of Transkribus; you can access it after free registration. Christian Reul (Würzburg) has created OCR4ALL, a tool for dealing with OCR scanning of historical printed editions. It turns out it is fairly easy to find transcription platforms with various levels of image and transcription integration. In some cases there are even distinct layers for guiding and moderating crowdsourcing projects, but finding a tool for electronic recognition and transcription of historical handwriting and old printed works remains a challenge which certainly deserves a separate contribution.
Within a few days Elisabeth Heigl of the project team at Greifswald kindly sent a comment with the good news of a very useful overview in English for searching and browsing the documents in the Digitale Bibliothek Mecklenburg-Vorpommern. With the search function you will see the result of the HTR done by Transkribus.
If you are curious about Transkribus and want to start using it, you might have a look at, for example, these blog posts elsewhere: ‘Digitize a Collection of Letters using Transkribus and XSLT‘ at the blog How to of the Austrian Centre for Digital Humanities, ‘How to historical text recognition: A Transkribus Quickstart Guide‘ at LaTex Ninja’ing and the Digital Humanities, and Issue 13: OCR (July 2019) of Europeana Tech.