Project Description

VUTT (Early Modern Text Workspace) is a web application for viewing and editing transcriptions of historical documents. The application displays scanned document images and LLM-recognized text side-by-side, allowing for text correction and annotation.

The project's goal is to make texts from the Early Modern University of Tartu printing house (disputations, orations, etc.) available and searchable.

Corpus

The current corpus includes publications from the Academy of Tartu (Academia Gustaviana / Academia Gustavo-Carolina) from the years 1632–1710. The texts come from the digitized collections of the UT Library and are largely based on copies of publications gathered during the compilation of the bibliography Ene-Lille Jaanson, comp., Druckerei der Universität Dorpat 1632-1710: Geschichte und Bibliographie der Druckschriften (Tartu University Library, 2000).

Recognition Model

The recognition model is page-based. It uses the Qwen3-VL-8B model, fine-tuned on 1500 pages of transcribed text from various databases (inspired in part by the data at zenodo.org/records/15764161).

The texts have been transcribed both manually and using the Transkribus and eScriptorium environments. Transcription authors and reviewers (in alphabetical order):

Numbers

Technology

VUTT is developed at the UT Library. The logo was suggested by Rahel Toomik and comes from the decorative frame of the title page of the sermon Een kort och enfaldigh Lijkpredikan ... Dorpt: J. Vogel, 1642.

Contact

For questions and suggestions, please contact: Meelis Friedenthal

License

Software: GitHub: meelisf/VUTT (MIT License)
Texts and images: according to the UT Library terms of use