VUTT (Early Modern Text Workspace) is a web application for viewing and editing transcriptions of historical documents. The application displays scanned document images and LLM-recognized text side-by-side, allowing for text correction and annotation.
The current corpus includes publications of the Academy of Tartu (Academia Gustaviana / Academia Gustavo-Carolina) from 1632–1710. The texts come from the digitized collections of the UT Library and are largely based on copies of publications gathered for the bibliography compiled by Ene-Lille Jaanson, Druckerei der Universität Dorpat 1632-1710: Geschichte und Bibliographie der Druckschriften (Tartu University Library, 2000).
The recognition model is page-based. It uses the Qwen3-VL-8B model, fine-tuned on 1500 pages of transcribed text drawn from various databases (and partly inspired by the data at zenodo.org/records/15764161):
The texts have been transcribed both manually and using the Transkribus and eScriptorium environments. Transcription authors and reviewers (in alphabetical order):
VUTT is developed at the UT Library. The logo was suggested by Rahel Toomik and comes from the decorative frame of the title page of the sermon Een kort och enfaldigh Lijkpredikan ... Dorpt: J. Vogel, 1642.
For questions and suggestions, please contact: Meelis Friedenthal