Automatic Single Page-based Algorithms for Medieval Manuscript Analysis

Ying Yang, Ruggero Pintus, Enrico Gobbetti, Holly Rushmeier
ACM Journal on Computing and Cultural Heritage, Volume 10, Number 2, pages 9:1-9:22, 2017
We propose three automatic algorithms for analyzing digitized medieval manuscripts: text block computation, text line segmentation, and special component extraction. They build on existing clustering algorithms and a template matching technique. All three methods are fully automatic and require no user intervention or input. Moreover, they all operate on a per-page basis. Unlike some prior methods, which need a set of pages from the same manuscript for training, they can analyze a single page without requiring any additional pages with a similar layout. We extensively evaluated the algorithms on 1771 page images from 6 publicly available historical manuscripts that differ significantly in layout structure, acquisition resolution, and writing style. The experimental results show that the methods perform very well; for example, the text block computation method achieves average precision and recall values of 98% and 99%, respectively.
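The abstract reports average precision and recall but does not say how they are computed. The sketch below is one common way to compute them, not necessarily the paper's evaluation protocol. It assumes pixel-level scoring of a predicted binary text-block mask against a ground-truth mask on each page, followed by a plain mean over pages. The function names `precision_recall` and `average_scores` are hypothetical.

```python
def precision_recall(predicted, ground_truth):
    """Pixel-level precision and recall for one page (hypothetical helper).

    Assumption: both inputs are equal-sized 2D lists of 0/1 values,
    where 1 marks a pixel labeled as belonging to a text block.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1          # correctly detected text-block pixel
            elif p:
                fp += 1          # predicted text block, but background in ground truth
            elif g:
                fn += 1          # missed text-block pixel
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_scores(pages):
    """Mean precision and recall over (predicted, ground_truth) page pairs.

    Assumption: each page is weighted equally in the average.
    """
    scores = [precision_recall(pred, gt) for pred, gt in pages]
    n = len(scores)
    return sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    gt = [[1, 1, 0],
          [0, 1, 1]]
    print(precision_recall(pred, gt))               # (1.0, 0.75)
    print(average_scores([(pred, gt), (gt, gt)]))   # (1.0, 0.875)
```

Averaging per page, rather than pooling pixel counts over the whole collection, keeps a few high-resolution pages from dominating the score. The paper may instead use region-level matching or pooled counts.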

