Abstract
Ancient paper documents and palm-leaf manuscripts from the Indian subcontinent have made a
significant contribution to world literature and culture. These documents often have complex, uneven,
and irregular layouts. Digitizing these documents and deciphering their content without human
intervention poses difficulties across a broad range of dimensions, including language, script, layout,
elements, position, and the number of manuscripts per image.
Large-scale annotated Indic manuscript image datasets are needed for this kind of research. To meet
this objective, we present Indiscapes, the first dataset containing multi-regional layout annotations for ancient Indian manuscripts. We also adapt a fully convolutional deep neural network architecture for fully automatic, instance-level spatial layout parsing of manuscript images, in order to deal
with challenges such as the presence of dense, irregular layout elements, pictures, multiple documents
per image, and a wide variety of scripts. Finally, we demonstrate the effectiveness of the proposed
architecture on images from the Indiscapes dataset.
Despite these advancements, semantic layout segmentation using typical deep network methods is
not robust to the complex deformations observed across semantic regions. This problem is
particularly evident in the low-resource domain of Indian palm-leaf manuscripts. Therefore, we present Indiscapes2, a new, expansive dataset of diverse Indic manuscripts with semantic layout
annotations, to help address this issue. Indiscapes2 is 150% larger than Indiscapes and contains material
from four different historical collections. In addition, we propose a novel deep network called Palmira
for reliable, deformation-aware region segmentation in handwritten manuscripts. As a performance
metric, we additionally report the boundary-centric Hausdorff distance and its variations.
Our experiments show that Palmira produces reliable layouts and outperforms both strong baseline
methods and ablative variants. We also highlight results on Arabic, South-East Asian, and Hebrew
historical manuscripts to showcase the generalization capability of Palmira.
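For reference, a standard symmetric form of the Hausdorff distance mentioned above, between a predicted region boundary A and its ground-truth counterpart B (both treated as point sets, with d a point-to-point distance such as the Euclidean distance), can be written as:
\[
\mathrm{HD}(A, B) \;=\; \max\Bigl\{\, \sup_{a \in A} \inf_{b \in B} d(a, b),\; \sup_{b \in B} \inf_{a \in A} d(a, b) \Bigr\}
\]
Lower values indicate tighter agreement between predicted and ground-truth boundaries; the specific variations reported are not detailed here.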
Although we now have reliable deep-network based approaches for understanding manuscript layout,
these models implicitly assume one or two manuscripts per image, whereas in real-world scenarios
multiple manuscripts are often scanned together into a single image to maximise scanner surface
area and reduce manual labour. Ensuring that each individual manuscript within a scanned image
can be isolated (segmented) on a per-instance basis therefore becomes the first essential step towards
understanding the content of a manuscript. Hence, there is a need for a precursor system which
extracts individual manuscripts before downstream processing. The highly curved and deformed
boundaries of manuscripts, which frequently cause them to overlap with each other, introduce further
complexity when addressing this issue. We introduce a new document image dataset named IMMI
(Indic Multi Manuscript Images) to address these challenges. We also present a method that generates
synthetic images to augment the sourced non-synthetic images, thereby enlarging the dataset and
facilitating deep network training. Our experiments use adapted versions of existing document
instance segmentation frameworks, and the results demonstrate their efficacy for the task. Overall,
our contributions enable robust extraction of individual historical manuscript pages. This, in turn,
could potentially enable better performance on downstream tasks such as region-level instance
segmentation, optical character recognition, and word-spotting in historical Indic manuscripts at scale.