Abstract
Annotated handwriting datasets are a prerequisite for tackling a variety of problems, such as building recognizers and developing writer identification algorithms. However, annotating large datasets is tedious and expensive, especially at the character or stroke level. In this paper, we propose a novel, automated method for annotation at the character level, given a parallel corpus of online handwritten data and the corresponding text. The method employs a model-based handwriting synthesis unit to map the two corpora to the same space; the annotation is propagated to the word level and then to the individual characters using elastic matching. The initial annotation results are used to improve the handwriting synthesis model for the user under consideration, which in turn refines the annotation. The method handles errors in the handwriting, such as spurious and missing strokes or characters. The output is stored in the UPXInkML format.
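To make the label-propagation idea concrete, the following is a minimal sketch of elastic matching, assuming dynamic time warping (DTW) over 2-D pen coordinates as the matching technique. The abstract does not specify the matching algorithm or the features used, so DTW, the Euclidean point distance, and the function names `dtw_align` and `propagate_labels` are illustrative assumptions, not the method of the paper. The sketch aligns a handwritten trace with a synthesized trace whose points carry character labels, then copies those labels across the alignment.

```python
import math


def dtw_align(real, synth):
    """Align two point sequences with DTW (assumed matching technique).

    Returns (total_cost, path), where path is a list of
    (real_index, synth_index) pairs in increasing order.
    """
    n, m = len(real), len(synth)
    inf = float("inf")
    # cost[i][j]: minimum cumulative distance aligning real[:i] with synth[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(real[i - 1], synth[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in the handwritten trace only
                cost[i][j - 1],      # advance in the synthesized trace only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


def propagate_labels(synth_labels, path, n_real):
    """Copy per-point character labels from synthesized to handwritten points."""
    labels = [None] * n_real
    for real_idx, synth_idx in path:
        # If several synthesized points map to one handwritten point,
        # the last one on the path wins (a simplification).
        labels[real_idx] = synth_labels[synth_idx]
    return labels
```

In this sketch every handwritten point receives the label of a synthesized point it is warped onto. A practical system would likely add richer features and explicit handling of spurious or missing strokes, which this example omits.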