Abstract
We propose an approach to restore severely degraded document images using a probabilistic context model. Unlike traditional approaches that use previously learned prior models to restore an image, we learn the text model from the degraded document itself, making the approach independent of script, font, style, etc. We model the contextual relationships using a Markov random field (MRF). The ability to work with larger patch sizes allows us to handle severe degradations, including cuts, blobs, merges, and vandalized documents. Our approach can also integrate document restoration and super-resolution into a single framework, directly generating high-quality images from degraded documents. Experimental results show significant improvement in image quality on document images collected from various sources, including magazines and books, and comprehensively demonstrate the robustness and adaptability of the approach. The approach works well with document collections such as books, even under severe degradation, and is therefore ideally suited for repositories such as digital libraries.