Academia.edu

Document Image Analysis

1,141 papers
10,638 followers
About this topic
Document Image Analysis is a field of study focused on the extraction, interpretation, and processing of information from scanned or photographed documents. It encompasses techniques for text recognition, layout analysis, and feature extraction to facilitate the automated understanding and manipulation of document images.

Key research themes

1. How can recognition-free word spotting techniques improve document image indexing and retrieval?

This research theme focuses on recognition-free document image retrieval methods, particularly word spotting, which bypass traditional OCR limitations in indexing and searching digitized documents. These methods explore how image-level features and matching can be leveraged to retrieve words without explicit transcription, addressing challenges such as handwriting variability, degraded image quality, unknown fonts, and segmentation errors. Understanding and improving word spotting systems is critical for managing vast archives of historical and handwritten documents where OCR often underperforms.

Key finding: This survey synthesizes a decade of research on word spotting as an alternative to OCR for document image retrieval, highlighting the efficacy of recognition-free retrieval methods based on graphical similarity rather than...
Key finding: This paper reviews recognition-free retrieval techniques, noting that document image retrieval benefits from direct image feature representations rather than OCR reliance, which is prone to high computational cost and...
Key finding: This study introduces appearance-based texture features via saliency maps derived from human visual attention modeling to prioritize document foregrounds for retrieval. Using Gist descriptors on saliency-weighted images...
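
To illustrate what matching word images without transcription can look like, here is a minimal sketch. It is not the method of any paper listed above. It assumes binarized word images given as lists of rows (1 for ink, 0 for background), describes each image by a per-column ink-density profile, and ranks candidates by dynamic time warping (DTW) distance to the query. The function names column_profile, dtw_distance, and spot_word are hypothetical.

```python
from typing import List, Sequence


def column_profile(image: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of ink pixels (value 1) in each column of a binary word image."""
    height = len(image)
    width = len(image[0])
    return [sum(image[r][c] for r in range(height)) / height for c in range(width)]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping cost between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def spot_word(query, candidates) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query."""
    q = column_profile(query)
    scored = [(dtw_distance(q, column_profile(img)), idx)
              for idx, img in enumerate(candidates)]
    return [idx for _, idx in sorted(scored)]


if __name__ == "__main__":
    query = [[1, 0, 1], [1, 0, 1]]
    different = [[0, 1, 0], [0, 1, 0]]
    stretched = [[1, 1, 0, 1], [1, 1, 0, 1]]  # same pattern, wider first stroke
    print(spot_word(query, [different, stretched]))
```

Because DTW tolerates horizontal stretching, a wider rendering of the same stroke pattern can still align with the query at zero cost, which is one reason sequence alignment is a common choice for handwriting with variable letter widths.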

2. How can morphological and PDE-based image processing enhance document image segmentation and binarization?

This theme investigates advanced image processing techniques, especially morphological operations and partial differential equation (PDE)-based methods, for improving segmentation and binarization of document images. Binarization and segmentation are critical preprocessing steps for subsequent content extraction, notably under the degradations common in historical or handwritten documents: noise, illumination variation, stains, and bleed-through. This area explores combining shape and texture analysis, nonlinear diffusion, and variational methods to preserve edges and text integrity while removing noise, which in turn improves OCR and retrieval tasks.

Key finding: This paper proposes a morphological, multiresolution framework to extract shape and texture features for document segmentation, emphasizing computational efficiency gains via analysis at reduced resolutions. It introduces...
Key finding: The authors develop an adaptive thresholding algorithm tailored for complex document images exhibiting illumination variation, bleed-through, back-to-front interference, and shadows. The two-phase method uses edge detection...
Key finding: This work applies PDE-based nonlinear diffusion combined with active contours and the split-Bregman algorithm for simultaneous denoising, edge enhancement, and segmentation of document images corrupted by various noise types....
Key finding: This comprehensive review evaluates classical and recent binarization methods, particularly focusing on their suitability for degraded handwritten document images with issues like bleed-through, stains, and noise....
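
To show why locally adaptive thresholds help under uneven illumination, here is a minimal sketch of local-mean thresholding. It is a generic textbook technique, not the two-phase algorithm or the PDE-based method summarized above. It assumes a grayscale image given as a list of rows with values 0 to 255, where ink is darker than paper. The names integral_image and adaptive_binarize and the parameters radius and offset are hypothetical.

```python
from typing import List, Sequence


def integral_image(gray: Sequence[Sequence[int]]) -> List[List[int]]:
    """Summed-area table with an extra zero row and column at the top-left."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += gray[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def adaptive_binarize(gray, radius: int = 1, offset: float = 10) -> List[List[int]]:
    """Mark a pixel as ink (1) when it is darker than its local mean minus offset."""
    h, w = len(gray), len(gray[0])
    ii = integral_image(gray)
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        r0, r1 = max(0, r - radius), min(h, r + radius + 1)
        for c in range(w):
            c0, c1 = max(0, c - radius), min(w, c + radius + 1)
            # Window sum in O(1) from the summed-area table (window is clipped at borders).
            total = ii[r1][c1] - ii[r0][c1] - ii[r1][c0] + ii[r0][c0]
            mean = total / ((r1 - r0) * (c1 - c0))
            out[r][c] = 1 if gray[r][c] < mean - offset else 0
    return out


if __name__ == "__main__":
    # Paper brightness falls from 200 (left) to 100 (right); ink is 80 darker than paper.
    page = [
        [200, 180, 160, 140, 120, 100],
        [200, 100, 160, 140, 40, 100],
        [200, 180, 160, 140, 120, 100],
    ]
    for row in adaptive_binarize(page):
        print(row)
```

In this toy page the ink pixel on the bright side has the same gray value (100) as the paper on the dark side, so no single global threshold can separate them, while the local comparison recovers both strokes.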

3. How can multimodal and deep learning approaches improve metadata extraction and script classification in document images, especially for complex and historical scripts?

This theme investigates the integration of computer vision and natural language processing modalities combined with deep learning, including contrastive self-supervised frameworks, to extract metadata and classify scripts in complex documents (e.g., scientific PDFs, ancient Chinese manuscripts). The focus is on overcoming challenges posed by diverse layouts, complex scripts, degraded manuscripts, and limited annotated data by utilizing multimodal data representations and domain-specific augmentations. These approaches advance automated understanding of documents beyond conventional OCR and heuristic methods, enabling scalable digital humanities and document management.

Key finding: This paper proposes a multimodal neural network combining a BiLSTM model processing textual content with a convolutional vision model processing the PDF document as an RGB image. Late fusion of these two sub-models allows the...
Key finding: This comprehensive review outlines recent advances in enhancing and analyzing ancient Chinese documents, including OCR for archaic scripts, image restoration for faded and damaged texts, layout detection for non-standard...
Key finding: The paper demonstrates that combining Local Binary Pattern (LBP) images with texture descriptors GLCM and HOG into hybrid feature sets (LBGLCM and LBHOG) significantly improves word-level script identification accuracy in...
Key finding: This work develops a unified CNN-based model for script identification from bi-script, tri-script, and multi-script camera-captured Indian document images. Evaluated on datasets comprising nine regional scripts plus Hindi and...
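
For readers unfamiliar with the Local Binary Pattern (LBP) images mentioned above, here is a minimal sketch of the basic 3x3 LBP operator and its histogram. It does not reproduce the LBGLCM or LBHOG feature sets from the paper. It assumes a grayscale image given as a list of rows; the neighbor ordering (clockwise from the top-left) and the names lbp_image and lbp_histogram are assumptions made for illustration.

```python
from typing import List, Sequence

# Neighbor offsets (row, column), clockwise starting at the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(gray: Sequence[Sequence[int]]) -> List[List[int]]:
    """Basic 8-neighbor LBP code for each interior pixel (border pixels are skipped)."""
    h, w = len(gray), len(gray[0])
    codes = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            center = gray[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(_OFFSETS):
                # A neighbor at least as bright as the center sets its bit.
                if gray[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(codes: Sequence[Sequence[int]]) -> List[int]:
    """256-bin histogram of LBP codes, usable as a simple texture descriptor."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist


if __name__ == "__main__":
    patch = [
        [9, 9, 9],
        [1, 5, 1],
        [1, 1, 1],
    ]
    print(lbp_image(patch))
```

The resulting code image can be passed to other descriptors, or its histogram can be used directly as a feature vector for a script classifier.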

All papers in Document Image Analysis

This research proposes a methodology based on a neural network (NN) and fuzzy logic control (FLC), designed with two stages of execution. Stage 1 uses the NN approach; it takes its input from the scanned image, the input...
The carbonized Herculaneum scrolls represent a unique challenge for text recovery due to their fragile state and the visual similarity between ink and papyrus substrate. This study presents an iterative, human-in-the-loop approach for ink...
Given the existence of digital scanners, printers and fax machines, documents can undergo a history of sequential reproductions. One of the most important determiners of the quality of the resulting image is the set of underlying...
A class of shift-variant reduction operations is introduced that is useful for performing efficient and controllable shape and texture transformations between resolution levels. In their most general form, the operations proceed in three...
A particularly effective method for analyzing document images, which consist of large numbers of binary pixels, is to generate reduced images whose pixels represent enhancements of textural densities in the full-resolution image. These...
A vital part of the publication process of ancient cuneiform tablets is creating hand-copies, which are 2D line art representations of the 3D cuneiform clay tablets, created manually by scholars. This research provides an innovative...
In today’s digital era, attention to camera-based text processing has increased manyfold. This has led to the development of multiple text processing methods. Most of the procedures follow a scene text detection approach and further...
In this paper, we propose a new approach based on sparse coding for single textual image Super-Resolution (SR). The proposed approach is able to build more representative dictionaries learned from a large training...
Demographic handwriting-based classification problems, such as gender and handedness categorizations, present interesting applications in disciplines like Forensic Biometrics. This work describes an experimental study on the suitability...
Managing periodicals, edited by Géraldine Barron, BAO #17 (2009); Fostering student success, edited by Carine El Bekri-Dinoird, BAO #18 (2009); Implementing a classification scheme, edited by...
Tuberculosis is the most dangerous and rapidly spreading disease in the world. In investigating suspected tuberculosis (TB), chest radiography is the only key diagnostic technique based on medical imaging. So, computer-aided...
In this paper we present a survey of the literature on Arabic writer identification schemes and the up-to-date techniques employed in identification. The paper begins with an overview of the various writer identification schemes in Arabic and...
Arad Ostracon 16 is part of the Elyashiv Archive, dated to ca. 600 b.c. It was published as bearing an inscription on the recto only. New multispectral images of the ostracon have enabled us to reveal a hitherto invisible inscription on...
This article discusses the quality assessment of binary images. The customary ground-truth-based methodology used in the literature is shown to be problematic due to its subjective nature. Several previously suggested alternatives are...
In this paper we present a hybrid approach to segment and classify contents of document images. A document image is segmented into three types of regions: Graphics, Text and Space. The image of a document is subdivided into blocks and for...
We describe a simple convolutional network for blind unmixing of transient absorption microscopy data along with a model ensembling strategy. Our network is based on an autoencoder previously developed for blind unmixing of hyperspectral...
In this paper, we present a method for discriminating handwritten and printed text in document images based on shape features. The separation of handwritten and printed text in document images is essential to optimize the OCR accuracy...
Document identification is used to extract information from a digital document such as the Al-Quran, articles, agreements, and so on. With the increasing number of digital documents on the internet, it is important to identify whether a document is genuine or...
In an automatic document conversion system, which builds digital documents from scanned articles, there is a need to perform various adjustments before the scanned image is fed to the layout analysis system. This is because the...
Religious minorities, particularly Coptic Christians, represent a vital component of Egypt’s social and historical fabric. Copts are the indigenous inhabitants of Egypt, tracing their origins to pre-Arab conquest periods. They maintain...
We present a new approach for recognition of complex graphic symbols in technical documents. Graphic symbol recognition is a well-known challenge in the field of document image analysis and is at the heart of most graphic recognition systems....
Handwriting recognition and analysis has been an active area of research in the last two decades. Handwriting analysis is being studied in various fields of science, such as graphology, neurology, psychology, and computer science....
Artificial intelligence creates entirely new possibilities for generating images. For the first time, all the image-making techniques of art history are brought together in a single tool, usable by any computer owner without special...
By proper exploitation of the structural characteristics existing in a compressed document, it is possible to speed up certain image processing operations. Alternatively, one can derive a compression scheme which would lend itself to an...
An algorithm that performs connected components detection in the course of decoding ITU-T (formerly CCITT) facsimile Group 3/4, i.e., MH/MR/MMR compressed images, is presented. New definitions of mode color and a new transition...
Incunabula are the texts printed mainly during the second half of the 15th century; they are a key cultural element in a revolutionary period of the history and evolution of the book and of printing. In these books, the identification of their...
Printed documents continue to be the most commonly used media for information transfer in official contexts. However, such documents may be subject to illegitimate modification or malicious use. Therefore, agencies must be able to...
The rapid growth of digital data has led to the widespread creation and storage of digital images containing text. The extraction and use of textual information might be advantageous for various domains. Text detection in natural...