Document Image Analysis

1,141 papers

10,638 followers

About this topic

Document Image Analysis is a field of study focused on the extraction, interpretation, and processing of information from scanned or photographed documents. It encompasses techniques for text recognition, layout analysis, and feature extraction to facilitate the automated understanding and manipulation of document images.

Key research themes

1. How can recognition-free word spotting techniques improve document image indexing and retrieval?

This research theme focuses on recognition-free document image retrieval methods, particularly word spotting, which bypass traditional OCR limitations in indexing and searching digitized documents. These methods explore how image-level features and matching can be leveraged to retrieve words without explicit transcription, addressing challenges such as handwriting variability, degraded image quality, unknown fonts, and segmentation errors. Understanding and improving word spotting systems is critical for managing vast archives of historical and handwritten documents where OCR often underperforms.
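As a rough illustration of recognition-free matching, the sketch below ranks candidate word images against a query image by comparing column ink profiles with dynamic time warping (DTW). This is one classic family of word spotting features, not the method of any specific paper listed here; the function names and the binary list-of-lists image format are assumptions made for the example.

```python
def column_profile(word_image):
    """Count ink pixels (1s) in each column of a binary word image."""
    height = len(word_image)
    width = len(word_image[0])
    return [sum(word_image[r][c] for r in range(height)) for c in range(width)]


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def spot_word(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    q = column_profile(query)
    scored = [(dtw_distance(q, column_profile(c)), idx)
              for idx, c in enumerate(candidates)]
    return [idx for _, idx in sorted(scored)]
```

No transcription is involved: a horizontally stretched copy of the query word still scores a distance of zero, because DTW lets one query column align with several candidate columns.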

A Survey of Document Image Word Spotting Techniques

by Angelos P Giotis and

2017

Key finding: This survey synthesizes a decade of research on word spotting as an alternative to OCR for document image retrieval, highlighting the efficacy of recognition-free retrieval methods based on graphical similarity rather than...

A Brief Review of Document Image Retrieval Methods: Recent Advances

by Fahimeh Alaei

2016

Key finding: This paper reviews recognition-free retrieval techniques, noting that document image retrieval benefits from direct image feature representations rather than OCR reliance, which is prone to high computational cost and...

Document Image Retrieval Based on Visual Saliency Maps

by Alireza Alaei

2023, 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW)

Key finding: This study introduces appearance-based texture features via saliency maps derived from human visual attention modeling to prioritize document foregrounds for retrieval. Using Gist descriptors on saliency-weighted images...

2. How can morphological and PDE-based image processing enhance document image segmentation and binarization?

This theme investigates advanced image processing techniques, especially morphological operations and partial differential equation (PDE)-based methods, for improving segmentation and binarization of document images. Binarization and segmentation are critical preprocessing steps for subsequent content extraction, notably under degradations such as noise, illumination variation, stains, and bleed-through that are common in historical or handwritten documents. This area explores combining shape and texture analysis, nonlinear diffusion, and variational methods to preserve edges and text integrity while removing noise, which in turn improves downstream OCR and retrieval.
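To make the morphological side of this theme concrete, here is a minimal sketch of a binary opening (erosion followed by dilation) with a 3x3 structuring element, which removes isolated specks while restoring the shape of larger components. It is a generic textbook operation, not the method of any paper listed here; the function names and the list-of-lists image format (1 = ink) are assumptions made for the example.

```python
def _neighborhood(img, r, c):
    """Yield the in-bounds pixel values in the 3x3 window centered at (r, c)."""
    h, w = len(img), len(img[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                yield img[rr][cc]


def erode(img):
    """Keep a pixel ON only if all 9 window pixels are in bounds and ON."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            values = list(_neighborhood(img, r, c))
            # Pixels outside the image count as OFF, so border pixels erode away.
            out[r][c] = int(len(values) == 9 and all(values))
    return out


def dilate(img):
    """Turn a pixel ON if any in-bounds pixel of its 3x3 window is ON."""
    h, w = len(img), len(img[0])
    return [[int(any(_neighborhood(img, r, c))) for c in range(w)]
            for r in range(h)]


def opening(img):
    """Erosion followed by dilation: drops components smaller than 3x3."""
    return dilate(erode(img))
```

On a clean scan the opening leaves solid strokes largely intact; on a noisy one it trades the loss of very thin strokes for the removal of speckle, which is why the structuring element size has to be matched to the stroke width.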

Multiresolution Morphological Approach to Document Image Analysis

by Dan Bloomberg

2025

Key finding: This paper proposes a morphological, multiresolution framework to extract shape and texture features for document segmentation, emphasizing computational efficiency gains via analysis at reduced resolutions. It introduces...

An adaptive thresholding algorithm based on edge detection and morphological operations for document images

by Cleber Zanchettin

2016

Key finding: The authors develop an adaptive thresholding algorithm tailored for complex document images exhibiting illumination variation, bleed-through, back-to-front interference, and shadows. The two-phase method uses edge detection...

Active Contour based Document Image Segmentation and Restoration using Split-Bregman and Edge Enhancement Diffusion

by Poornima Rajan

2023

Key finding: This work applies PDE-based nonlinear diffusion combined with active contours and the split-Bregman algorithm for simultaneous denoising, edge enhancement, and segmentation of document images corrupted by various noise types....

Binarization of Document Images: A Comprehensive Review

by MOHAMED ABDUL KADER

2023, Journal of Physics: Conference Series

Key finding: This comprehensive review evaluates classical and recent binarization methods, particularly focusing on their suitability for degraded handwritten document images with issues like bleed-through, stains, and noise....

3. How can multimodal and deep learning approaches improve metadata extraction and script classification in document images, especially for complex and historical scripts?

This theme investigates the integration of computer vision and natural language processing modalities combined with deep learning, including contrastive self-supervised frameworks, to extract metadata and classify scripts in complex documents (e.g., scientific PDFs, ancient Chinese manuscripts). The focus is on overcoming challenges posed by diverse layouts, complex scripts, degraded manuscripts, and limited annotated data by utilizing multimodal data representations and domain-specific augmentations. These approaches advance automated understanding of documents beyond conventional OCR and heuristic methods, enabling scalable digital humanities and document management.

Vision and natural language for metadata extraction from scientific PDF documents

by Azeddine Bouabdallah

2025, Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries

Key finding: This paper proposes a multimodal neural network combining a BiLSTM model processing textual content with a convolutional vision model processing the PDF document as an RGB image. Late fusion of these two sub-models allows the...

Ancient Chinese Document Enhancements Analysis: Techniques, Challenges, and Future Directions

by HOU HUIZE

2025, International Journal Document Analysis and Recognitions

Key finding: This comprehensive review outlines recent advances in enhancing and analyzing ancient Chinese documents, including OCR for archaic scripts, image restoration for faded and damaged texts, layout detection for non-standard...

Hybridization of Texture Features for Identification of Bi-Lingual Scripts from Camera Images at Wordlevel

by Satish Kumar

2024, Computer Vision and Machine Intelligence Paradigms for SDGs, Lecture Notes in Electrical Engineering

Key finding: The paper demonstrates that combining Local Binary Pattern (LBP) images with texture descriptors GLCM and HOG into hybrid feature sets (LBGLCM and LBHOG) significantly improves word-level script identification accuracy in...

SCRIPT IDENTIFICATION FROM CAMERA CAPTURED INDIAN DOCUMENT IMAGES WITH CNN MODEL

by Satish Kumar

2024, ICTACT JOURNAL ON SOFT COMPUTING

Key finding: This work develops a unified CNN-based model for script identification from bi-script, tri-script, and multi-script camera-captured Indian document images. Evaluated on datasets comprising nine regional scripts plus Hindi and...

All papers in Document Image Analysis

Decision Making and Recognition of String Characters using NN-Fuzzy Implication System

by Dr. Sanjeev Kumar Mandal

2026, Elsevier B.V.

This research proposes a neural network (NN) and fuzzy logic control (FLC)-based methodology designed with two stages of execution. Stage 1 uses the NN approach; it takes its input from the scanned image, the input...

Label the Invisible: AI-Aided Label Enhancement and Ink Residue Exposure

by Hannes K.

2026

The carbonized Herculaneum scrolls represent a unique challenge for text recovery due to their fragile state and the visual similarity between ink and papyrus substrate. This study presents an iterative, human-in-the-loop approach for ink...

Determining the resolution of scanned document images

by Dan Bloomberg

2026, Proceedings of SPIE

Given the existence of digital scanners, printers and fax machines, documents can undergo a history of sequential reproductions. One of the most important determiners of the quality of the resulting image is the set of underlying...

Image analysis using threshold reduction

by Dan Bloomberg

2026, Image Algebra and Morphological Image Processing II

A class of shift-variant reduction operations is introduced that is useful for performing efficient and controllable shape and texture transformations between resolution levels. In their most general form, the operations proceed in three...
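The abstract above is truncated, so the paper's general three-part form is not reproduced here. As a hedged illustration only, the sketch below shows one simple form of 2x threshold reduction on a binary image: each 2x2 block of source pixels becomes one output pixel, which is ON when at least `threshold` of the four source pixels are ON. The function name, the `threshold` parameter, and the list-of-lists image format are assumptions made for the example.

```python
def threshold_reduce_2x(img, threshold):
    """Reduce a binary image by 2x in each dimension.

    Each output pixel is ON when at least `threshold` (1 to 4) of the four
    pixels in the corresponding 2x2 source block are ON. Any odd trailing
    row or column of the source is ignored.
    """
    if not 1 <= threshold <= 4:
        raise ValueError("threshold must be between 1 and 4")
    out_h, out_w = len(img) // 2, len(img[0]) // 2
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            block_sum = (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
                         + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1])
            row.append(int(block_sum >= threshold))
        out.append(row)
    return out
```

With `threshold=1` the reduction behaves like a dilation followed by subsampling (any ink survives), while `threshold=4` behaves like an erosion followed by subsampling (only solid blocks survive), which is what makes the choice of threshold a way to control texture at the lower resolution.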

Textured reductions for document image analysis

by Dan Bloomberg

2026, SPIE Proceedings

A particularly effective method for analyzing document images, which consist of large numbers of binary pixels, is to generate reduced images whose pixels represent enhancements of textural densities in the full-resolution image. These...

Cuneiform Stroke Recognition and Vectorization in 2D Images

by Avital Romach

2026, DHQ: Digital Humanities Quarterly

A vital part of the publication process of ancient cuneiform tablets is creating hand-copies, which are 2D line art representations of the 3D cuneiform clay tablets, created manually by scholars. This research provides an innovative...

MERCHANTS AS RULERS OF TOWNS IN ANCIENT AND MEDIEVAL KARNATAKA

by Sr J is

2026, AMITESH PUBLISHER & COMPANY

Text Line Detection Using Connected Components

by Ojus Thomas Lee

2026

In today’s digital era, attention to camera-based text processing has increased manyfold. This has led to the development of multiple text processing methods. Most of these procedures follow a scene text detection approach and further...

Single Textual Image Super-Resolution Using Multiple Learned Dictionaries Based Sparse Coding

by Rim Walha

2026

In this paper, we propose a new approach based on sparse coding for single textual image Super-Resolution (SR). The proposed approach is able to build more representative dictionaries learned from a large training...

Gender and Handedness Prediction from Offline Handwriting Using Convolutional Neural Networks

by José Vélez

2026, Complexity

Demographic handwriting-based classification problems, such as gender and handedness categorizations, present interesting applications in disciplines like Forensic Biometrics. This work describes an experimental study on the suitability...

2. Les relations internationales dans les universités

by Jeanjacques Wunenburger

2026, Mener un projet international

Gérer les périodiques, edited by Géraldine Barron, BAO #17 (2009); Favoriser la réussite des étudiants, edited by Carine El Bekri-Dinoird, BAO #18 (2009); Mettre en oeuvre un plan de classement, edited by...

A Comparison of Image Segmentation Techniques, Otsu and Watershed for X-Ray Images

by VIJAY SAI

2026, International Journal of Research in Engineering and Technology

The most dangerous and rapidly spreading disease in the world is tuberculosis. In the investigation of suspected tuberculosis (TB), chest radiography is the only key diagnostic technique based on medical imaging. So, computer-aided...

Arabic Writer Identification: A Review of Literature

by Abdullah Ahmed

2026, Journal of theoretical and applied information technology

In this paper we present a survey of the literature on Arabic writer identification schemes and the up-to-date techniques employed in identification. The paper begins with an overview of the various writer identification schemes in Arabic and...

A Brand New Old Inscription: Arad Ostracon 16 Rediscovered via Multispectral Imaging

by Barak Sober

2026, Bulletin of the American Schools of Oriental Research

Arad Ostracon 16 is part of the Elyashiv Archive, dated to ca. 600 b.c. It was published as bearing an inscription on the recto only. New multispectral images of the ostracon have enabled us to reveal a hitherto invisible inscription on...

Beyond the Ground Truth: Alternative Quality Measures of Document Binarizations

by Barak Sober

2026, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)

This article discusses the quality assessment of binary images. The customary ground-truth-based methodology used in the literature is shown to be problematic due to its subjective nature. Several previously suggested alternatives are...

A Texture-based Method for Document Segmentation and Classification

by Jules-Raymond Tapamo

2026, Revue Africaine de Recherche en Informatique et Mathématiques Appliquées

In this paper we present a hybrid approach to segment and classify contents of document images. A Document Image is segmented into three types of regions: Graphics, Text and Space. The image of a document is subdivided into blocks and for...

Blind hyperspectral unmixing of pump-probe images with maximum likelihood selection of a convolutional network from an ensemble of models

by Arya Chowdhury Mugdha

2026

We describe a simple convolutional network for blind unmixing of transient absorption microscopy data along with a model ensembling strategy. Our network is based on an autoencoder previously developed for blind unmixing of hyperspectral...

Word Level Handwritten and Printed Text Separation Based on Shape Features

by Upasana Patil

2025, ijetae.com

In this paper, we present a method for discriminating handwritten and printed text in document images based on shape features. The separation of handwritten and printed text in a document image is essential to optimize the OCR accuracy...

Document Feature Extraction Based on Unoccupied Space Using Triangle Model: A Preliminary Work

by Norashikin Ahmad

2025, Journal of Telecommunication, Electronic and Computer Engineering

Document identification is used to extract information from a digital document such as the Al-Quran, articles, agreements, and so on. With the increasing number of digital documents on the internet, it is important to identify whether a document is genuine or...

Local Thresholding Algorithm Based on Variable Window Size Statistics

by Alexandru Stefanescu

2025, mail.cs.pub.ro

In an automatic document conversion system, which builds digital documents from scanned articles, there is a need to perform various adjustments before the scanned image is fed to the layout analysis system. This is because the...

Religious Minorities in Egypt: Historical Roots, Citizenship, and Contemporary Challenges

by Mariam Alsayegh

2025, Mariam Alsayegh

Religious minorities, particularly Coptic Christians, represent a vital component of Egypt’s social and historical fabric. Copts are the indigenous inhabitants of Egypt, tracing their origins to pre-Arab conquest periods. They maintain...

Graphic Symbol Recognition Using Graph Based Signature and Bayesian Network Classifier

by Thierry BROUARD

2025

We present a new approach for recognition of complex graphic symbols in technical documents. Graphic symbol recognition is a well-known challenge in the field of document image analysis and is at the heart of most graphic recognition systems....

Review of age and gender detection methods based on handwriting analysis

by Fahimeh Alaei

2025, Neural Computing and Applications

Handwriting recognition and analysis has been an active area of research in the last two decades. Handwriting analysis is being studied in various fields of science, such as graphology, neurology, psychology, and computer science....

Die Welt der Neuen Bilder – Dokumentarische Fotografie und KI

by Bernd Arnold

2025, Die Welt der Neuen Bilder

The World of New Images: Documentary Photography and AI. Artificial intelligence creates completely new possibilities for generating images. For the first time, all the imaging techniques of art history have been brought together in a single tool that any computer owner can use without special expertise. It can be used to create images that look like photographs but are not. What does this mean for photography in general and documentary photography in particular? A discussion of these questions is overdue, considering that trust in the authenticity of photographs has helped shape human history for almost 200 years.

Bernd Arnold, author and photographer of “Das Kölner Heil” and “Wahl Kampf Ritual”, is convinced that we will enter a New World when authentic photos and AI-generated photographic imitations compete with each other in forming our view of the world and history. In three essays, he analyzes the technological and perceptual history of photography, explains what is new and innovative about AI, and looks at the future role of documentary photography and its producers. For the phenomenon of photographic imitations, he introduces the term “dichography”, which describes a parallel universe of fictional reality that will soon be indistinguishable from the photographic light traces of past reality.

“Trust in the authenticity of a documentary photograph is fundamental to a democracy.”

“It is not the sensational images that undermine trust, but the images that appear to be ordinary, familiar photographs yet imitate photography, modulate reality, and are increasingly disseminated through the usual channels over the years.”

Document Image Analysis Using a New Compression Algorithm

by S. Latifi

2025, Document Analysis Systems: Theory and Practice

By proper exploitation of the structural characteristics existing in a compressed document, it is possible to speed up certain image processing operations. Alternatively, one can derive a compression scheme which would lend itself to an...

An algorithm with reduced operations for connected components detection in ITU-T group 3/4 coded images

by S. Latifi

2025, IEEE Transactions on Pattern Analysis and Machine Intelligence

An algorithm that performs connected components detection in the course of decoding ITU-T (formerly CCITT) facsimile Group 3/4 (i.e., MH/MR/MMR) compressed images is presented. New definitions of mode color and a new transition...

Tracing the origins of incunabula through the automatic identification of fonts in digitised documents

by Manuel-José Pedraza-Gracia

2025, Multimedia Tools and Applications

Incunabula are texts printed mainly during the second half of the 15th century; they are a key cultural element in a revolutionary period in the history and evolution of the book and of printing. In these books, the identification of their...

Computational Methods for Forgery Detection in Printed Official Documents

by Mohammed Alameri

2025

Printed documents continue to be the most commonly used media for information transfer in official contexts. However, such documents may be subject to illegitimate modification or used for malicious purposes. Therefore, agencies must be able to...

Enhancing Machine Learning Methods for Robust Real-Time Text Classification of Bilingual Documents

by santhosh SG

2025, INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN TECHNOLOGY

The rapid growth of digital data has led to the widespread creation and storage of digital images containing text. Extracting and using this textual information may benefit a variety of domains. Text detection in natural...
