Key research themes
1. How can recognition-free word spotting techniques improve document image indexing and retrieval?
This research theme focuses on recognition-free document image retrieval, particularly word spotting, which sidesteps the limitations of OCR when indexing and searching digitized documents. Instead of transcribing text, these methods match image-level features directly to retrieve occurrences of a query word. They must cope with handwriting variability, degraded image quality, unknown fonts, and segmentation errors. Understanding and improving word spotting systems is critical for managing large archives of historical and handwritten documents, where OCR often underperforms.
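As a rough illustration of recognition-free matching, the sketch below implements query-by-example word spotting with dynamic time warping (DTW) over per-column profile features, a classic baseline in the word spotting literature. It assumes words have already been segmented and binarized. The feature set and all function names (`column_features`, `dtw_distance`, `spot`) are illustrative choices, not the method of any particular system discussed here.

```python
from typing import List, Sequence, Tuple

# Illustrative sketch with hypothetical helper names; not a specific published system.
Image = List[List[int]]  # binarized word image: 1 = ink, 0 = background
Feature = Tuple[float, float, float]


def column_features(img: Image) -> List[Feature]:
    """Per-column features: ink density, upper profile, lower profile (all in [0, 1])."""
    h = len(img)
    feats = []
    for x in range(len(img[0])):
        ink = [y for y in range(h) if img[y][x]]
        density = len(ink) / h
        upper = ink[0] / h if ink else 1.0            # gap from top edge to first ink pixel
        lower = (h - 1 - ink[-1]) / h if ink else 1.0  # gap from last ink pixel to bottom edge
        feats.append((density, upper, lower))
    return feats


def dtw_distance(a: Sequence[Feature], b: Sequence[Feature]) -> float:
    """DTW with squared Euclidean local cost, normalized by len(a) + len(b)."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1]))
            # Allowed steps: stretch the query, stretch the candidate, or advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)


def spot(query: Image, candidates: Sequence[Image]) -> List[int]:
    """Return candidate indices ranked from most to least similar to the query."""
    q = column_features(query)
    scores = [dtw_distance(q, column_features(c)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda k: scores[k])


if __name__ == "__main__":
    query = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]  # a small "o"
    stretched = [[v for v in row for _ in range(2)] for row in query]  # same shape, twice as wide
    bar = [[0, 0, 1, 0, 0, 0] for _ in range(4)]                       # a vertical stroke
    print(spot(query, [bar, stretched]))
```

DTW tolerates the uneven horizontal stretching typical of handwriting, which is why the widened copy of the query matches with zero cost. Practical systems usually add richer features and prune candidates before the quadratic DTW step.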
2. How can morphological and PDE-based image processing enhance document image segmentation and binarization?
This theme investigates advanced image processing techniques, especially morphological operations and partial differential equation (PDE)-based methods, for improving the segmentation and binarization of document images. Binarization and segmentation are critical preprocessing steps for subsequent content extraction. They are particularly difficult under the degradations common in historical or handwritten documents, such as noise, uneven illumination, stains, and bleed-through. This area explores combining shape and texture analysis, nonlinear diffusion, and variational methods to preserve edges and text integrity while removing noise, which in turn improves OCR and retrieval.
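To make the nonlinear diffusion idea concrete, here is a minimal sketch that smooths a grayscale page with explicit Perona-Malik anisotropic diffusion and then applies a global Otsu threshold. Perona-Malik and Otsu are standard techniques used here for illustration. The parameter values (`iterations`, `kappa`, `lam`) are assumptions, and the surveyed methods may use different PDEs, variational formulations, or local thresholds.

```python
import math
from typing import List

# Illustrative pairing of two standard techniques; parameters are assumed, not tuned.
Gray = List[List[float]]  # grayscale image: 0.0 = black ink, 1.0 = white paper


def perona_malik(img: Gray, iterations: int = 10, kappa: float = 0.1, lam: float = 0.2) -> Gray:
    """Explicit Perona-Malik diffusion with g(s) = exp(-(s/kappa)^2) and zero-flux borders."""
    h, w = len(img), len(img[0])
    u = [row[:] for row in img]

    def g(d: float) -> float:
        # Edge-stopping function: near 1 for small differences, near 0 across strong edges.
        return math.exp(-((d / kappa) ** 2))

    for _ in range(iterations):
        nxt = [row[:] for row in u]
        for y in range(h):
            for x in range(w):
                c = u[y][x]
                # Differences to the 4 neighbours; 0 outside the image (Neumann boundary).
                dn = u[y - 1][x] - c if y > 0 else 0.0
                ds = u[y + 1][x] - c if y < h - 1 else 0.0
                de = u[y][x + 1] - c if x < w - 1 else 0.0
                dw = u[y][x - 1] - c if x > 0 else 0.0
                nxt[y][x] = c + lam * sum(g(d) * d for d in (dn, ds, de, dw))
        u = nxt
    return u


def otsu_threshold(img: Gray, bins: int = 256) -> float:
    """Global Otsu threshold: maximize between-class variance over a histogram."""
    hist = [0] * bins
    for row in img:
        for v in row:
            hist[min(bins - 1, max(0, int(v * (bins - 1) + 0.5)))] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0.0
    for t in range(bins - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    # Bins 0..best_t form the dark class; map the bin boundary back to intensity.
    return (best_t + 0.5) / (bins - 1)


def binarize(img: Gray, **diffusion_kwargs) -> List[List[int]]:
    """Diffuse, then threshold: 1 = ink, 0 = background."""
    smooth = perona_malik(img, **diffusion_kwargs)
    t = otsu_threshold(smooth)
    return [[1 if v < t else 0 for v in row] for row in smooth]
```

The edge-stopping function is close to 1 for small intensity differences (noise) and close to 0 across strong ink/paper edges, so background noise is smoothed while strokes stay sharp. Keeping `lam` at or below 0.25 keeps this explicit 4-neighbour scheme stable.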
3. How can multimodal and deep learning approaches improve metadata extraction and script classification in document images, especially for complex and historical scripts?
This theme investigates deep learning approaches that combine computer vision and natural language processing modalities, including contrastive self-supervised frameworks, to extract metadata and classify scripts in complex documents (e.g., scientific PDFs, ancient Chinese manuscripts). The focus is on overcoming diverse layouts, complex scripts, degraded manuscripts, and scarce annotated data through multimodal representations and domain-specific data augmentations. These approaches push automated document understanding beyond conventional OCR and heuristic methods, enabling scalable digital humanities research and document management.
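One common contrastive self-supervised objective is the SimCLR-style NT-Xent (normalized temperature-scaled cross-entropy) loss, sketched below in plain Python. It assumes each image has already been passed through an encoder under two different augmentations; for manuscripts these augmentations would be domain-specific, which is an assumption here. The function name and temperature value are illustrative, and this is not claimed to be the objective used in the surveyed work.

```python
import math
from typing import List, Sequence

# Illustrative NT-Xent sketch; encoder, augmentations, and temperature are assumptions.
Vec = Sequence[float]


def _normalize(v: Vec) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def nt_xent(view_a: Sequence[Vec], view_b: Sequence[Vec], temperature: float = 0.5) -> float:
    """view_a[k] and view_b[k] are embeddings of two augmentations of image k."""
    z = [_normalize(v) for v in list(view_a) + list(view_b)]
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        sims = [sum(p * q for p, q in zip(z[i], z[k])) / temperature for k in range(2 * n)]
        # The softmax denominator covers every other embedding, excluding the anchor itself.
        logits = [s for k, s in enumerate(sims) if k != i]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denom - sims[pos]
    return total / (2 * n)
```

Minimizing this loss pulls the two views of the same image together and pushes views of different images apart. This is how such frameworks can learn script-discriminative features without labels.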