Academia.edu

String Matching

1,775 papers
995 followers
About this topic
String matching is a computational problem that involves finding occurrences of a substring (pattern) within a larger string (text). It is a fundamental concept in computer science, particularly in algorithms, data structures, and text processing, with applications in areas such as search engines, data retrieval, and bioinformatics.
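The problem statement above can be made concrete with the simplest, brute-force approach: slide the pattern across the text and compare it at every offset. This is a minimal Python sketch for illustration only. `naive_search` is a hypothetical name, and its O(nm) worst case is the cost that the specialised algorithms discussed below try to reduce.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based index where `pattern` occurs in `text` (overlaps included)."""
    n, m = len(text), len(pattern)
    if m == 0:
        # Convention chosen for this sketch: the empty pattern matches at every position.
        return list(range(n + 1))
    hits = []
    # Slide a window of length m over the text: O(n*m) character comparisons in the worst case.
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:
            hits.append(i)
    return hits


print(naive_search("abracadabra", "abra"))  # [0, 7]
```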

Key research themes

1. How can string matching algorithms be optimized for biological sequence analysis, particularly DNA pattern matching?

This body of research focuses on developing and evaluating string matching algorithms tailored for bioinformatics applications, especially DNA and protein sequence analysis. These algorithms aim to efficiently handle huge biological datasets, improve exact and approximate pattern matching accuracy, and reduce computational time and resource usage, addressing key challenges arising from the complexity and volume of biological sequence data.
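One common form of approximate DNA matching allows up to k mismatched bases (Hamming distance) between the pattern and a text window. The sketch below is a plain sliding-window baseline under that assumption. It is not the algorithm of any paper listed here, and `k_mismatch_search` is a hypothetical name. The early exit shows the simplest way such methods save time: a window is abandoned as soon as it exceeds k mismatches.

```python
def k_mismatch_search(text: str, pattern: str, k: int) -> list[tuple[int, int]]:
    """Return (position, mismatches) for each window of `text` within Hamming distance k of `pattern`."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:  # early exit: this window can no longer qualify
                    break
        if mismatches <= k:
            hits.append((i, mismatches))
    return hits


print(k_mismatch_search("ACGTTACGAACGT", "ACGA", 1))  # [(0, 1), (5, 0), (9, 1)]
```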

Key finding: This paper evaluates the performance and accuracy of two classical string matching algorithms, Longest Common Substring (LCS) and Longest Common Subsequence (LCSS), specifically applied to DNA sequence comparison. The study...
Key finding: Introduces two indexing data structures (CPI-I and CPI-II) to efficiently handle the circular pattern matching (CPM) problem, where DNA sequences or patterns are circular, which is critical in contexts like viral genome analysis. CPI-I...
by Osman Ali and 1 more
Key finding: Proposes two pattern matching algorithms (EFLPM and EPAPM) using word-level rather than character-level processing, significantly improving the runtime for DNA sequence searches. Experimental results demonstrate up to 54%...
Key finding: Develops a novel hashing-based exact pattern matching algorithm tailored for biological sequences that reduces both preprocessing time and hash collisions compared to existing Efficient Hashing Method (EHM) algorithms. The...
Key finding: Presents efficient algorithms for exact pattern matching in highly similar DNA sequences, accommodating k allowed variations over pattern-length windows. The single pattern matching algorithm utilizes approximate matching in...
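The first finding above contrasts the longest common substring (contiguous) with the longest common subsequence (gaps allowed). Both are classically solved with an O(|a|·|b|) dynamic program. The sketch below shows the textbook recurrences in Python. It is not the evaluated paper's implementation, and the function names are hypothetical.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous run shared by a and b (first one found on ties)."""
    best_len, best_end = 0, 0
    # prev[j] = length of the longest common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common suffix
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def longest_common_subsequence_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (characters in order, gaps allowed)."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])  # skip a character from a or from b
        prev = cur
    return prev[len(b)]


print(longest_common_substring("ACGTACGT", "ACGGT"))           # ACG
print(longest_common_subsequence_length("ACGTACGT", "ACGGT"))  # 5
```

The same pair of sequences gives 3 for the contiguous measure and 5 for the gapped one, which is why the two measures behave differently on DNA with scattered point differences.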

2. What advances in data structures and indexing methods can improve multiple and approximate string matching performance?

This research cluster explores algorithmic and data structural innovations for multiple pattern matching and approximate matching. The work aims to speed up searches across large texts or databases by reducing redundant computations, using compact indexing techniques, and combining automata with transformation approaches. Emphasis is placed on optimizing runtime, memory efficiency, and scalability, with applications spanning bioinformatics, text retrieval, and network security.
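A standard automaton for multiple pattern matching is Aho-Corasick: a trie of all patterns plus failure links, so the text is scanned once regardless of how many patterns there are. The Python sketch below is the textbook construction, shown for illustration and not taken from any listed paper. It assumes non-empty patterns, and `aho_corasick_search` is a hypothetical name.

```python
from collections import deque


def aho_corasick_search(text: str, patterns: list[str]) -> list[tuple[int, str]]:
    """Return (start_index, pattern) for every occurrence of any (non-empty) pattern in text."""
    # 1. Build the trie. Node 0 is the root; goto[node] maps a character to a child node.
    goto: list[dict[str, int]] = [{}]
    out: list[list[str]] = [[]]  # patterns recognised when the automaton is in this node
    for p in patterns:
        node = 0
        for ch in p:
            if ch not in goto[node]:
                goto.append({})
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(p)

    # 2. Compute failure links breadth-first (depth-1 nodes fail to the root).
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] += out[fail[child]]  # inherit patterns that end at the same position

    # 3. Scan the text once, following failure links on mismatches.
    matches = []
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for p in out[node]:
            matches.append((i - len(p) + 1, p))
    return matches


print(aho_corasick_search("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```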

Key finding: Proposes a novel method combining the Burrows-Wheeler Transform (BWT) with a pattern matching machine (PMM) for efficient multiple string pattern matching. The method constructs PMM over the pattern set and searches it...
Key finding: Introduces data structures that answer queries regarding occurrences of patterns from an internal dictionary (where patterns are substrings of a given text) within fragments of the same text. The data structures allow...
Key finding: Presents a linear-time, constant-space two-way string matching algorithm which balances the characteristics of Knuth-Morris-Pratt and Boyer-Moore algorithms. The method relies on the Critical Factorization Theorem and...
Key finding: Describes a hardware-accelerated implementation of regular expression matching via non-deterministic finite automata (NFA) tailored for network intrusion detection and stream categorization. The approach converts regular...
Key finding: Develops a secure and verifiable semantic search scheme over encrypted cloud data using optimal matching formulated as a Word Transportation (WT) problem cast into random linear programming (LP) problems. The method leverages...
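The BWT-based finding above builds a pattern matching machine over the BWT. That method is not reproduced here. For orientation, the sketch below shows the standard single-pattern use of the BWT, FM-index backward search, which counts occurrences by narrowing a range of rows in the sorted-rotation matrix one pattern character at a time. Assumptions: the sentinel `$` does not occur in the text and sorts before every other symbol, the BWT is built naively by sorting rotations, and `occ` is computed by scanning instead of being precomputed as a real index would do. Function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler Transform via sorted rotations (O(n^2 log n); demo only)."""
    s = text + "$"  # assumed sentinel: absent from text, smaller than all other symbols
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def backward_search_count(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern with FM-index backward search over bwt_str."""
    # C[c] = number of symbols in the text that are strictly smaller than c
    counts: dict[str, int] = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_str[:i]; a real FM-index precomputes this table.
        return bwt_str.count(ch, 0, i)

    lo, hi = 0, len(bwt_str)  # half-open range of rows whose prefix matches the suffix seen so far
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


b = bwt("banana")
print(b)                                 # annb$aa
print(backward_search_count(b, "ana"))   # 2
```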

3. How can string matching algorithms be effectively applied in practical systems such as customer data management and healthcare text analysis?

This theme covers the application-driven development of string matching algorithms designed to optimize search accuracy and speed within real-world systems. Research here focuses on addressing noisy, unstructured, or approximate matching challenges in domains like customer databases and clinical pathology reports. These applied studies adapt and enhance classical algorithms to accommodate domain-specific needs including fuzzy matching, phonetic similarity, and workflow integration.
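Fuzzy matching of inconsistently spelled names is usually built on an edit distance such as Levenshtein distance. The Python sketch below pairs it with an exact-match-first lookup, loosely mirroring the "exact plus approximate" idea described in this theme. It is an illustration under assumptions, not the system from any study listed here: `fuzzy_lookup`, its `max_distance` threshold of 2, and the case-folding step are all hypothetical choices.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def fuzzy_lookup(query: str, names: list[str], max_distance: int = 2) -> list[str]:
    """Hypothetical lookup: exact (case-insensitive) hits first, else names within max_distance edits."""
    q = query.casefold()
    exact = [n for n in names if n.casefold() == q]
    if exact:
        return exact
    scored = sorted((levenshtein(q, n.casefold()), n) for n in names)
    return [n for d, n in scored if d <= max_distance]  # closest candidates first


print(fuzzy_lookup("Muhamad", ["Muhammad", "Mohammad", "Mahmud"]))  # ['Muhammad', 'Mohammad']
```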

Key finding: The study implements a fuzzy string matching system combining approximate and exact matching to enhance customer data retrieval for PT. PLN, addressing challenges of inconsistent name spellings. By integrating brute force and...
Key finding: Proposes a novel mapping method using the normalized correlation coefficient to convert non-numeric characters into numerical vectors for Dynamic Time Warping (DTW)-based string matching. This conversion enables DTW to...
Key finding: Develops a framework combining syntactic and semantic string matching techniques (including weighted string similarity measures and NLP-based API integration) to match user identities across Facebook, LinkedIn, and Twitter. The...
Key finding: Reports the empirical evaluation of various string similarity metrics (including Hamming, Jaccard, and Levenshtein distances) in linking administrative business records with census data. The study demonstrates that data...
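Two of the metrics named in the record-linkage finding are easy to state in code. Hamming distance counts differing positions and is only defined for equal-length strings. Jaccard similarity compares token sets; here the tokens are assumed to be overlapping character bigrams, which is one common choice but not necessarily the one the study used. This is a minimal Python sketch with hypothetical function names.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def bigrams(s: str) -> set[str]:
    """Set of overlapping 2-character substrings (assumed tokenization for this sketch)."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: str, b: str) -> float:
    """|A & B| / |A | B| over character bigrams; 1.0 when both strings yield no bigrams."""
    sa, sb = bigrams(a), bigrams(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


print(hamming("karolin", "kathrin"))        # 3
print(round(jaccard("night", "nacht"), 3))  # 0.143
```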

All papers in String Matching

We address the problem of clustering string patterns from an ensemble methods perspective. In this approach, different partitionings of the data are combined in an attempt to find a better and more robust partition. In this study we cover...
We show that transitive closure logic (FO + TC) is strictly more powerful than deterministic transitive closure logic (FO + DTC) on finite (unordered) structures. In fact, on certain classes of graphs, such as hypercubes or regular graphs...
Semantic searching over encrypted data is a crucial task for secure information retrieval in the public cloud. It aims to provide retrieval service for arbitrary words so that queries and search results are flexible. In existing semantic...
Regular expressions (regex) have long served as a foundational tool for pattern matching in computer science. This paper introduces Regex as Deterministic Aliases (RDA), a novel framework in which regex patterns are formalized as...
Large-scale comparison of genomic DNA is of fundamental importance in annotating functional elements of genomes. To perform large comparisons efficiently, BLAST [3, and other widely used tools use seeded alignment, which compares only...
We study a scenario in which a machine acts upon input from its environment by changing the environment, hence affecting its own future input. The machine is a string pattern matcher; the environment varies: for example, it can be a...
This paper presents an industrial-grade intelligent purification system for industrial wastewater, integrated as a single-file unified SCADA (Supervisory Control and Data Acquisition) solution with comprehensive functional modules. The...
Deep Packet Inspection (DPI) involves searching a packet's header and payload against thousands of rules to detect possible attacks. The increase in Internet usage and the growing number of attacks that must be searched for have meant...
We describe the first efficient algorithm for simultaneously matching multiple rectangular patterns of varying sizes and aspect ratios in a rectangular text. Efficient means significantly better asymptotically than known algorithms...
The rapid growth of digital information has increased the need for effective and efficient search systems. Search engines function as information retrieval systems that allow users to find information...
Due to the huge surge of digital information and the task of mining valuable information from huge amounts of data, text processing tasks like string search have gained importance. Earlier techniques for text processing relied on following...
This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous...
Bit-parallel string matching algorithms are the latest class of efficient string matching algorithms. They exploit the intrinsic parallelism inside a computer word, which is known as bit parallelism. BNDM is the most popular bit-parallel...
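The bit-parallel idea in the last abstract can be illustrated with Shift-And, a simpler relative of BNDM. BNDM itself scans each window backwards and is not what is shown here. Each bit of a machine word tracks one pattern prefix, so a single shift, OR, and AND per text character updates all prefixes at once. This is a minimal Python sketch: `shift_and` is a hypothetical name, and because Python integers are unbounded the word-size limit (pattern length at most 32 or 64 in C) does not apply.

```python
def shift_and(text: str, pattern: str) -> list[int]:
    """Bit-parallel exact matching (Shift-And): return start indices of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit j set iff pattern[j] == c
    masks: dict[str, int] = {}
    for j, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << j)
    accept = 1 << (m - 1)  # bit for the full pattern
    state = 0  # bit j set <=> pattern[:j+1] matches the text ending at the current position
    hits = []
    for i, c in enumerate(text):
        # Extend every live prefix by one character and start a new one, in one word operation.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(i - m + 1)
    return hits


print(shift_and("abracadabra", "abra"))  # [0, 7]
```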