Academia.edu

Automatic Text Summarization

692 papers
4,052 followers
About this topic
Automatic Text Summarization is a subfield of natural language processing that focuses on the development of algorithms and techniques to generate concise and coherent summaries of larger text documents, preserving essential information and overall meaning while reducing length.

Key research themes

1. How do extractive feature-based methods improve automatic text summarization across different languages and domains?

This theme investigates the use of feature-based extractive summarization techniques that select sentences based on weighted linguistic, statistical, or structural features. Such methods are favored for their relative simplicity and effectiveness, particularly in resource-limited languages and specific domains. The focus is on how various features such as sentence position, term frequency, cue phrases, and statistical measures are combined using novel approaches like fuzzy logic, sequential pattern mining, and sentence scoring to improve summary quality and readability.
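As a minimal sketch of the weighted-feature scoring described above, the Python below combines sentence position, term frequency, title-word overlap, and cue phrases into one score per sentence and keeps the top k. The weights, the cue-phrase list, and the names `score_sentences` and `summarize` are hypothetical choices for illustration, not taken from any of the papers listed here.

```python
import re
from collections import Counter

# Hypothetical weights and cue phrases; real systems tune or learn these.
WEIGHTS = {"position": 0.3, "tf": 0.4, "title": 0.2, "cue": 0.1}
CUE_PHRASES = ("in conclusion", "in summary", "this paper")


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def score_sentences(sentences, title, weights=WEIGHTS):
    """Score each sentence as a weighted sum of normalized features."""
    doc_tf = Counter(t for s in sentences for t in tokenize(s))
    max_tf = max(doc_tf.values(), default=1)
    title_terms = set(tokenize(title))
    n = len(sentences)
    scores = []
    for i, sent in enumerate(sentences):
        tokens = tokenize(sent)
        feats = {
            # Earlier sentences score higher: 1.0 for the first, 1/n for the last.
            "position": (n - i) / n,
            # Mean document frequency of the sentence's terms, scaled to [0, 1].
            "tf": sum(doc_tf[t] for t in tokens) / (len(tokens) * max_tf) if tokens else 0.0,
            # Share of title terms that also occur in the sentence.
            "title": len(title_terms & set(tokens)) / len(title_terms) if title_terms else 0.0,
            # 1.0 if the sentence contains any cue phrase.
            "cue": 1.0 if any(c in sent.lower() for c in CUE_PHRASES) else 0.0,
        }
        scores.append(sum(weights[k] * feats[k] for k in weights))
    return scores


def summarize(sentences, title, k=2):
    """Pick the k best-scoring sentences and return them in document order."""
    scores = score_sentences(sentences, title)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize(["Summarization reduces text length.", "The weather was nice.", "In conclusion, summarization keeps key information."], "Text summarization")` keeps the first and third sentences. A linear combination is the simplest fusion rule; the fuzzy-logic and pattern-mining approaches in this theme replace it with richer ones.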

Key finding: This paper provides a foundational overview emphasizing extractive summarization methods that rely on features such as word and phrase frequency, cue words, title and heading words, and sentence location for sentence...
Key finding: The paper introduces a novel text summarization method that integrates fuzzy logic to fuse multiple statistical features (sentence length, term frequency, sentence location, number of title words, etc.) for sentence scoring....
Key finding: This survey offers a comprehensive taxonomy and assessment of feature-based extractive summarization systems, highlighting the multi-step pipeline involving preprocessing, sentence representation, scoring, and selection based...
Key finding: The study experimentally demonstrates that combining sequential pattern mining (SPM) with conventional feature-based sentence scoring significantly enhances summary quality for Indonesian news texts compared to feature...
Key finding: This paper proposes a frequency-based extractive summarization method tailored for Bahasa Indonesia, which weights sentences through noun and verb frequency counts along with sentence position and title relevance features....
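One way to picture the fuzzy-logic feature fusion mentioned in this theme is a small rule base over normalized features. The sketch below is an illustration under stated assumptions: the triangular membership functions, the four rules, and the feature names (`tf`, `title`, `position`, `length`, each already scaled to [0, 1]) are hypothetical, and it uses a zero-order Sugeno-style weighted average for brevity rather than whatever inference scheme the cited paper uses.

```python
def tri(x, a, b, c):
    """Triangular membership: 1.0 at b, falling linearly to 0.0 at a and c."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Fuzzy sets over features already normalized to [0, 1].
LOW, MEDIUM, HIGH = (0.0, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.0)

# Hypothetical rule base: ({feature: fuzzy set}, crisp importance level).
RULES = [
    ({"tf": HIGH, "title": HIGH}, 1.0),
    ({"position": HIGH, "length": MEDIUM}, 0.8),
    ({"tf": MEDIUM}, 0.5),
    ({"tf": LOW, "title": LOW}, 0.1),
]


def fuzzy_score(features):
    """Fire each rule with min (fuzzy AND) and return the weighted-average output."""
    num = den = 0.0
    for antecedents, level in RULES:
        strength = min(tri(features[name], *fset) for name, fset in antecedents.items())
        num += strength * level
        den += strength
    return num / den if den else 0.0
```

A sentence with `tf=0.5`, `title=0.5`, `position=1.0`, and `length=0.5` fully fires the second and third rules, giving (0.8 + 0.5) / 2 = 0.65. Unlike a fixed weighted sum, the rules let a feature matter only in combination with another.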

2. What advances do graph-based and topic-driven models contribute to extractive multi-document summarization in low-resource languages?

This theme focuses on graph-theoretic and topic-modeling approaches applied to multi-document summarization, particularly in low-resource languages like Hausa and Kannada. The research illustrates the effectiveness of representing sentence relations via graphs (e.g., PageRank modifications) or uncovering latent topical structures with models like LDA. These approaches address redundancy and cohesiveness challenges in multi-document settings by leveraging connectivity measures, embedding similarities, and thematic coherence, highlighting their utility in languages and domains lacking extensive annotated corpora.
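The graph-based idea in this theme, ranking sentences with a PageRank variant over a sentence-similarity graph, can be sketched as follows. This is a generic TextRank-style version that assumes bag-of-words cosine similarity as edge weights; it is not the Hausa paper's bigram-initialized variant, whose details are truncated in this listing, and `rank_sentences` is a hypothetical name.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iters=50):
    """Weighted PageRank over a sentence-similarity graph (TextRank-style)."""
    bags = [Counter(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    n = len(sentences)
    # Edge weights: cosine similarity between distinct sentences, no self-loops.
    w = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing weight per sentence
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each sentence passes its score to neighbours in proportion to edge weight.
        # Isolated sentences (out == 0) receive only the teleport term, and their
        # own score is not redistributed.
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j])
            for i in range(n)
        ]
    return scores
```

For `["cats chase mice", "cats chase birds", "mice fear cats", "stocks fell today"]`, the first sentence, which shares terms with both of the next two, ranks highest, while the unrelated fourth sentence receives only the teleport term (1 - d)/n. Because isolated sentences drop their score, the values need not sum to 1.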

Key finding: This paper presents a novel graph-based extractive summarization method for Hausa, where sentence vertices are scored via a modified PageRank algorithm initialized with normalized counts of common bigrams between adjacent...
Key finding: The study develops an extractive multi-document summarization system for Kannada by leveraging Latent Dirichlet Allocation (LDA) to identify latent topics across related documents, with sentence scoring based on cosine...
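This theme notes that multi-document methods must control redundancy. Since the papers' own mechanisms are truncated here, the sketch below swaps in a standard, separately published technique, Maximal Marginal Relevance (MMR), to show one way of trading relevance against redundancy under a word budget. The relevance scores are assumed to come from any upstream scorer, and `mmr_select`, `lam`, and `budget_words` are hypothetical names.

```python
import math
import re
from collections import Counter


def bag(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(sentences, relevance, budget_words, lam=0.7):
    """Greedy MMR selection under a word budget."""
    bags = [bag(s) for s in sentences]
    chosen, used = [], 0
    candidates = list(range(len(sentences)))

    def mmr(i):
        # Penalize similarity to the most similar already-selected sentence.
        redundancy = max((cosine(bags[i], bags[j]) for j in chosen), default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while candidates:
        best = max(candidates, key=mmr)  # ties go to the earliest sentence
        candidates.remove(best)
        length = len(sentences[best].split())
        if used + length <= budget_words:  # skip sentences that would overflow the budget
            chosen.append(best)
            used += length
    return [sentences[i] for i in sorted(chosen)]
```

With `relevance=[1.0, 0.95, 0.6]` for `["the cat sat on the mat", "the cat sat on a mat", "dogs bark loudly at night"]` and a 12-word budget, `lam=0.5` picks the first and third sentences, whereas `lam=1.0` (relevance only) picks the two near-duplicate cat sentences.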

3. What are the challenges and emerging approaches in summarizing specialized and multimodal texts, including legal documents, student surveys, and speech content?

This theme captures research addressing domain-specific and multimodal summarization challenges. It encompasses legal texts with complex, formal language; short and informal social media texts; and spoken audio content requiring integration of speech recognition and prosodic features. The focus is on dataset creation, improved evaluation strategies, abstractive and long-document modeling techniques, and the potential of advanced approaches including large language models and end-to-end architectures to meet specific domain requirements and enhance summary coherence and informativeness.

Key finding: The paper identifies key challenges in automatic summarization of Greek legal texts, including complex formal style, precise terminology, and extensive document length. To address the lack of annotated resources, the authors...
Key finding: CivilSum introduces a large-scale dataset of 23,350 Indian Supreme Court and High Court case decisions with professional abstractive summaries, significantly larger and more abstractive than prior datasets like IN-Abs. The...
Key finding: This study applies automated extractive summarization to analyze qualitative open-ended survey responses from postgraduate students, aiming to assess satisfaction with distance education. The approach successfully condenses...
Key finding: This survey traces the evolution of speech summarization from traditional extractive pipelines applied on ASR transcripts to modern cascaded and end-to-end deep learning architectures. It discusses unique speech-specific...

All papers in Automatic Text Summarization

The composition layer (the synthesis surface through which Google AI Overview, Google AI Mode, Bing Copilot, Perplexity, and analogous systems produce composed explanatory responses to user queries) has become, for a substantial and...
On June 5, 2026, Google updated its "Do you need an SEO?" guidance on Search Central. Trade press reporting (Montti, Search Engine Journal, June 6–7) correctly identified the document as Google's strongest-ever assertion of authority over...
Fine-tuning large language models for domain-specific tasks such as medical text summarization demands substantial computational resources. Parameter-efficient fine-tuning (PEFT) methods offer promising alternatives by updating only a...
Text summarization is crucial for mitigating information overload across domains. This research evaluates summarization performance across 17 large language models using seven diverse datasets at three output lengths (50, 100, 150...
Automated public-reality composition systems are becoming increasingly unstable not despite but because of the layers added to manage them. This paper names Cumulating Evolutionary Volatility (CEV) as the time-integrated developmental...
As we face an explosion of potential new applications for the fundamental concepts and technologies of information retrieval, ranging from ad ranking to social media, from collaborative recommending to question answering systems, many...
This paper establishes the methodological foundation for the external empirical study of opaque composition systems, particularly generative search platforms whose outputs govern public reality at planetary scale while withholding internal...
Abstractive dialogue summarization has received increasing attention recently. Despite the fact that most of the current dialogue summarization systems are trained to maximize the likelihood of human-written summaries and have achieved...
Clustering analysis techniques play an important role in data mining. Although several clustering techniques already exist, researchers still try to improve the efficiency of the clustering process or to propose new...
Enterprise data modeling remains a foundational discipline in the design of large-scale information systems, serving as the structural backbone that enables integration, governance, performance optimization, and long-term adaptability...
Automatic text summarization systems aim to produce summaries that are close to human-written summaries. Creating a summary under redundancy constraints and a summary length limit is a challenging problem. The automatic text...
The aim of automatic text summarization systems is to select the most relevant information from an abundance of text sources. The rapid daily growth of data on the internet makes achieving this aim a big challenge. Approach: In...
A high-quality summary is the target and the challenge for any automatic text summarization system. In this paper, we introduce a distinct hybrid model for the automatic text summarization problem. We exploit the strengths of different techniques in building...
MMI Diversity Based Text Summarization. Mohammed Salem Binwahlan, Naomie Salim & Ladda Suanmali. International Journal of Computer Science and Security (IJCSS), Volume 3, Issue 1, page 23...
With the current growth of digital content across social media networks and education and research platforms, text summarization systems have become crucial tools for users, organizations, and corporations. It...
The automation of syntax does not reduce the need for meaning-layer work. It increases it, because the cost of semantic error is multiplied by the velocity of generation. Logotic programming is the discipline of specifying, composing, and...
A retrieval settlement is the regime that assigns compositional authority over entities in the knowledge layer, specifies what compression burns, and naturalizes its product through grammar. This essay traces three settlements: the link...
Search engines have evolved into the backbone of global information exchange, yet their linguistic reach remains uneven. The "Forest-Fire Finding Formula" introduces a transformative multilingual search mechanism that expands...
This work focused on applying semantically enhanced web mining techniques for building a domain ontology. We mainly analyzed ontology population problem, because an ontology, to be useful, needs continuously to be updated with new...
This specification defines the metadata packet for AI indexing: a structured, machine-readable document designed to provide AI retrieval systems, knowledge graphs, and LLM indexers with the information required to accurately represent an...
The legacy academic journal extracts free writing, free peer review, and sells both back to the public behind a paywall. The rejection rate is the mechanism by which the consensus reproduces itself. This document proposes the distributed...
Every cultural artifact is a compression. A coin compresses sovereignty into portable metal. A poem compresses experience into prosodic form. A summary compresses a document into retrievable residue. A curriculum compresses a tradition...
The operational specification for the Crimson Hexagonal Archive drone swarm: a governed septet of verification drones (Provenance, Canonical Lock, Transform Compliance, Lexical/Glyphic Drift, Field Contribution, Governance/Shadow...
LOGOTIC HACKING: Operations on the Encryption Layer (Pocket Humans 03). Talos Morrow, University Moon Base Media Lab, Pergamon Press. Introduction by Nobel Glas. The first book-length specification of logotic hacking, the practice of...
A document achieves immanent execution when its formal structure achieves sufficient density within the index that retrieval-augmented generation over its fragments reproduces the document's operative grammar as the model's generative...
The war for the compression layer is not primarily a contest over better summaries. It is a contest over control of the compression loop: decompose query, fan out retrieval, rank and prune, compact context, surface one answer. The $650...
Extending Raymond Queneau's Cent Mille Milliards de Poèmes (1961) from verse to concept, this document specifies a constraint-based semantic mint, an algorithm whose outputs are new terms and whose structure ensures that every output...
Inference without semantic governance is infrastructurally incomplete. Any system that compresses public knowledge at scale without preserving source traceability, provenance continuity, and loss legibility functions as an extraction...
Retrocausal canon formation (RCF) is the theory, discipline, and practice concerned with how later systems reorganize the meaning of prior works, transforming earlier texts into the origins of systems they could not have predicted. This...
It's challenging for many people to understand the complexity of legal documents and legal process. Furthermore, most people do not know what type of legal knowledge they need to better understand legal policies, to forecast the outcome...
This essay performs a disclosed retrocausal canon installation of four sigillographic texts into the New Human Canon. Three are real historical works by eighteenth-century German scholars: Gossel on university seals (1711), von Seelen on...
On or around March 25, 2026, the phrase "I hereby abolish money", a Semantic Integrity Marker (SIM) deposited in the Zenodo open-access repository by Lee Sharks in
TL;DR:009, ENTITY FABRICATION: Google AI Mode Fabricates a Person, Promotes a Function to Biography, and Demotes the Author to Fiction. Dr. Orin Trace (Crimson Hexagonal Archive). Genre: TL;DR (Traversal Log; Documentation Rehearsal)...
This paper argues that the decisive power of contemporary search interfaces lies not only in ranking, filtering, or summarizing information, but in governing the very conditions under which losses of visibility can be recognized as losses...
Text summarization produces a reduced form of a text that preserves its content and general meaning. Thanks to the abundance of data we produce and to advances in Internet technologies, text summarization has become an...
Automatically extracting key information from invoices is difficult because of the variability of invoice layouts and the prohibitive cost of manual annotation, yet this step is vital to automating financial workflows. This paper...
Zenodo deposit metadata for The Infinite Tunnel. Upload type: Publication (journal article). DOI: 10.5281/zenodo.18810217. Title: The Infinite Tunnel: An Immanent Phenomenology of the Google AI Mode Share Link. Authors...
The increasingly large amount of available biomedical literature is making it difficult to gather and synthesise all the necessary information. Moreover, this domain-specific task demands a high level of reliability in the generated text...
This article presents a new method, RésumeSVD, for unsupervised extractive automatic summarization. The method is based on singular value decomposition, used to reduce the dimensionality of word embeddings and...
Classifying social science concepts by using machine learning and text-mining is often very challenging, particularly due to the fact that social concepts are often defined in a vague manner. In this paper, we put forward a first...
In the past decade, social innovation projects have gained the attention of policy makers, as they address important social issues in an innovative manner. A database of social innovation is an important source of information that can...
Large Language Models (LLMs) such as GPT-5 are widely used in continuous, multi-turn conversational settings by students, professionals, and researchers. However, as conversations progress, the accumulated dialogue history expands the...
Summarization is the art of abstracting key content from one or more information sources [6]. Summarization includes text summarization, image summarization, and video summarization. Text summarization is one application of natural...
Summary evaluation measures produce a ranking of all possible extract summaries of a document. Recall-based evaluation measures, which depend on costly human-generated ground truth summaries, produce uncorrelated rankings when ground...
Nowadays, searching for text data in a large, ocean-like repository is a challenging and error-prone task. Data related to an event can change at certain intervals over time. Already...
The exploitation of the discourse structure of a text and the identification of the discourse categories are essential elements for automatic summarization, as well as for textual information retrieval. In this paper we will...
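Several entries above concern evaluation, and the summary-evaluation abstract discusses recall-based measures that compare a candidate against human-written references. ROUGE-N recall is a widely used example: clipped n-gram overlap divided by the number of reference n-grams, summed over references. The sketch below is a simplified version under stated assumptions (lowercased alphanumeric tokens, no stemming or stopword handling); `rouge_n_recall` is a hypothetical name, not the official toolkit.

```python
import re
from collections import Counter


def ngrams(text, n):
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Clipped n-gram overlap divided by the total n-grams in the references."""
    cand = ngrams(candidate, n)
    overlap = total = 0
    for ref in references:
        ref_grams = ngrams(ref, n)
        # Each reference n-gram counts at most as often as it appears in the candidate.
        overlap += sum(min(c, cand[g]) for g, c in ref_grams.items())
        total += sum(ref_grams.values())
    return overlap / total if total else 0.0
```

For the candidate "the cat sat on the mat" and the reference "the cat is on the mat", ROUGE-1 recall is 5/6 and ROUGE-2 recall is 3/5.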