Academia.edu

Topic Models

623 papers
2,430 followers
About this topic
Topic models are statistical algorithms used in natural language processing and text mining to discover abstract topics within a collection of documents. They analyze word co-occurrence patterns to identify clusters of words that frequently appear together, enabling the extraction of themes and the organization of large text corpora.
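To make the co-occurrence idea concrete, here is a minimal sketch of one widely used topic model, latent Dirichlet allocation (LDA), fitted with collapsed Gibbs sampling. It is an illustrative toy under stated assumptions (pure Python, documents given as lists of tokens, arbitrary symmetric priors `alpha` and `beta`), not a production implementation; the function names `lda_gibbs` and `top_words` are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy LDA fitted by collapsed Gibbs sampling (hypothetical helper).

    docs: list of documents, each a list of token strings.
    Returns (topic_word_counts, doc_topic_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Count tables: n_dk (doc-topic), n_kw (topic-word), n_k (tokens per topic)
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * V for _ in range(num_topics)]
    n_k = [0] * num_topics
    # Assign every token to a random topic to initialize the counts
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the new assignment and restore the counts
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return n_kw, n_dk, vocab


def top_words(n_kw, vocab, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in n_kw
    ]
```

On a toy corpus such as `[["ball", "goal", "team"], ["vote", "party", "election"], ...]` with `num_topics=2`, words that co-occur in the same documents tend to be pulled into the same topic; because the sampler is stochastic, the result depends on `seed` and `iters`.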

Key research themes

1. How can topic models be effectively adapted to represent and analyze short and multimodal texts, such as social media messages and memes?

Short texts (e.g., tweets, microblogs) and multimodal content (e.g., memes combining text and images) present unique challenges to traditional topic modeling approaches due to information sparsity, multimodality, and semantic ambiguity. This theme investigates methods to adapt topic models for better semantic capture, improved interpretability, and accurate topic discovery in such domains, addressing the need for aggregation, semantic augmentation, and multimodal integration.

Key finding: This paper empirically demonstrates that training standard topic models on aggregated Twitter messages (e.g., posts pooled by user) yields higher-quality topics and significantly better classification performance compared...
Key finding: Introduces the Latent Concept Topic Model (LCTM), which incorporates word embeddings and latent concepts as Gaussian distributions in vector space to overcome data sparsity in short texts (e.g., SNS posts). By modeling topics...
Key finding: Proposes PromptMTopic, a novel multimodal prompt-based topic modeling framework that leverages large language models and visual description extraction to jointly model text and image modalities in memes. The model effectively...
Key finding: Assesses neural topic models (NTMs) combined with pretrained word embeddings and finds them superior to traditional topic models for extracting coherent and interpretable topics from short social texts. The study...
Key finding: Demonstrates that using word embeddings (Word2Vec) significantly improves topic modeling quality on political-linguistic short-text datasets compared to latent Dirichlet allocation (LDA) alone, particularly when accompanied...
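The aggregation strategy in the first finding (pooling a user's short messages into a single document before training) can be sketched as follows. The function name `pool_by_author` and the `(author, text)` input format are hypothetical, and the tokenization is deliberately naive; the pooled token lists can then be handed to any standard topic model.

```python
from collections import defaultdict


def pool_by_author(posts):
    """Concatenate short posts by the same author into one pseudo-document.

    posts: iterable of (author, text) pairs (hypothetical input format).
    Returns a dict mapping author -> list of tokens, in first-seen order.
    """
    pooled = defaultdict(list)
    for author, text in posts:
        # Lowercase and split on whitespace; a real pipeline would also
        # strip URLs and mentions and drop stopwords.
        pooled[author].extend(text.lower().split())
    return dict(pooled)


posts = [
    ("alice", "Goal for the team"),
    ("bob", "Vote in the election"),
    ("alice", "Great match today"),
]
print(pool_by_author(posts))
# {'alice': ['goal', 'for', 'the', 'team', 'great', 'match', 'today'], 'bob': ['vote', 'in', 'the', 'election']}
```

Pooling trades per-message resolution for denser word co-occurrence statistics, which is the sparsity problem that short texts pose for standard models.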

2. How can semantic knowledge and external ontologies be integrated into topic models to enhance semantic coherence and disambiguation?

Traditional probabilistic topic models rely largely on word co-occurrence statistics, often ignoring underlying semantic relationships between words and their contextual meanings. This theme explores the integration of semantic resources like ontologies, knowledge bases, and concept mappings into topic modeling frameworks to better capture word meanings, handle ambiguity, and produce more interpretable, semantically coherent topics.

Key finding: Introduces Semantic-LDA, a topic model that incorporates external ontologies (e.g., Probase) by computing word-concept relationship strengths directly from the input text collection rather than fixed ontology-derived weights. ...
Key finding: Develops OCTVis, a visual analytics framework that maps topics from multiple topic models onto domain ontologies to facilitate qualitative evaluation and interpretability. By aligning topic terms with ontology concepts and...
Key finding: Presents a non-Bayesian framework called Additive Regularization of Topic Models (ARTM) that simplifies modeling by adding domain-specific regularizers to stochastic matrix factorization for topic learning. ARTM enables...
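In the formulation of ARTM commonly given in the literature, topic learning maximizes the log-likelihood of the document-word counts plus a weighted sum of regularizers over two stochastic matrices. The notation below follows that common formulation and is not taken from the excerpt above:

```latex
\max_{\Phi,\Theta}\;\sum_{d \in D}\sum_{w \in W} n_{dw}\,\ln\sum_{t \in T}\phi_{wt}\,\theta_{td}
\;+\;\sum_{i=1}^{r}\tau_i\,R_i(\Phi,\Theta)
\quad\text{s.t.}\quad
\phi_{wt}\ge 0,\;\sum_{w \in W}\phi_{wt}=1,\qquad
\theta_{td}\ge 0,\;\sum_{t \in T}\theta_{td}=1
```

Here $n_{dw}$ is the number of times word $w$ occurs in document $d$, $\Phi=(\phi_{wt})$ and $\Theta=(\theta_{td})$ are the word-topic and topic-document matrices, $R_i$ are the domain-specific regularizers, and $\tau_i \ge 0$ are their weights. With every $\tau_i = 0$ the objective reduces to the PLSA log-likelihood, so PLSA is the unregularized special case.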

3. What are the methodological advances and limitations of variational inference and dynamic topic modeling approaches in large-scale and temporally evolving corpora?

This theme investigates advanced inference methods (including stochastic gradient variational Bayes) and dynamic topic models designed to capture correlations and temporal evolution in topics across large-scale datasets. It further examines inherent limitations such as instability, non-conjugacy issues, and challenges in topic interpretability over time, aiming to refine modeling techniques for reliable, scalable, and temporally aware topic discovery.

Key finding: Proposes an efficient stochastic gradient variational Bayes (SGVB) inference method for the Correlated Topic Model (CTM), which models correlations among latent topics using a logistic normal prior. The method avoids explicit...
Key finding: Demonstrates that naive mean field variational inference for Latent Dirichlet Allocation (LDA) can produce misleading non-trivial topic decompositions even when the data contain no information about the true topic structure, ...
Key finding: Introduces a novel approach combining word embeddings with dynamic network clustering to identify and model temporal evolution of topics in large corpora, circumventing challenges of embedding alignment and stochasticity in...
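The CTM replaces LDA's Dirichlet prior on topic proportions with a logistic normal: draw $\eta \sim \mathcal{N}(\mu, \Sigma)$ and set $\theta = \mathrm{softmax}(\eta)$, so the covariance $\Sigma$ can encode correlations between topics. SGVB-style inference relies on writing such a draw as a differentiable function of the parameters and independent noise (the reparameterization trick). The sketch below shows only that sampling step, under assumptions: pure Python, a full K-dimensional $\eta$ (some CTM formulations fix one component to zero), and $\Sigma$ supplied through a lower-triangular factor `L` with $\Sigma = LL^\top$. The function name is hypothetical, and a real implementation would obtain gradients from an automatic-differentiation library.

```python
import math
import random


def sample_logistic_normal(mu, L, rng):
    """Reparameterized draw of topic proportions from a logistic normal.

    mu: mean vector of length K.
    L:  K x K lower-triangular factor with Sigma = L L^T (only j <= i is read).
    Computes eta = mu + L @ eps with eps ~ N(0, I), then theta = softmax(eta).
    """
    K = len(mu)
    eps = [rng.gauss(0.0, 1.0) for _ in range(K)]
    # eta is a deterministic function of (mu, L) given the noise eps
    eta = [mu[i] + sum(L[i][j] * eps[j] for j in range(i + 1)) for i in range(K)]
    # Numerically stable softmax maps eta onto the probability simplex
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
theta = sample_logistic_normal([0.0, 1.0, -1.0], [[1, 0, 0], [0.5, 1, 0], [0, 0, 1]], rng)
print(theta)  # three positive topic proportions that sum to 1
```

A nonzero off-diagonal entry in `L` (here `L[1][0] = 0.5`) makes the first two topics' log-proportions positively correlated, which a Dirichlet prior cannot express.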

4. How have topic models evolved in research, and what are their applications and methodological trends across different disciplines?

This theme maps the historical development, key methodologies, and cross-disciplinary applications of topic modeling, emphasizing bibliometric and scientometric analyses. It helps researchers understand influential models, predominant research areas, and how topic modeling tools are tailored to various data types and research questions.

Key finding: Provides a comprehensive scientometric analysis of topic modeling literature, revealing the dominance of Latent Dirichlet Allocation (LDA) and its widespread application in short-text domains like social networks and blogs. ...
Key finding: Delivers a detailed introduction to and survey of LDA and its probabilistic foundations, discussing related models such as PLSA and extensions including hierarchical and multilingual topic models. Provides insights into model...
Key finding: Proposes combining outputs from multiple topic modeling frameworks (e.g., LDA and doc2vec neural embeddings) to gain richer semantic interpretations than any single model provides. The study shows how mapping topic term...
Key finding: Applies latent Dirichlet allocation topic models to a corpus of flagship astrobiology journals spanning five decades to identify thematic researcher communities, revealing how semantic profiles group authors by shared research...
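The astrobiology finding groups authors by semantic profiles derived from LDA. One simple way to build such profiles, assumed here and not taken from the study, is to average the topic proportions of each author's documents and compare authors by cosine similarity. The function names `author_profiles` and `cosine` are hypothetical.

```python
import math


def author_profiles(doc_topics, doc_authors):
    """Average the document-topic proportions of each author's documents.

    doc_topics:  list of topic-proportion vectors, one per document.
    doc_authors: list of author labels aligned with doc_topics.
    """
    sums, counts = {}, {}
    for theta, author in zip(doc_topics, doc_authors):
        if author not in sums:
            sums[author] = [0.0] * len(theta)
            counts[author] = 0
        sums[author] = [s + t for s, t in zip(sums[author], theta)]
        counts[author] += 1
    return {a: [s / counts[a] for s in sums[a]] for a in sums}


def cosine(u, v):
    """Cosine similarity between two profiles (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


profiles = author_profiles([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]], ["A", "A", "B"])
print(profiles)                          # A is about [0.8, 0.2], B is [0.2, 0.8]
print(cosine(profiles["A"], profiles["B"]))  # about 0.47: weakly similar authors
```

Authors whose pairwise similarities are high could then be clustered into communities; the clustering step is omitted here.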

All papers in Topic Models

This study investigates consumer perceptions of sustainable fashion by analyzing online discussions. It aims to bridge the gap between consumer understanding and industry practices by providing insights to guide companies and policymakers...
Introduction: Effective flood early warning systems (FEWS) are crucial to mitigating flood impacts. Yet, their governance is often hindered by numerous systemic barriers. These systemic barriers reinforce social inequities, which in turn...
According to the textbook definition, a topic model aims to uncover the underlying topics of a corpus. Despite its widespread use across disciplines, the nature of these 'topics' has remained relatively underdefined. This research note...
Background: One of the most comprehensive approaches to depression is the biopsychosocial model. From this wider perspective, social sciences have criticized the reductionist biomedical discourse, which has been dominating expert...
This article investigates how the Russia-Ukraine war has reshaped the European Union's defence priorities and its pursuit of strategic autonomy. Using advanced text analytics and machine learning methods on over 26,000 European External...
The rapid expansion of Artificial Intelligence (AI) in higher education has reshaped teaching, assessment, and academic writing practices, exposing the limitations of traditional plagiarism detection models. At the same time, the...
Prior to and during the pandemic, social media platforms such as Twitter and Facebook emerged as dynamic online spaces for diverse communities facilitating engagement and learning. The authors of this article have explored the use of...
International trade is one of the classic areas of study in economics. Its empirical analysis is a complex problem, given the amount of products, countries and years. Nowadays, given the availability of data, the tools used for the...
When the COVID-19 pandemic began, social platforms played a central role in the production of and access to information. This study identifies the topics of greatest interest and their associated sentiments on Twitter in the...
In this article we propose a novel methodology, which uses text similarity techniques to infer precise citations from the judgments of the Court of Justice of the European Union (CJEU), including their content. We construct a complete...
While science is often portrayed as producing reliable knowledge, scientists tend to express caution about their claims, acknowledging nuances and doubt, all the more so in novel domains of research paved with unknowns. Uncertainty is an...
Founded in 1974, Philosophiques has established itself as a leading journal of French-speaking philosophy in Quebec. Its fiftieth anniversary provides an opportunity to look back on half a century of articles, book reviews and thematic...
This study explores how computing science students (n = 335) use ChatGPT, their trust in its information, their navigation of plagiarism issues, and their confidence in addressing plagiarism and academic integrity. A mixed-methods...
This systematic review aims to understand what research has been done on the use of AI as a classroom tutor and how that body of work should shape future research. A systematic review was conducted using key term searches in four major,...
Probabilistic topic models are a popular tool for the unsupervised analysis of text, providing both a predictive model of future text and a latent topic representation of the corpus. Recent studies have found that while there are...
Student gender differences in technology acceptance and use have persisted for years, giving rise to equity concerns in higher education (HE). To explore if such differences extend to generative artificial intelligence (genAI) chatbot...
This paper presents an overview of the work carried out at the HMI group of the University of Twente in the domain of multi-party interaction. The process from automatic observations of behavioral aspects through interpretations resulting...
The increasing use of artificial intelligence (AI) in education has raised questions about the implications of ChatGPT for teaching and learning. A systematic literature review was conducted to answer these questions, analyzing 112...
Many educators and professionals in different industries may need to become more familiar with the basic concepts of artificial intelligence (AI) and generative artificial intelligence (Gen-AI). Therefore, this paper aims to introduce...
The recent emergence of generative AI (GenAI) tools such as ChatGPT, Midjourney, and Gemini has introduced revolutionary capabilities that are predicted to transform numerous facets of society fundamentally. In higher education (HE), the...
Activity mining in traffic scenes aims to automatically explain the complex interactions among moving objects recorded with a surveillance camera. Traditional machine learning algorithms generate a model and validate it with manually...
Recent approaches in traffic and crowd scene analysis make extensive use of non-parametric hierarchical Bayesian models for intelligent clustering of features into activities. Although this has yielded impressive results, it requires the...
In various real-world applications of distributed and multi-view vision systems, the ability to learn unseen actions in an online fashion is paramount, as most of the actions are not known or sufficient training data is not available at...
The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new...
The needs of travellers vary across cultures. When it comes to culinary aspects, there is a strong connection between gastronomy and culture. To optimise service offerings, investigation of the essential aspects of dining experiences in...
As a result of travel activities, overtourism has become a global issue. Even after the COVID-19 pandemic, overtourism, in the form of localized overcrowding, has persisted as a new occurrence in the tourism industry. Since there is no...
The rapid advancement of generative artificial intelligence (GAI) has opened new pathways for enhancing educational technologies, particularly through the integration of learning analytics (LA). This paper explores the dynamic...
This dialogue sketches a research program that links brains, texts, and literary history using the conceptual toolkit of complex dynamics. We began from the asymmetry between whole-brain cognition and the linguistic traces available to...
This article describes and evaluates the application of supervised sentiment analysis in political communication through a real-time classifier of political opinions in Spanish tweets using machine learning techniques, both on a local...
Holds a PhD in Communication, Culture and Education and a degree in Economics from Usal. He has worked on audience research, research methodologies, the structure of the audiovisual system, and the cultural industries,...