Academia.edu

Mixture models

764 papers
1,872 followers
About this topic
Mixture models are statistical models that represent a distribution as a combination of multiple component distributions, each associated with a specific probability. They are used to capture heterogeneity in data by allowing for the existence of subpopulations within an overall population, facilitating the analysis of complex datasets.
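As a minimal sketch of the definition above, the density of a finite mixture is a weighted sum of component densities, with weights that sum to 1. The example assumes univariate Gaussian components; the function names and all numbers are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Finite Gaussian mixture density: sum_k weights[k] * N(x | means[k], sds[k])."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


# Hypothetical population with two subpopulations (illustrative values only).
weights = [0.3, 0.7]
means = [-2.0, 1.5]
sds = [0.5, 1.0]

print(mixture_pdf(0.0, weights, means, sds))
```

Each weight can be read as the probability that an observation comes from the corresponding subpopulation.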

Key research themes

1. How can Expectation-Maximization algorithms be adapted and extended to estimate and generalize mixture models across diverse data and model structures?

A large body of work focuses on advancing the EM algorithm for estimating mixture models under various settings: enabling modular extensibility, handling partially observed data, tailoring to skewed/heavy-tailed components, and adapting to semiparametric/nonparametric frameworks. This theme is crucial because EM remains the computational backbone for finite mixture estimation, yet classical EM requires modifications to address practical challenges like label switching, component initialization, and incorporation of covariate-dependent mixture weights.
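For orientation, the classical EM iteration that these papers extend can be sketched as follows. This is a plain two-component univariate Gaussian version, not any of the listed methods; the initialization rule, the variance floor, and the simulated data are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, n_iter=200):
    """Fit a two-component univariate Gaussian mixture with plain EM."""
    n = len(data)
    lo, hi = min(data), max(data)
    # Crude initialization (an assumption): means at the quarter points of the range.
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    overall_mean = sum(data) / n
    sd0 = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    sds = [sd0, sd0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: responsibility-weighted updates of weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = math.sqrt(max(var, 1e-6))  # floor guards against collapse
    return weights, means, sds


# Simulated data from two well-separated subpopulations (illustrative values).
random.seed(0)
data = [random.gauss(-2.0, 0.5) for _ in range(300)]
data += [random.gauss(3.0, 1.0) for _ in range(700)]
weights, means, sds = em_two_gaussians(data)
print(weights, means, sds)
```

The fixed iteration count keeps the sketch short; a practical implementation would usually stop when the log-likelihood change falls below a tolerance and would try several initializations.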

Key finding: FlexMix implements a flexible EM-based R package architecture that supports a modular M-step allowing users to define new mixture components, supports multivariate and grouped data, and integrates mixtures of linear and...
Key finding: The authors propose a modified EM-type algorithm to estimate semi-parametric mixtures combining parametric and nonparametric components while solving the label switching problem common in localized EM implementations. By...
Key finding: This work extends mixture regression models by assuming component errors follow scale mixtures of skew-normal distributions, enabling joint modeling of skewness and heavy tails. The EM algorithm is adapted utilizing a...
Key finding: The paper develops an EM algorithm for maximum likelihood estimation in log-normal mixture models, introducing augmentation of incomplete data through latent variables. By alternating between expectation and maximization...
Key finding: The authors design a ρ-estimator within an EM-like scheme for mixtures of densities belonging to VC-subgraph classes. The method is robust to model misspecification and achieves near-parametric convergence rates for...

2. What strategies improve model selection and initialization in mixture models to enhance estimation accuracy and cluster recovery?

Choosing the number of mixture components and properly initializing parameters are longstanding challenges due to multimodality, label switching, and local maxima of likelihood surfaces. Research has sought to develop statistical priors, initialization heuristics, and post-processing methods to improve model parsimony, avoid overfitting, and ensure convergence to meaningful solutions in finite mixture model fitting, particularly important in clustering and latent class analysis applications.

Key finding: This study formalizes the use of non-local priors (NLPs) in Bayesian mixture model selection, showing that NLPs enforce well-separated components with meaningful weights and induce sparsity by effectively penalizing...
Key finding: The paper introduces a post-processing step combining Gaussian mixture model fitting with spectral clustering based on Bhattacharyya distances between components to merge overlapping Gaussians. This approach overcomes...
Key finding: This work integrates consensus clustering and Bayesian mixture modeling, leveraging MCMC sampling and product partition models to robustly estimate cluster memberships in complex, high-dimensional big data scenarios. The...
Key finding: The paper illustrates model-based clustering approaches such as growth mixture modeling and latent profile analysis, emphasizing strategies for enumerating latent classes and selecting mixture components. The authors discuss...
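The idea of merging overlapping Gaussians by their Bhattacharyya distance, mentioned in the findings above, can be illustrated in one dimension. This sketch shows only the closed-form distance between two univariate Gaussians and a moment-preserving merge. The spectral clustering step of the cited paper is not reproduced, and the function names are hypothetical.

```python
import math


def bhattacharyya_distance(mean1, var1, mean2, var2):
    """Closed-form Bhattacharyya distance between two univariate Gaussians."""
    avg_var = 0.5 * (var1 + var2)
    mean_term = 0.125 * (mean1 - mean2) ** 2 / avg_var
    var_term = 0.5 * math.log(avg_var / math.sqrt(var1 * var2))
    return mean_term + var_term


def merge_components(w1, mean1, var1, w2, mean2, var2):
    """Merge two weighted Gaussians into one, preserving mean and variance."""
    w = w1 + w2
    mean = (w1 * mean1 + w2 * mean2) / w
    second_moment = (w1 * (var1 + mean1 ** 2) + w2 * (var2 + mean2 ** 2)) / w
    return w, mean, second_moment - mean ** 2


# Two heavily overlapping components versus two well-separated ones
# (illustrative values only).
print(bhattacharyya_distance(0.0, 1.0, 0.2, 1.1))
print(bhattacharyya_distance(0.0, 1.0, 5.0, 1.0))
print(merge_components(0.5, -1.0, 1.0, 0.5, 1.0, 1.0))
```

A small distance indicates that two fitted components describe nearly the same region of the data and are candidates for merging into a single cluster.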

3. How are mixture models adapted for handling complex data characteristics such as skewness, heavy tails, heterogeneous covariate effects, and zero/double inflation?

Many applied problems require mixture models that accommodate deviations from Gaussian assumptions, including skewed or heavy-tailed components, covariate-dependent mixing proportions, semiparametric or nonparametric regression relationships, and count data with excess zeros or inflated counts. Research in this theme develops and fits novel mixture formulations tailored to these data complexities, often integrating novel distributions, hierarchical models, or multi-phase modeling frameworks.

Key finding: This paper proposes a finite mixture model where components follow a multivariate restricted skew-normal scale mixture of Birnbaum–Saunders distributions, enabling modeling of asymmetric and heavy-tailed data. A...
Key finding: The authors develop a Bayesian approach to semiparametric finite mixture of regression models where both component weights and conditional means depend on covariates through smooth functions represented by Bayesian P-splines....
Key finding: This study addresses zero- and k-inflated Poisson (ZkIP) models for count data exhibiting excess zeros and additional inflated counts at a value k > 0, common in health and social sciences. They develop a computationally...
Key finding: The authors develop a novel two-component mixture model of log-Bilal distributions to model bounded data on (0,1), offering closed-form expressions for key properties without special functions. They investigate parameter...
Key finding: Employing a two-phase Eulerian mixture model, this numerical study simulates nanofluid (Al2O3-water) heat transfer from an unconfined heated square cylinder across a range of Reynolds and Richardson numbers and nanoparticle...
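The zero- and k-inflated Poisson idea from the findings above can be written as a three-part mixture: a point mass at 0, a point mass at k, and an ordinary Poisson component. The sketch below assumes this common parameterization, which may differ from the one used in the cited study. The names `pi0`, `pik`, and `lam` are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability mass at count y, computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def zkip_pmf(y, lam, pi0, pik, k):
    """Zero- and k-inflated Poisson mass (assumed parameterization).

    pi0: extra probability placed at 0
    pik: extra probability placed at k (k > 0)
    The remaining mass 1 - pi0 - pik follows Poisson(lam).
    """
    if k <= 0:
        raise ValueError("k must be a positive count")
    if pi0 < 0 or pik < 0 or pi0 + pik > 1:
        raise ValueError("inflation probabilities must be non-negative and sum to at most 1")
    mass = (1.0 - pi0 - pik) * poisson_pmf(y, lam)
    if y == 0:
        mass += pi0
    if y == k:
        mass += pik
    return mass


# Illustrative values: excess zeros plus an inflated count at k = 3.
for y in range(6):
    print(y, zkip_pmf(y, lam=2.5, pi0=0.2, pik=0.1, k=3))
```

Setting `pik` to 0 recovers the ordinary zero-inflated Poisson model, and setting both inflation probabilities to 0 recovers the plain Poisson.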

All papers in Mixture models

In recent years, video monitoring and surveillance systems have been widely used in traffic management. The image sequences for traffic scenes are recorded by a stationary camera. The video clip is sent to a LabVIEW program to convert into...
A suboptimal algorithm to fixed-interval smoothing for nonlinear Markovian switching systems is proposed. It infers a Gaussian mixture approximation to the posterior smoothing pdf by combining the statistics produced by an IMM filter into...
Whereas previous research has shown that either tree or spatial representations of dissimilarity judgments may be appropriate, focussing on the comparative fit at the aggregate level, we investigate whether there is heterogeneity among...
Aphasia is the loss of the ability to produce and/or comprehend language, due to injury to brain areas responsible for these functions. Aphasic patients' performance on comprehension tests has traditionally been related both to the...
by A To z
Chromatin immunoprecipitation (ChIP) makes it possible to study interactions between proteins and DNA as well as different chromatin states. ChIP-chip is a technique that combines chromatin immunoprecipitation with...
Psychopathology, diagnosis, and classification of mental disorders have traditionally been based on a biomedical perspective. With the aim of defining and classifying mental disorders, the American Psychiatric Association (APA) and the...
Capture-recapture mixture models are important tools in evolution and ecology to estimate demographic parameters and abundance while accounting for individual heterogeneity. A key step is to select the correct number of mixture...
Managing large carnivores is one of the most controversial issues in wildlife conservation, as the sociopolitical challenges it raises are as important as the biological ones. Such controversial issues in wildlife conservation require...
Currently, diagnoses of psychopathology rely on discrete labels. The assumption is that each instance of a given disorder is importantly similar to other instances of that disorder and importantly distinct from instances of any other...
Poverty measurement and the analysis of the progress (or otherwise) of the poor is beset with difficulties and controversies surrounding the definition of a poverty line or frontier. Here, using ideas from the partial identification...
Covariance matrices of multivariate data capture feature correlations compactly, and being very robust to noise, they have been used extensively as feature descriptors in many areas in computer vision, like, people appearance tracking,...
Phylogenetic analyses of DNA sequences were conducted to evaluate four alternative hypotheses of phrynosomatine sand lizard relationships. Sequences comprising 2871 aligned base pair positions representing the regions spanning ND1-COI and...
In Discrete Discriminant Analysis, dimensionality problems often occur. In this context, we propose a combining models approach, taking advantage of several potential models. In the bi-class case, a single combination coefficient is...
We make an analogy between Man and the Grail because the former is the receptacle of energies of all kinds. Just as the Grail receives inexhaustible sources (water or light), so too the human being is a fine craftsman of the reception and...
This paper introduces a novel method for feature selection in statistical models, combining the Minimum Average Variance Estimator (MAVE) with the Reciprocal Adaptive Bridge (RAB) penalty. The primary goal is to enhance variable selection...
Electromagnetic sensors such as ground penetrating radar and electromagnetic induction sensors are among the most widely used methods for the detection of buried land mines and unexploded ordnance. However, the performance of these...
We introduce here the AdaptSgenoLasso, a new penalized likelihood method for gene mapping and for genomic prediction, which is an extended version of the SgenoLasso. The AdaptSgenoLasso relies on the original concept of a selective...
The Laplace mixture model is widely used in lifetime applications. Estimating the model parameters is required to analyze the data. In this paper, the expectation-maximization algorithm is used to obtain the parameter estimates. The...
The limited availability of marine ingredients means that new and improved raw materials with high potential to replace fishmeal (FM) are required. Faba bean (Vicia faba) is a legume with good potential that has previously been tested in...
High-quality sources of protein for the formulation of feeds of carnivorous fish species such as Atlantic salmon are currently being sought. In an earlier screening trial we evaluated for the first time in Atlantic salmon (Salmo salar)...
In the wake of the novel coronavirus, SARS-CoV-2, the world has faced a critical situation in which grave threats to global public health emerged. Among human populations across the planet, travel restraints, border enforcement...
General presentation of Coptic anaphoras in today's liturgy
This paper considers the issue of modeling fractional data observed on [0,1), (0,1] or [0,1]. Mixed continuous-discrete distributions are proposed. The beta distribution is used to describe the continuous component of the model since its...
Recent machine learning research focuses on audio source identification in complex environments. These approaches rely on extracting features from audio signals and use machine learning techniques to model the sound classes. However, such...
Computer network technology is developing quickly, and internet techniques are advancing even faster. Furthermore, people and companies have become more aware of the importance of network security. To protect the network from...
Impact experiments were performed at the French-German Research Institute of Saint-Louis on three energetic materials composed of 70% by weight of RDX particles embedded in a wax matrix. These materials differ by the microstructural...
In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. Additionally, in the field of sound direction-of-arrival (DOA) estimation, the binaural...
Wavelet-based demosaicing techniques have the advantage of being computationally relatively fast, while having a reconstruction performance that is similar to state-of-the-art techniques. Because the demosaicing rules are linear, it is...
To explore the "Perturb and Combine" idea for estimating probability densities, we study mixtures of tree structured Markov networks derived by bagging combined with the Chow and Liu maximum weight spanning tree algorithm and we try to...