Academia.edu

Acoustic Modeling

2,405 papers
43 followers
About this topic
Acoustic modeling is the process of creating mathematical representations of sound production and propagation in various environments. It involves analyzing sound waves, their interactions with materials, and the effects of different acoustic conditions to improve applications in fields such as speech recognition, audio engineering, and environmental acoustics.

Key research themes

1. How can machine learning improve acoustic modeling for robust feature extraction and surrogate modeling?

This research theme explores the integration of machine learning (ML), including deep learning, to enhance acoustic modeling by learning robust representations from raw or frequency-domain acoustic data. It focuses on improving the generalization of acoustic models across varied environments, as well as creating surrogate models that efficiently approximate complex vibroacoustic simulations. Such approaches aim to overcome the limitations of traditional handcrafted features and expensive computational methods, enabling better speech recognition performance, more efficient sound transmission loss prediction, and greater robustness to environmental noise.
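As a point of reference for the handcrafted front ends that learned representations aim to replace, the sketch below turns a raw waveform into frequency-domain features: framing, a Hann window, DFT magnitude, log compression, and per-bin mean/variance normalization. It is a conventional baseline in plain Python, not any of the learned layers described in this theme; the frame length, hop, and epsilon values are illustrative assumptions, and the function names are hypothetical.

```python
import cmath
import math


def frame_signal(x, frame_len, hop):
    """Split a 1-D waveform into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [x[start:start + frame_len] for start in range(0, len(x) - frame_len + 1, hop)]


def dft_magnitude(frame):
    """One-sided DFT magnitude of a real frame (naive O(N^2) DFT, kept explicit for clarity)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def log_spectral_features(x, frame_len=64, hop=32, eps=1e-8):
    """Hann window -> |DFT| -> log, then per-bin mean/variance normalization over frames."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / frame_len) for t in range(frame_len)]
    feats = []
    for frame in frame_signal(x, frame_len, hop):
        mags = dft_magnitude([w * s for w, s in zip(window, frame)])
        feats.append([math.log(m + eps) for m in mags])  # eps avoids log(0)
    for k in range(len(feats[0])):
        column = [row[k] for row in feats]
        mean = sum(column) / len(column)
        std = math.sqrt(sum((c - mean) ** 2 for c in column) / len(column))
        std = max(std, 1e-12)  # floor avoids dividing by ~0 for (near-)constant bins
        for row in feats:
            row[k] = (row[k] - mean) / std
    return feats


# Usage: a deterministic broadband test signal (a fast chirp), 1024 samples.
signal = [math.sin(0.37 * t * t) for t in range(1024)]
features = log_spectral_features(signal)
print(len(features), len(features[0]))  # 31 frames x 33 frequency bins
```

Learned front ends typically replace the fixed window, the log compression, or the normalization with trainable layers; a practical implementation would also use an FFT instead of this O(N²) DFT.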

Key finding: The paper surveys the transformative impact of ML in diverse acoustics applications, establishing that data-driven representation learning can discover complex acoustic phenomena such as human speech and reverberation...
Key finding: This study proposes a vicinal risk minimization framework for learning robust acoustic models directly from raw waveforms, addressing significant mismatches between training and test environments. By modeling local...
Key finding: The paper introduces a frequency-domain feature learning layer that integrates a Fourier transform inside the network to enable acoustic model training directly from raw waveforms. By incorporating a novel normalization layer...
Key finding: This work investigates multiple ML methods, including Gaussian Process Regression, Radial Basis Functions, and Neural Networks, to create surrogate models approximating sound transmission loss (STL) simulations, which are...
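A Gaussian process regression (GPR) surrogate can be sketched in a few lines. Here a toy "simulator", the field-incidence mass law TL ≈ 20 log10(m f) - 47 dB with an assumed surface mass m = 10 kg/m², stands in for the expensive STL simulations, which are not reproduced here. The RBF kernel hyperparameters are fixed by hand rather than fitted by maximizing the marginal likelihood, as a practical surrogate usually would be; all class and function names are hypothetical.

```python
import math


def mass_law_stl(freq_hz, surface_mass=10.0):
    """Toy stand-in for an expensive STL simulation: field-incidence mass law in dB."""
    return 20.0 * math.log10(surface_mass * freq_hz) - 47.0


def rbf(a, b, length_scale, variance):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def cho_solve(low, b):
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(low)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


class GPSurrogate:
    """GP regression (zero-mean prior on centred targets); returns the posterior mean only."""

    def __init__(self, length_scale=0.5, variance=100.0, noise=1e-4):
        self.length_scale = length_scale
        self.variance = variance
        self.noise = noise  # noise/jitter variance added to the kernel diagonal

    def fit(self, xs, ys):
        self.xs = list(xs)
        self.y_mean = sum(ys) / len(ys)
        k = [[rbf(a, b, self.length_scale, self.variance) + (self.noise if i == j else 0.0)
              for j, b in enumerate(self.xs)] for i, a in enumerate(self.xs)]
        self.alpha = cho_solve(cholesky(k), [y - self.y_mean for y in ys])
        return self

    def predict(self, x):
        return self.y_mean + sum(rbf(x, xi, self.length_scale, self.variance) * a
                                 for xi, a in zip(self.xs, self.alpha))


# Usage: six "simulator runs" at log10(f) = 2.0 ... 3.5 (100 Hz to about 3.2 kHz).
train_x = [2.0 + 0.3 * i for i in range(6)]
surrogate = GPSurrogate().fit(train_x, [mass_law_stl(10 ** x) for x in train_x])
print(surrogate.predict(2.95), mass_law_stl(10 ** 2.95))  # surrogate vs. toy simulator
```

Once fitted, each prediction costs one kernel evaluation per training point; the expensive simulator is only called to generate the training set, which is what makes a surrogate worthwhile.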

2. What numerical and physics-informed modeling approaches enable efficient and accurate simulation of acoustic wave propagation and wave-based systems?

This theme covers advanced modeling methods for acoustic wave propagation that balance computational efficiency with physical accuracy, especially in complex and large-scale acoustic domains like rooms, resonators, and coupled subsystems. It includes the development of wave-based multipole models, state-space approaches for networked acoustic elements, digital filter design for reflections and air absorption, and reduced-order models for visco-thermal losses. These methods provide practical frameworks for sound propagation simulations, offering causal, compact representations of boundary conditions and subsystem interconnections that are essential for accurate acoustic predictions and real-time applications.
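Connecting subsystem models in state-space form amounts to stacking their states and routing one model's output into the next model's input. For u -> S1 -> S2 -> y, the combined matrices are A = [[A1, 0], [B2 C1, A2]], B = [B1; B2 D1], C = [D2 C1, C2], and D = D2 D1. The sketch below shows only this textbook series connection for discrete-time single-input, single-output models; it is not the generalized framework summarized in this theme, and the two first-order "subsystems" and all names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def series(sys1, sys2):
    """State-space model of u -> sys1 -> sys2 -> y with stacked state [x1; x2].

    Each system is a dict with matrices "A", "B", "C", "D" (lists of rows).
    """
    a1, b1, c1, d1 = sys1["A"], sys1["B"], sys1["C"], sys1["D"]
    a2, b2, c2, d2 = sys2["A"], sys2["B"], sys2["C"], sys2["D"]
    n2 = len(a2)
    b2c1, d2c1 = matmul(b2, c1), matmul(d2, c1)
    a = [row + [0.0] * n2 for row in a1] + [b2c1[i] + a2[i] for i in range(n2)]
    b = b1 + matmul(b2, d1)
    c = [d2c1[i] + c2[i] for i in range(len(c2))]
    return {"A": a, "B": b, "C": c, "D": matmul(d2, d1)}


def simulate(sys, inputs):
    """SISO discrete-time response: y[k] = C x[k] + D u[k], x[k+1] = A x[k] + B u[k], x[0] = 0."""
    x = [[0.0] for _ in sys["A"]]
    outputs = []
    for u in inputs:
        uk = [[u]]
        outputs.append(matmul(sys["C"], x)[0][0] + matmul(sys["D"], uk)[0][0])
        ax, bu = matmul(sys["A"], x), matmul(sys["B"], uk)
        x = [[ax[i][0] + bu[i][0]] for i in range(len(x))]
    return outputs


# Usage: two hypothetical first-order subsystems driven by a unit impulse.
s1 = {"A": [[0.9]], "B": [[1.0]], "C": [[0.1]], "D": [[0.0]]}
s2 = {"A": [[0.5]], "B": [[1.0]], "C": [[0.5]], "D": [[0.2]]}
impulse = [1.0] + [0.0] * 19
print(simulate(series(s1, s2), impulse)[:3])  # matches simulating s1, then s2
```

The same stacking generalizes to feedback and multi-port connections, which is what lets low-order, high-fidelity, and data-driven blocks share one system description.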

Key finding: The paper presents a Bayesian inference framework to estimate both the order and parameters of multipole acoustic admittance models from experimentally measured frequency-dependent admittance data. By incorporating maximum...
Key finding: The study introduces a generalized linear state space framework to model interconnected acoustic subsystems, combining low-order 1D models, 3D linearized perturbation-based models, and data-driven models into a unified...
Key finding: This paper proposes low-order minimum-phase digital filter design techniques to model acoustic reflection and air absorption effects based on measured absorption coefficients and impedance data. The method addresses the...
Key finding: The authors develop a computationally lightweight hybrid model combining lossless Helmholtz equations with viscous and thermal boundary layer perturbation theory to predict sound absorption in resonators with large...
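In the lumped-element limit, the lossless part of a Helmholtz resonator model reduces to the classical resonance frequency f0 = (c / 2π) sqrt(S / (V L_eff)), with neck area S, cavity volume V, and effective neck length L_eff. The sketch computes f0 together with the viscous and thermal boundary-layer thicknesses δv = sqrt(2ν/ω) and δt = δv / sqrt(Pr), which show how thin the lossy layers are compared with the neck radius. It is not the hybrid model of the cited work; the end correction L_eff = L + 1.7a and the air properties are assumed textbook values, and the function names are hypothetical.

```python
import math

C_AIR = 343.0         # speed of sound in air at ~20 degC [m/s] (assumed)
NU_AIR = 1.5e-5       # kinematic viscosity of air at ~20 degC [m^2/s] (assumed)
PRANDTL_AIR = 0.71    # Prandtl number of air (assumed)


def helmholtz_f0(neck_radius, neck_length, cavity_volume, c=C_AIR):
    """Lossless lumped-element resonance frequency [Hz].

    Uses the end correction L_eff = L + 1.7 a (0.85 a at each of two flanged ends).
    """
    area = math.pi * neck_radius ** 2
    l_eff = neck_length + 1.7 * neck_radius
    return c / (2.0 * math.pi) * math.sqrt(area / (cavity_volume * l_eff))


def boundary_layer_thicknesses(freq_hz, nu=NU_AIR, prandtl=PRANDTL_AIR):
    """Viscous and thermal boundary-layer thicknesses [m] at a given frequency."""
    omega = 2.0 * math.pi * freq_hz
    delta_v = math.sqrt(2.0 * nu / omega)
    delta_t = delta_v / math.sqrt(prandtl)  # thermal diffusivity = nu / Pr
    return delta_v, delta_t


# Usage: 10 mm neck radius, 20 mm neck length, 1 L cavity.
f0 = helmholtz_f0(neck_radius=0.01, neck_length=0.02, cavity_volume=1e-3)
delta_v, delta_t = boundary_layer_thicknesses(f0)
print(f"f0 = {f0:.1f} Hz, delta_v = {delta_v * 1e3:.2f} mm, delta_t = {delta_t * 1e3:.2f} mm")
```

For this geometry f0 comes out at roughly 160 Hz and δv is under 0.2 mm, which is why the losses can be treated as a thin-layer perturbation of the lossless solution when the neck is wide.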

All papers in Acoustic Modeling

A sharable software repository for Japanese LVCSR (Large Vocabulary Continuous Speech Recognition) is introduced. It is designed as a baseline platform for research and developed by researchers of different academic institutes under a...
Real-time multilingual interaction during mobile video calls is still difficult to achieve due to strict latency requirements, fluctuating network conditions, and the limited resource capacity of handheld devices. Although recent speech translation...
Acoustic Modeling in today's emotion recognition engines employs general models independent of the spoken phonetic content. This seems to work well enough given sufficient instances to cover for a broad variety of phonetic structures and...
Nonverbal vocalizations are one of the characteristics of spontaneous speech distinguishing it from written text. These phenomena are sometimes regarded as a problem in language and acoustic modeling. However, vocalizations such as filled...
In the search for a standard unit for recognizing emotion in speech, a whole turn, that is, the full stretch of speech by one person in a conversation, is a common choice. In applications, such turns often seem favorable. Yet, high...
Recognition of emotion in speech usually uses acoustic models that ignore the spoken content. Likewise, one general model per emotion is trained independently of the phonetic structure. Given sufficient data, this approach seemingly works...
This study introduces the design of a Hidden Markov Model-based LVCSR system in a new target language, based on a different source language and without the need for a large speech database in the target language. The Tigrinya LVCSR was...
The quality of seismic images obtained by reverse time migration (RTM) strongly depends on the imaging condition. We propose a new imaging condition that is motivated by stationary phase analysis of the classical crosscorrelation imaging...
We investigate and compare several techniques for automatic recognition of unconstrained context-independent phoneme strings from the TIMIT and NTIMIT databases. Among the compared techniques, the technique based on TempoRAl Patterns (TRAP)...
Phoneme Recognizers followed by Language Modeling (PRLM) have consistently yielded top performance in the language identification (LID) task. Parallel ordering of PRLMs (PPRLM) improves performance even more. Since the tokenizer is the most...
Although research has previously been done on multilingual speech recognition, it has been found to be very difficult to improve over separately trained systems. The usual approach has been to use some kind of "universal phone set" that...
We describe an acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure, and the means and mixture weights vary in a subspace of the total parameter space. We call this a Subspace Gaussian...
We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state transducers (using the freely available OpenFst), together with detailed...
The AMADEUS (ANTARES Modules for the Acoustic Detection Under the Sea) system which is described in this article aims at the investigation of techniques for acoustic detection of neutrinos in the deep sea. It is integrated into the...
This paper deals with binaural sound localization. An active strategy is proposed, relying on a precise model of the dynamic changes that motion induces in auditory perception. The proposed framework allows motions of both the sound...
The aim of this research is to develop a speech synthesis model tailored towards Nigerian languages by leveraging natural language processing tools such as FastSpeech 2 and meta-tts for high-quality, non-autoregressive text-to-speech (TTS)...
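The subspace Gaussian mixture model abstract in the list above can be made concrete with the basic parameterization as it is commonly described: each state j has a low-dimensional vector v_j, and the shared parameters M_i and w_i of Gaussian i give that state's means μ_ji = M_i v_j and mixture weights w_ji = exp(w_i · v_j) / Σ_i' exp(w_i' · v_j). This sketch omits sub-states, speaker subspaces, and the shared covariances; the toy numbers and the function name are hypothetical.

```python
import math


def sgmm_state_params(state_vec, projections, weight_vecs):
    """Means and mixture weights of one state j in a basic SGMM.

    mu_ji = M_i v_j and w_ji = softmax_i(w_i . v_j). The M_i (projections) and
    w_i (weight_vecs) are shared by all states; only v_j (state_vec) is state specific.
    """
    means = [[sum(row[k] * state_vec[k] for k in range(len(state_vec))) for row in m_i]
             for m_i in projections]
    logits = [sum(wk * vk for wk, vk in zip(w_i, state_vec)) for w_i in weight_vecs]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return means, [e / total for e in exps]


# Usage: 2 shared Gaussians, 2-D features, 2-D state subspace.
M = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, -1.0]]]  # M_i, one matrix per Gaussian
w = [[0.0, 0.0], [1.0, 0.0]]                                # w_i, one vector per Gaussian
means, weights = sgmm_state_params([0.5, 1.0], M, w)
print(means)  # [[0.5, 1.0], [1.0, -1.0]]
```

Because only v_j is state specific, adding a state costs a handful of parameters instead of a full GMM, which is the main appeal of the subspace approach when training data per state is limited.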