Academia.edu

Sound (Signal Processing)

42 papers
2,933 followers
About this topic
Sound signal processing is the analysis, manipulation, and synthesis of sound signals using algorithms and mathematical techniques. It encompasses various methods for enhancing, transforming, and interpreting audio data, facilitating applications in fields such as telecommunications, music production, and acoustic engineering.

Key research themes

1. How do physical modeling and wave equation solutions enhance spatial sound synthesis and instrument acoustics?

This research area explores the application of physical acoustics modeling, particularly solutions to the wave equation and physical vibration models, to understand and synthesize the spatial characteristics of sound produced by musical instruments. It is significant because accurately modeling sound radiation and propagation enables high-fidelity audio synthesis and spatial sound reproduction, which enhances realism in musical applications and psychoacoustic sound field synthesis.

Key finding: This work provides a comprehensive theoretical foundation by deriving the homogeneous wave equation, the Helmholtz equation, and plane wave solutions, and introduces the complex point source model as a simplification of sound... Read more
Key finding: This article surveys digital sound synthesis approaches, emphasizing physical modeling methods that simulate vibrating structures via partial differential equations. It discusses discrete-time implementations suitable for... Read more
Key finding: The study integrates modal synthesis (based on modal frequencies of objects) with granular synthesis to produce continuous, interactive tapping and scratching sounds that vary dynamically with user input such as force and... Read more
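
For orientation, the chain of results named in the first key finding (homogeneous wave equation, Helmholtz equation, plane wave solution) can be sketched in standard textbook form. The symbols below ($p$ for sound pressure, $c$ for the speed of sound, $\omega$ for angular frequency, $\mathbf{k}$ for the wave vector, $A$ for an amplitude) and the $e^{i\omega t}$ time convention are assumptions made here; the cited work may use different notation.

```latex
% Homogeneous wave equation for the sound pressure p(x, t)
\nabla^2 p(\mathbf{x}, t) - \frac{1}{c^2}\,\frac{\partial^2 p(\mathbf{x}, t)}{\partial t^2} = 0

% Time-harmonic ansatz p(x, t) = P(x) e^{i \omega t} gives the Helmholtz equation
\nabla^2 P(\mathbf{x}) + k^2 P(\mathbf{x}) = 0, \qquad k = \frac{\omega}{c}

% A plane wave travelling in the direction of k solves it when |k| = k
P(\mathbf{x}) = A\, e^{-i\,\mathbf{k}\cdot\mathbf{x}}, \qquad \lVert \mathbf{k} \rVert = \frac{\omega}{c}
```

Substituting the plane wave into the Helmholtz equation gives $-\lVert\mathbf{k}\rVert^2 P + k^2 P = 0$, which holds exactly when $\lVert\mathbf{k}\rVert = \omega / c$.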

2. What are effective machine learning and signal processing approaches for automatic sound classification and speech emotion recognition using spectral decomposition?

This theme focuses on the application of machine learning techniques, especially deep neural networks, combined with advanced signal processing strategies such as variational mode decomposition and acoustic feature extraction, to automatically classify natural and environmental sounds and to perform speech emotion recognition. The importance lies in developing robust automated systems that extract meaningful features from complex and nonstationary audio signals, enabling applications in human-computer interaction, surveillance, and multimedia retrieval.

Key finding: This work evaluates environmental sound classification through Mel Frequency Cepstral Coefficients (MFCCs) and neural network classifiers. It employs spectrogram-derived feature extraction mimicking human auditory frequency... Read more
Key finding: The study proposes VGG-optiVMD, an enhanced variational mode decomposition technique that autonomously optimizes the number of modes and balancing parameters to extract informative signal components relevant for speech... Read more
Key finding: This paper combines deep-learning monophonic spectral separation with multichannel complex NMF source separation informed by direction-of-arrival (DOA) constraints. The Masker-Denoiser Twin Network (MaD TwinNet) estimates... Read more
Key finding: The paper presents a contrastive loss-based framework aligning latent representations of audio spectrograms and associated tags via co-aligned autoencoders. This multimodal embedding approach captures both semantic and... Read more
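
The first key finding above mentions feature extraction that mimics human auditory frequency resolution, which in MFCC pipelines is usually done with a mel-spaced filter bank. The sketch below shows only that frequency-warping step, assuming the common `2595 * log10(1 + f / 700)` mel formula; the function names are hypothetical and the cited paper's exact filter bank is not stated in the listing.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (assumed 2595*log10(1 + f/700) convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Return n_filters center frequencies (Hz) spaced evenly on the mel scale.

    The band edges f_min and f_max are excluded, as is usual for
    triangular mel filter banks.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Illustrative values only: 10 filters between 0 Hz and 8 kHz.
    for f in mel_center_frequencies(10, 0.0, 8000.0):
        print(round(f, 1))
```

Because the centers are equally spaced in mels, their spacing in Hz grows with frequency, so low frequencies are resolved more finely than high ones.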

3. How can sonification techniques and auditory display theories be advanced through nonlinear sound propagation models and interdisciplinary aesthetics?

This research area investigates the enhancement of sonification methods (data-to-sound transformations) by incorporating nonlinear acoustics models, such as solutions based on Burgers equation, and by examining the aesthetic, musical, and interdisciplinary aspects of auditory display. The focus includes improving inverse problems in sonification, expanding the theoretical foundations, and addressing how sound as a medium conveys information across scientific and artistic domains, which is crucial for applications in medical imaging, scientific data analysis, education, and electroacoustic music.

Key finding: Introducing a novel sonification operator grounded in the nonlinear Burgers equation rather than traditional linear sound propagation, this paper demonstrates improved inverse sonification capable of enhancing medical images... Read more
Key finding: This editorial synthesizes diverse perspectives on sonification, emphasizing the continuum between faithful auditory display and artistic composition. It foregrounds the necessity of aesthetic decision-making in transforming... Read more
Key finding: The paper revisits Michel Chion's seminal work theorizing the complex interplay between sound and image in film, introducing foundational terminology such as the 'audiovisual contract' and 'added value' of sound. By framing... Read more

All papers in Sound (Signal Processing)

In this paper, we address the tasks of audio source counting and separation for a stereo anechoic mixture of audio signals. This will be achieved in two stages. In the first stage, a novel approach is introduced for estimating the number... more
This project explores the feasibility of detecting and tracking long-range combustion-engine UAVs using a distributed network of acoustic IoT sensors and multi-hop wireless communications. The idea is inspired by the increasing use of... more
This paper introduces the ongoing ElectroAcoustic Resource Site Pedagogical Project, or EARS II, in some detail. EARS II is to become an online educational resource for two groups of users: children of ca. 11-14 years of age as well as... more
Music genre classification is one of the sub-disciplines of music information retrieval (MIR) with growing popularity among researchers, mainly due to its still-open challenges. Although research has been prolific in terms of number of... more
In recent years, deep networks have led to dramatic improvements in speech enhancement by framing it as a data-driven pattern recognition problem. In many modern enhancement systems, large amounts of data are used to train a deep network... more
Web-based recommendation strategy implemented in a cadastre information system is presented in the paper. This method forms the list of page profiles recommended to a given user. The idea of page recommendation uses the concept of a page... more
Introduction: Since the outbreak began in January 2020, Covid-19 has affected more than 161 million people worldwide and resulted in about 3.3 million deaths. Despite efforts to detect human infection with the virus as early as possible,... more
Emotion is a complicated notion present in music that is hard to capture even with fine-tuned feature engineering. In this paper, we investigate the utility of state-of-the-art pre-trained deep audio embedding methods to be used in the... more
In this article, we explore the potential of using latent diffusion models, a family of powerful generative models, for the task of reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with... more
Vehicles generate dissimilar sound patterns under different health conditions. The sound generated by the vehicles gives a clue of some of the faults. Automotive experts diagnose the faults in vehicles based on the produced sound. This... more
Motorcycles generate different sound patterns under dissimilar working conditions. The generated sound pattern gives a clue of the fault. Mainly the parts of the engine that lead to change in sound are cylinder kit, crank, timing chain,... more
In this paper we propose two generic mechanisms implemented in a cadastre internet information system. The first one is the list of last queries submitted by a given user and the second one is the list of page profiles recommended to a... more
In this work, we aim to improve the expressive capacity of waveform-based discriminative music networks by modeling both sequential (temporal) and hierarchical information in an efficient end-to-end architecture. We present MuSLCAT, or... more
Hit Song Science aims to predict a song's popularity based on song structure and external features. To help provide an efficient and accurate tool for Annual Top-100 Billboard Song Classification, we apply fine-tuned BERT transformer and a... more
This paper proposes an effective sequential initialization for multichannel nonnegative matrix factorization to address the difficulty of initial value dependency of the conventional method. The proposed method sets initial values of... more
A design methodology of an all-digital phaselocked loop based on standard library cells is presented. The design route includes development of a scalable architecture to enable migration to various technology libraries. The design... more
In the twentieth century, alongside the traditional musical textures (monodic, polyphonic, and homophonic-harmonic), three new ones emerged: sonoristics, electroacoustic music, and multimedia. Music written in the first of these,... more
Anomalous sound detection (ASD) is currently one of the topical subjects in the machine listening discipline. Unsupervised detection is attracting a lot of interest due to its immediate applicability in many fields. For example, related to... more
Audio datasets support the training and validation of Machine Learning algorithms in audio classification problems. Such datasets include different, arbitrarily chosen audio classes. We initially investigate a unifying approach, based on... more
This article describes a computationally-efficient statistical approach to joint (semi-)blind source separation and dereverberation for multichannel noisy reverberant mixture signals. A standard approach to source separation is to... more
This paper describes a semi-supervised multichannel speech enhancement method that uses clean speech data for prior training. Although multichannel nonnegative matrix factorization (MNMF) and its constrained variant called independent... more
Visualizations help decipher latent patterns in music and garner a deep understanding of a song's characteristics. This paper offers a critical analysis of the effectiveness of various state-of-the-art Deep Neural Networks in visualizing... more
In many ways, all non-representational arts have distanced themselves to a greater or lesser extent from their potential public over the centuries due to the fact that art and life have been largely separated. For example, those who have... more
A hybrid classifier obtained by hybridizing Support Vector Machines (SVM) and Artificial Neural Network (ANN) classifiers is presented here for diagnosis of gear faults. The distinctive features obtained from vibration signals of a... more
The extremely challenging nature of passive acoustic surveillance makes it a key area of research in Naval Non-Co-operative Target Recognition especially in Anti-Submarine Warfare systems. In shallow waters, the complex acoustics due to... more
The current work is devoted to the analysis of the sound paradigm in video games of the horror genre. The sound in computer games is an important component, since the game is a syncretic medium. The aim of the work is an attempt to... more
As digital music production has become mainstream, the selection of appropriate virtual instruments plays a crucial role in determining the quality of music. To search for the musical instrument samples or virtual instruments that make one's... more
The experiment presented in this study used vibration data obtained from a four-stroke 295 diesel engine. A fault in the internal-combustion engine was detected by using the vibration signals of the cylinder head. The fault diagnosis system... more
Despite there being clear evidence for top-down (e.g., attentional) effects in biological spatial hearing, relatively few machine hearing systems exploit top-down model-based knowledge in sound localisation. This paper addresses this... more
Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-trained deep learning models, and are... more
We have built a music similarity search engine that lets video producers search by listenable music excerpts, as a complement to traditional full-text search. Our system suggests similar sounding track segments in a large music catalog by... more
Emotion recognition (ER) from speech signals is a robust approach since it cannot be imitated like facial expression or text-based sentiment analysis. Valuable information underlying the emotions is significant for human-computer... more
In this work, using the event-related potential (ERP) recording technique, we investigate the features of neurocognitive processes involved in processing tonal modulation distance. Twenty volunteers took part in the study (6... more
This paper presents a way of doing large-scale audio understanding without traditional state-of-the-art neural architectures. Ever since the introduction of deep learning for understanding audio signals in the past decade, convolutional... more
In existing production plants, sensor systems and other sources provide information about the plant condition. This paper presents methods for how data can be conveniently summarized, treated, and evaluated to retain characteristic... more
The need for loudness compensation is a well known fact arising from the nonlinear behavior of human sound perception. Music and other sounds are mixed and mastered at a certain loudness level, usually louder than the level at which they... more
Smartphones, wearables, and Internet of Things (IoT) devices produce a wealth of data that cannot be accumulated in a centralized repository for learning supervised models due to privacy, bandwidth limitations, and the prohibitive cost of... more
The SepFormer architecture shows very good results in speech separation. Like other learned-encoder models, it uses short frames, as they have been shown to obtain better performance in these cases. This results in a large number of... more
City soundscapes (sound landscapes), as cultural and symbolic spaces that mediate our perception of sounds and noises, are dynamic, changeable, and heterogeneous. The specifics of everyday life, cultural experience, consumed audiovisual... more
This paper considers the task of creating software and hardware components for generators of random and pseudorandom numbers built into information security systems, and describes component models of random and pseudorandom... more
Recent general-purpose audio representations show state-of-the-art performance on various audio tasks. These representations are pre-trained by self-supervised learning methods that create training signals from the input. For example,... more
This paper proposes a robust deep learning framework for classifying anomalies in respiratory cycles. Our framework starts with a front-end feature extraction step. This step aims to transform the respiratory input sound into... more
Representations in the auditory cortex might be based on mechanisms similar to the visual ventral stream; modules for building invariance to transformations and multiple layers for compositionality and selectivity. In this paper we... more
In this work, we use a deep convolutional neural network (DCNN) trained with a public dataset, the Million Song Dataset, as a feature extractor. We trained the network from audio mel-spectrogram using artist labels in a discriminative... more
The increasing amount of online videos brings several opportunities for training self-supervised neural networks. The creation of large scale datasets of videos such as the YouTube-8M allows us to deal with this large amount of data in... more