PapersFlow Research Brief
Speech and Audio Processing
Research Guide
What is Speech and Audio Processing?
Speech and Audio Processing is the field of signal processing that analyzes, synthesizes, and modifies speech and audio signals using techniques such as filtering, modeling, and machine learning.
Speech and Audio Processing encompasses methods for tasks including emitter location, noise suppression, and speech recognition, with 105,069 works published in the field. Foundational techniques include spectral subtraction for noise reduction as in Boll (1979) and hidden Markov models introduced by Rabiner and Juang (1986). Deep neural networks advanced acoustic modeling, replacing Gaussian mixture models in speech recognition systems as shown by Hinton et al. (2012).
Research Sub-Topics
Array Signal Processing for Speech
This sub-topic covers beamforming, DOA estimation, and parametric methods like MUSIC for microphone arrays. Researchers study multiple emitter localization and reverberant environment challenges.
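A minimal MUSIC sketch can make the DOA-estimation pipeline concrete. Everything below is illustrative, not from Schmidt (1986): an 8-microphone uniform linear array at half-wavelength spacing, two uncorrelated narrowband sources, synthetic snapshots with no reverberation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics, n_src, n_snap = 8, 2, 500
true_deg = np.array([-20.0, 35.0])            # illustrative source angles

def steering(deg):
    # Half-wavelength ULA: phase step of pi*sin(theta) per sensor.
    return np.exp(1j * np.pi * np.arange(n_mics)[:, None]
                  * np.sin(np.deg2rad(deg))[None, :])

A = steering(true_deg)                                        # (mics, srcs)
S = rng.standard_normal((n_src, n_snap)) + 1j * rng.standard_normal((n_src, n_snap))
N = 0.1 * (rng.standard_normal((n_mics, n_snap))
           + 1j * rng.standard_normal((n_mics, n_snap)))
X = A @ S + N                                                 # snapshots

R = X @ X.conj().T / n_snap                                   # sample covariance
w, V = np.linalg.eigh(R)                                      # ascending order
En = V[:, : n_mics - n_src]                                   # noise subspace

grid = np.linspace(-90, 90, 721)
a = steering(grid)
# MUSIC pseudo-spectrum: 1 / ||En^H a(theta)||^2, peaks at source DOAs.
p = 1.0 / np.sum(np.abs(En.conj().T @ a) ** 2, axis=0)

# Take the two largest local maxima as the DOA estimates.
peaks = [i for i in range(1, len(grid) - 1) if p[i] > p[i-1] and p[i] > p[i+1]]
est = sorted(grid[sorted(peaks, key=lambda i: p[i], reverse=True)[:2]])
print(est)  # should land near the true angles of -20 and 35 degrees
```

At this SNR and snapshot count the two spectrum peaks sit within a fraction of a degree of the true angles; real microphone arrays face reverberation and wideband speech, which is exactly what this sub-topic studies.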
Deep Neural Networks for Acoustic Modeling
This sub-topic focuses on DNN-HMM hybrids, end-to-end models, and feature extraction for large-vocabulary ASR. Researchers benchmark architectures on corpora like LibriSpeech.
Speech Enhancement Using Spectral Subtraction
This sub-topic examines noise suppression algorithms, Wiener filtering, and spectral restoration techniques. Researchers address musical noise artifacts and non-stationary interference.
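The core magnitude-subtraction step can be sketched in a few lines. This is a toy setup in the spirit of Boll (1979), not his exact algorithm: a synthetic 16 kHz tone in white noise, with the noise spectrum estimated from a noise-only lead-in and the result half-wave rectified (the source of the "musical noise" artifacts mentioned above).

```python
import numpy as np

fs, frame, hop = 16000, 512, 256
rng = np.random.default_rng(1)
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 220 * t)
clean[:4000] = 0.0                              # 0.25 s noise-only lead-in
noisy = clean + 0.3 * rng.standard_normal(fs)

# Windowed analysis frames and their short-time spectra.
win = np.hanning(frame)
n_frames = 1 + (len(noisy) - frame) // hop
frames = np.stack([noisy[i*hop:i*hop+frame] * win for i in range(n_frames)])
spec = np.fft.rfft(frames, axis=1)
mag, phase = np.abs(spec), np.angle(spec)

noise_est = mag[:10].mean(axis=0)               # noise spectrum from lead-in
mag_hat = np.maximum(mag - noise_est, 0.0)      # subtract, floor at zero

# Overlap-add resynthesis using the noisy phase.
out = np.zeros(len(noisy))
norm = np.zeros(len(noisy))
for i, frm in enumerate(np.fft.irfft(mag_hat * np.exp(1j*phase), n=frame, axis=1)):
    out[i*hop:i*hop+frame] += frm * win
    norm[i*hop:i*hop+frame] += win ** 2
out /= np.maximum(norm, 1e-8)

def snr_db(ref, x):
    return 10 * np.log10(np.sum(ref**2) / np.sum((x - ref)**2))

sl = slice(frame, len(noisy) - frame)           # skip overlap-add edges
print(round(snr_db(clean[sl], noisy[sl]), 1), "->",
      round(snr_db(clean[sl], out[sl]), 1))     # SNR before -> after
```

The SNR improves by several dB on this synthetic signal; non-stationary interference breaks the fixed-noise-estimate assumption, which is why adaptive noise tracking is an active research thread here.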
Hidden Markov Models in Speech Recognition
This sub-topic covers HMM topology, Viterbi decoding, and acoustic-phonetic modeling fundamentals. Researchers extend to hybrid systems and duration modeling.
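Viterbi decoding over a small left-to-right HMM can be written out directly. The 3-state topology and all transition/emission numbers below are made up for illustration; real acoustic models score frames with GMMs or DNNs instead of a fixed table.

```python
import numpy as np

# Left-to-right topology: each state can only hold or advance.
logA = np.log(np.array([[0.7, 0.3, 0.0],
                        [0.0, 0.7, 0.3],
                        [0.0, 0.0, 1.0]]) + 1e-300)
logpi = np.log(np.array([1.0, 0.0, 0.0]) + 1e-300)
# Per-frame emission likelihoods (rows: 6 frames, cols: 3 states).
logB = np.log(np.array([[0.8, 0.1, 0.1],
                        [0.7, 0.2, 0.1],
                        [0.2, 0.7, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.1, 0.2, 0.7],
                        [0.1, 0.1, 0.8]]))

T, S = logB.shape
delta = np.full((T, S), -np.inf)     # best log-score ending in each state
psi = np.zeros((T, S), dtype=int)    # backpointers
delta[0] = logpi + logB[0]
for t in range(1, T):
    scores = delta[t-1][:, None] + logA          # (from-state, to-state)
    psi[t] = scores.argmax(axis=0)
    delta[t] = scores.max(axis=0) + logB[t]

# Backtrack from the best final state.
path = [int(delta[-1].argmax())]
for t in range(T - 1, 0, -1):
    path.append(int(psi[t][path[-1]]))
path.reverse()
print(path)  # [0, 0, 1, 1, 2, 2] for these numbers
```

The decoded path advances monotonically through the states, matching the left-to-right topology; duration modeling research replaces the implicit geometric state-duration distribution this structure imposes.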
Speech Recognition Toolkits and Datasets
This sub-topic develops open-source frameworks like Kaldi and public corpora like LibriSpeech for reproducible ASR research. Researchers contribute recipes and benchmark evaluations.
Why It Matters
Speech and Audio Processing enables practical applications in speech recognition systems trained on corpora like LibriSpeech, which provides 1,000 hours of 16 kHz English speech from public domain audiobooks (Panayotov et al., 2015). Noise suppression via spectral subtraction improves speech clarity in noisy environments (Boll, 1979), supporting real-time voice AI platforms such as Deepgram, which raised $130 million in Series C funding at a $1.3 billion valuation to expand enterprise deployments. Open-source models like aiOla's Drax handle over 100 languages, interpreting jargon and accents even in noisy conditions, and are reported to outperform competitors in speed and accuracy.
Reading Guide
Where to Start
"An introduction to hidden Markov models" by Rabiner and Juang (1986) as it provides foundational theory for speech modeling applied explicitly to processing problems.
Key Papers Explained
Rabiner and Juang (1986) introduced hidden Markov models for speech, which were later paired with Gaussian mixture models in recognition systems. Hinton et al. (2012) built on this by replacing GMMs with deep neural networks for acoustic modeling, presenting the shared views of four research groups. Panayotov et al. (2015) supported these advances with LibriSpeech, a large clean corpus for training modern ASR models such as those built with the Kaldi toolkit (Povey, 2024).
Paper Timeline
(Timeline figure: papers ordered chronologically; the most-cited paper is highlighted in red.)
Advanced Directions
Recent preprints cover "Automatic Speech Recognition: A Comprehensive Survey" and "Audio Signal Processing in the Artificial Intelligence Era," focusing on AI integration for speech tasks. News highlights aiOla's Drax open-source model supporting 100+ languages and Deepgram's $1.3B valuation for real-time voice AI.
Papers at a Glance
| # | Paper | Year | Venue | Citations | Open Access |
|---|---|---|---|---|---|
| 1 | Multiple emitter location and signal parameter estimation | 1986 | IEEE Transactions on Antennas and Propagation | 13.9K | ✕ |
| 2 | Savitzky-Golay Smoothing Filters | 1990 | Computers in Physics | 11.6K | ✓ |
| 3 | Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups | 2012 | IEEE Signal Processing Magazine | 10.1K | ✕ |
| 4 | Librispeech: An ASR corpus based on public domain audio books | 2015 | — | 5.6K | ✕ |
| 5 | Kaldi Speech Recognition Toolkit | 2024 | — | 4.9K | ✓ |
| 6 | An introduction to hidden Markov models | 1986 | IEEE ASSP Magazine | 4.7K | ✕ |
| 7 | Adaptive Mixtures of Local Experts | 1991 | Neural Computation | 4.7K | ✕ |
| 8 | Two decades of array signal processing research: the parametric approach | 1996 | IEEE Signal Processing Magazine | 4.6K | ✕ |
| 9 | Suppression of acoustic noise in speech using spectral subtraction | 1979 | IEEE Transactions on Acoustics, Speech, and Signal Processing | 4.6K | ✕ |
| 10 | Some Experiments on the Recognition of Speech, with One and with Two Ears | 1953 | The Journal of the Acoustical Society of America | 4.5K | ✕ |
In the News
aiOla unveils Drax, an open-source speech model with state-of-the-art accuracy and up to 5× faster than models from direct competitors
Supporting over 100 languages and accurately interpreting jargon, accents, abbreviations, and acronyms even in noisy environments, aiOla, backed by $58 million in funding from New Era, Hamilton Lan...
AI speech model aiOla Drax outpaces OpenAI & Alibaba
Deepgram raises $130 million Series C at $1.3 billion valuation
Voice AI company Deepgram has raised $130 million in Series C funding at a valuation of $1.3 billion, as it looks to expand its real-time voice AI platform and scale deployments across enterprise...
Code & Tools
This project contains a series of works developed for audio (including speech, music, and general audio events) processing and generation, which he...
SLAM-LLM is a deep learning toolkit that allows researchers and...
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior research...
ESPnet is an end-to-end speech processing toolkit covering end-to-end speech recognition, text-to-speech, speech translation, speech enhancement, s...
Recent Preprints
Audio and Speech Processing
Speech and Audio Processing - Recent articles and discoveries
- The aim of EURASIP Journal on Audio, Speech, and Music Processing is to bring together researchers, scientists and engineers working on the theory... An Open Access journal.
Automatic Speech Recognition: A Comprehensive Survey
Published in SEEU Review, Volume 15, Issue 2. Speech recognition is an interdisciplinary subfield of natural language processing (NLP) that enables the recognition and translation of spoken languag...
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Audio Signal Processing in the Artificial Intelligence Era
Artificial intelligence (AI) has seen significant advancement in recent years, leading to increasing interest in integrating these techniques to solve both existing and emerging problems in audi...
Latest Developments
Recent developments in speech and audio processing research as of February 2026 include noise-robust speech inversion through multi-task learning, high-fidelity generative speech enhancement via latent diffusion transformers, and a unified self-supervised learning (SSL) framework for speech and audio representations, among others (arXiv, Google Research, IEEE Transactions).
Sources
Frequently Asked Questions
What is spectral subtraction in speech processing?
Spectral subtraction reduces acoustically added noise in speech by estimating and subtracting the noise spectrum from the noisy speech spectrum. Boll (1979) presented this stand-alone algorithm for digital speech processors in practical environments. It effectively suppresses noise effects without requiring additional training data.
How do hidden Markov models apply to speech?
Hidden Markov models capture the temporal variability of speech sequences in recognition systems. Rabiner and Juang (1986) introduced their use in speech processing, building on Markov chain theory applied to acoustic states. They pair with Gaussian mixture models to fit acoustic frames to HMM states.
What role do deep neural networks play in speech recognition?
Deep neural networks replace Gaussian mixture models for acoustic modeling in speech recognition, capturing complex patterns in audio frames. Hinton et al. (2012) shared views from four groups showing DNNs outperform traditional HMM-GMM systems. This shift improved accuracy in large-vocabulary continuous speech recognition.
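The hybrid idea above can be sketched as a small forward pass: an MLP maps each acoustic feature frame (with temporal context) to posteriors over tied HMM states. All sizes and weights below are illustrative, not from any trained system.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: 40 filterbank features, +/-5 frames of context,
# 120 output states (real systems use thousands of tied "senones").
n_feats, context, n_states = 40, 5, 120
in_dim = n_feats * (2 * context + 1)

def layer(x, W, b, act=True):
    h = x @ W + b
    return np.maximum(h, 0.0) if act else h     # ReLU hidden layers

W1, b1 = 0.01 * rng.standard_normal((in_dim, 256)), np.zeros(256)
W2, b2 = 0.01 * rng.standard_normal((256, 256)), np.zeros(256)
W3, b3 = 0.01 * rng.standard_normal((256, n_states)), np.zeros(n_states)

frames = rng.standard_normal((100, in_dim))     # stand-in feature frames
h = layer(layer(frames, W1, b1), W2, b2)
logits = layer(h, W3, b3, act=False)

# Softmax over states: each row is a posterior distribution.
post = np.exp(logits - logits.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)
print(post.shape)                               # (100, 120)
```

In a hybrid system these posteriors are divided by state priors to obtain scaled likelihoods, which then replace the GMM scores inside the usual HMM/Viterbi decoder.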
What is the Librispeech corpus?
LibriSpeech is a 1,000-hour corpus of read English speech sampled at 16 kHz, derived from public domain LibriVox audiobooks. Panayotov et al. (2015) made it freely available for training and evaluating ASR systems. It supports research without licensing restrictions.
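The corpus layout is simple to work with programmatically: each chapter directory holds 16 kHz FLAC utterances plus a `<speaker>-<chapter>.trans.txt` file whose lines are `<speaker>-<chapter>-<utterance> TRANSCRIPT`. A minimal parser for that line format is sketched below; the two inline sample lines only illustrate the format.

```python
# Inline stand-in for the contents of a *.trans.txt file.
sample = """84-121123-0000 GO DO YOU HEAR
84-121123-0001 BUT IN LESS THAN FIVE MINUTES THE STAIRCASE GROANED
"""

def parse_trans(text):
    """Map utterance IDs to speaker, chapter, and transcript text."""
    utts = {}
    for line in text.strip().splitlines():
        utt_id, transcript = line.split(" ", 1)
        speaker, chapter, _ = utt_id.split("-")
        utts[utt_id] = {"speaker": speaker,
                        "chapter": chapter,
                        "text": transcript}
    return utts

utts = parse_trans(sample)
print(len(utts), utts["84-121123-0000"]["speaker"])  # 2 84
```

The corresponding audio file for an utterance ID lives beside the transcript as `<utterance-id>.flac`, which makes pairing audio with text a matter of dictionary lookup.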
What is Kaldi?
Kaldi is a free open-source toolkit for speech recognition research using finite-state transducers from OpenFst. Povey (2024) described its design with documentation and scripts for complete recognition systems. It facilitates reproducible experiments in speech processing.
Open Research Questions
- How can array signal processing methods like those in Schmidt (1986) integrate with modern deep learning for robust multi-emitter localization in dynamic environments?
- What improvements in noise suppression beyond spectral subtraction (Boll, 1979) can leverage DNNs for real-time speech enhancement in extreme noise?
- How do mixtures of local experts (Jacobs et al., 1991) extend to hierarchical acoustic modeling surpassing the shared views in Hinton et al. (2012)?
Recent Trends
Deepgram raised $130 million Series C at a $1.3 billion valuation to scale real-time voice AI.
2026: aiOla unveiled Drax, an open-source speech model 5× faster than competitors, supporting 100+ languages in noise ($58M funding, 2025).
2025: Preprints include "Automatic Speech Recognition: A Comprehensive Survey" and ongoing arXiv submissions in Audio and Speech Processing.
Research Speech and Audio Processing with AI
PapersFlow provides specialized AI tools for researchers in this field. Here are the most relevant for this topic:
AI Literature Review
Automate paper discovery and synthesis across 474M+ papers
Deep Research Reports
Multi-source evidence synthesis with counter-evidence
Paper Summarizer
Get structured summaries of any paper in seconds
AI Academic Writing
Write research papers with AI assistance and LaTeX support
Start Researching Speech and Audio Processing with AI
Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.