DmitryRyumin / ICASSP-2023-Papers

ICASSP 2023 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023 conference. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!



ICASSP-2023-Papers

License: MIT


ICASSP 2023


PDF version of the ICASSP 2023 Conference Programme, which lists all accepted full papers along with their presentation mode and time.


Other collections of the best AI conferences

❗ The conference table is kept up to date.

Conference Year
2023 2024
Computer Vision (CV)
CVPR
ICCV  
ECCV
WACV ➖  
Speech/Signal Processing (SP/SigProc)
ICASSP
INTERSPEECH  
ISMIR   ➖
Natural Language Processing (NLP)
EMNLP
Machine Learning (ML)
AAAI ➖
ICLR ➖
ICML ➖
NeurIPS ➖

Contributors



Contributions to improve the completeness of this list are greatly appreciated. If you come across any overlooked papers, please feel free to create pull requests, open issues or contact me via email. Your participation is crucial to making this repository even better.


Papers

List of sections

Audio for Multimedia and Multimodal Processing

🆔 Title Repo Paper
647 Diverse and Vivid Sound Generation from Text Descriptions GitHub Page IEEE Xplore
arXiv
2248 EPIC-SOUNDS: A Large-Scale Dataset of Actions that Sound GitHub Page
GitHub
IEEE Xplore
arXiv
784 I See What You Hear: A Vision-inspired Method to Localize Words ➖ IEEE Xplore
arXiv
6119 Incorporating Lip Features Into Audio-Visual Multi-Speaker DOA Estimation by Gated Fusion ➖ IEEE Xplore
6787 UAVM: Towards Unifying Audio and Visual Models (SPS Journal Paper) GitHub IEEE Xplore
arXiv

Drone-vs-Bird Detection Grand Challenge at ICASSP23

🆔 Title Repo Paper
6834 High-Speed Drone Detection based on Yolo-v8 ➖ IEEE Xplore
6863 S-Feature Pyramid Network and Attention Model for Drone Detection ➖ IEEE Xplore
6881 Drone-vs-Bird: Drone Detection using Yolov7 with CSRT Tracker ➖ IEEE Xplore

Human Identification and Face Recognition

🆔 Title Repo Paper
530 EMCLR: Expectation Maximization Contrastive Learning Representations ➖ IEEE Xplore
711 Boosting Person Re-Identification with Viewpoint Contrastive Learning and Adversarial Training ➖ IEEE Xplore
812 Top-K Visual Tokens Transformer: Selecting Tokens for Visible-infrared Person Re-Identification ➖ IEEE Xplore
2531 Frequency-aware Attentional Feature Fusion for Deepfake Detection ➖ IEEE Xplore
5309 Recursive Joint Attention for Audio-Visual Fusion in Regression based Emotion Recognition GitHub IEEE Xplore
arXiv
3475 Multi-Stream Facial Adaptive Network for Expression Recognition from a Single Image GitHub IEEE Xplore

Self-Supervised Learning Methods

🆔 Title Repo Paper
429 PointACL: Adversarial Contrastive Learning for Robust Point Clouds Representation under Adversarial Attack GitHub IEEE Xplore
arXiv
2579 Enhancing Representation Learning with Deep Classifiers in Presence of Shortcut GitHub IEEE Xplore
730 K2NN: Self-Supervised Learning with Hierarchical Nearest Neighbors for Remote Sensing ➖ IEEE Xplore
4453 TriNet: Stabilizing Self-Supervised Learning from Complete or Slow Collapse GitHub IEEE Xplore
arXiv
1629 On Minimal Variations for Unsupervised Representation Learning ➖ IEEE Xplore
arXiv
740 Adaptive Data Augmentation for Contrastive Learning ➖ IEEE Xplore
arXiv

ASR with Constrained Resource

🆔 Title Repo Paper
690 De'HuBERT: Disentangling Noise in a Self-Supervised Model for Robust Speech Recognition ➖ IEEE Xplore
arXiv
1948 Masked Token Similarity Transfer for Compressing Transformer-based ASR Models ➖ IEEE Xplore
2888 Unsupervised Fine-Tuning Data Selection for ASR using Self-Supervised Speech Models ➖ IEEE Xplore
arXiv
3250 CB-Conformer: Contextual Biasing Conformer for Biased Word Recognition ➖ IEEE Xplore
arXiv
3712 Context-aware Fine-Tuning of Self-Supervised Speech Models ➖ IEEE Xplore
arXiv
6449 Data2vec-Aqc: Search for the Right Teaching Assistant in the Teacher-Student Training Setup GitHub IEEE Xplore
arXiv

ASR: Multilingual Speech Recognition

🆔 Title Repo Paper
2417 Hierarchical Softmax for End-to-End Low-Resource Multilingual Speech Recognition GitHub IEEE Xplore
arXiv
4510 Improving Massively Multilingual ASR With Auxiliary CTC Objectives GitHub Page
GitHub
IEEE Xplore
arXiv
4777 Massively Multilingual Shallow Fusion with Large Language Models ➖ IEEE Xplore
arXiv
5465 UML: A Universal Monolingual Output Layer for Multilingual ASR ➖ IEEE Xplore
arXiv
5744 Investigation Into Phone-based Subword Units for Multilingual End-to-End Speech Recognition ➖ IEEE Xplore
6221 Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities ➖ IEEE Xplore
arXiv

Adaptive Signal Processing

🆔 Title Repo Paper
1224 A Compensated Shrinkage Affine Projection Algorithm for Debiased Sparse Adaptive Filtering ➖ IEEE Xplore
1761 Dynamic Selection of p-Norm in Linear Adaptive Filtering via Online Kernel-based Reinforcement Learning ➖ IEEE Xplore
arXiv
2511 Neural Network Models with Integrated Training and Adaptation for Nonlinear Acoustic System Identification ➖ IEEE Xplore
3895 Neural Mode Estimation ➖ IEEE Xplore
5352 Adaptive ECCM for Mitigating Smart Jammers ➖ IEEE Xplore
arXiv
6529 Differentiable Adaptive Short-Time Fourier Transform with Respect to the Window Length ➖ IEEE Xplore

6G Integrated Sensing and Communication (ISAC) from Theory to Practice - A Signal Processing Perspective

🆔 Title Repo Paper
3049 6G Integrated Sensing and Communication - Sensing Assisted Environmental Reconstruction and Communication ➖ IEEE Xplore
3325 Neurally Augmented State Space Model for Simultaneous Communication and Tracking with Low Complexity Receivers ➖ IEEE Xplore
3456 Multi-View Millimeter-Wave Imaging Over Wireless Cellular Network ➖ IEEE Xplore
3803 Joint Data Association, NLOS Mitigation, and Clutter Suppression for Networked Device-Free Sensing in 6G Cellular Network ➖ IEEE Xplore
arXiv
4255 Integrating the Sensing and Radio Communications Channel Modelling from Radar Mutual Interference ➖ IEEE Xplore
5326 Active Beam Tracking with Reconfigurable Intelligent Surface ➖ IEEE Xplore

Applications to Physiological Signals, Audio, and Speech

🆔 Title Repo Paper
5872 ClassA Entropy for the Analysis of Structural Complexity of Physiological Signals ➖ IEEE Xplore
1034 Unobtrusive Respiratory Monitoring System for Intensive Care ➖ IEEE Xplore
4381 Improved WiFi-based Respiration Tracking via Contrast Enhancement ➖ IEEE Xplore
4851 Joint Angle and Respiration Estimation for Passive and Device-Free Respiration Monitoring ➖ IEEE Xplore
3418 Implementing Continuous HRTF Measurement in Near-Field ➖ IEEE Xplore
arXiv
5094 SeliNet: A Lightweight Model for Single Channel Speech Separation ➖ IEEE Xplore
5196 Adaptive Time-Scale Modification for Improving Speech Intelligibility based on Phoneme Clustering for Streaming Services ➖ IEEE Xplore
3109 Cutting through the Noise: An Empirical Comparison of Psychoacoustic and Envelope-based Features for Machinery Fault Detection ➖ IEEE Xplore
arXiv
4835 Cochlear Decomposition: A Novel Bio-Inspired Multiscale Analysis Framework ➖ IEEE Xplore
2458 Design and Performance of the Low-Power Noise Reduction Algorithm of the Med-EL Sonnet 2™ Cochlear Implant Audio Processor ➖ IEEE Xplore
6491 Modulo EEG Signal Recovery using Transformers ➖ IEEE Xplore
454 Knowledge-Graph Augmented Music Representation for Genre Classification ➖ IEEE Xplore

Super Resolution

🆔 Title Repo Paper
275 PFT-SSR: Parallax Fusion Transformer for Stereo Image Super-Resolution GitHub IEEE Xplore
arXiv
326 Raising the Limit of Image Rescaling using Auxiliary Encoding ➖ IEEE Xplore
arXiv
1431 Kernel Estimation and Deconvolution for Blind Image Super-Resolution ➖ IEEE Xplore
1555 A Comprehensive Comparison of Projections in Omnidirectional Super-Resolution ➖ IEEE Xplore
arXiv
1900 Long-Short Attention Network for the Spectral Super-Resolution of Multispectral Images GitHub IEEE Xplore
2363 Multi-Level Fusion for Burst Super-Resolution with Deep Permutation-Invariant Conditioning ➖ IEEE Xplore
2684 Frequency Reciprocal Action and Fusion for Single Image Super-Resolution ➖ IEEE Xplore
2777 FCIR: Rethink Aerial Image Super Resolution with Fourier Analysis GitHub IEEE Xplore
Pdf
2962 A Content-based Multi-Scale Network for Single Image Super-Resolution ➖ IEEE Xplore
3053 Learning to Explain: A Gradient-based Attribution Method for Interpreting Super-Resolution Networks ➖ IEEE Xplore
3140 CNN Filter for RPR-based SR in VVC with Wavelet Decomposition ➖ IEEE Xplore
3555 Local to Global Prior Learning for Blind Unsupervised Image Super-Resolution ➖ IEEE Xplore

Denoising

🆔 Title Repo Paper
5974 Rain2Avoid: Self-Supervised Single Image Deraining ➖ IEEE Xplore
5479 A Progressive Image Dehazing Framework with Inter and Intra Contrastive Learning ➖ IEEE Xplore
5267 Graph-based Point Cloud Color Denoising with 3-Dimensional Patch-based Similarity ➖ IEEE Xplore
2310 CAENet: using Collaborative Attention Transformer and Add-Boost Strategy for Single Image Deraining ➖ IEEE Xplore
1791 SFEMGN: Image Denoising with Shallow Feature Enhancement Network and Multi-Scale ConvGRU ➖ IEEE Xplore
1554 Affinity Learning with Blind-Spot Self-Supervision for Image Denoising ➖ IEEE Xplore
1473 SAR Image Despeckling with Residual-in-Residual Dense Generative Adversarial Network ➖ IEEE Xplore
1211 Uncer2Natural: Uncertainty-aware Unsupervised Image Denoising ➖ IEEE Xplore
553 HPFTN: Hierarchical Progressive Fusion Transformer Network for Video Denoising ➖ IEEE Xplore
398 Subspace Modeling enabled High-Sensitivity X-Ray Chemical Imaging ➖ IEEE Xplore
arXiv
274 MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing ➖ IEEE Xplore
arXiv
117 Hyperspectral Image Denoising via Nonlocal Rank Residual Modeling GitHub IEEE Xplore

Semantic Segmentation

🆔 Title Repo Paper
190 LoG-CAN: Local-Global Class-aware Network for Semantic Segmentation of Remote Sensing Images GitHub IEEE Xplore
arXiv
406 WUDA: Unsupervised Domain Adaptation based on Weak Source Domain Labels GitHub IEEE Xplore
arXiv
555 Class-aware Contextual Information for Semantic Segmentation ➖ IEEE Xplore
1132 Semi-Supervised Semantic Segmentation with Structured Output Space Adaption ➖ IEEE Xplore
1170 PRRD: Pixel-Region Relation Distillation for Efficient Semantic Segmentation ➖ IEEE Xplore
2521 Spatial Correlation Fusion Network for Few-Shot Segmentation ➖ IEEE Xplore
3306 Exploring Vision Transformer Layer Choosing for Semantic Segmentation ➖ IEEE Xplore
arXiv
3941 Joint Training of Hierarchical GANs and Semantic Segmentation for Expression Translation ➖ IEEE Xplore
6357 Progressive Refinement Learning based on Feature Cross Perception for Residential Areas Semantic Segmentation ➖ IEEE Xplore
1599 Lightweight Portrait Segmentation via Edge-optimized Attention GitHub IEEE Xplore
3857 A Dynamic Cross-Scale Transformer with Dual-Compound Representation for 3D Medical Image Segmentation ➖ IEEE Xplore
3793 LABANet: Lead-Assisting Backbone Attention Network for Oral Multi-Pathology Segmentation ➖ IEEE Xplore

Object Segmentation

🆔 Title Repo Paper
3473 Robust Video Object Segmentation with Restricted Attention ➖ IEEE Xplore
3501 Stacking-based Attention Temporal Convolutional Network for Action Segmentation ➖ IEEE Xplore
2436 VLKP: Video Instance Segmentation with Visual-Linguistic Knowledge Prompts ➖ IEEE Xplore
4867 Automatic Error Detection in Integrated Circuits Image Segmentation: A Data-Driven Approach ➖ IEEE Xplore
arXiv
3745 TransWnet: Integrating Transformers Into CNNs via Row and Column Attention for Abdominal Multi-Organ Segmentation ➖ IEEE Xplore
5844 Active Perception System for Enhanced Visual Signal Recovery using Deep Reinforcement Learning ➖ IEEE Xplore
302 OAFormer: Learning Occlusion Distinguishable Feature for Amodal Instance Segmentation ➖ IEEE Xplore
698 Encoder-Decoder Graph Convolutional Network for Automatic Timed-Up-and-Go and Sit-to-Stand Segmentation ➖ IEEE Xplore
758 Meta++ Network for Few-Shot Aerospace Crack Segmentation ➖ IEEE Xplore
1764 IAST: Instance Association Relying on Spatio-Temporal Features for Video Instance Segmentation GitHub IEEE Xplore
2469 Continual Cell Instance Segmentation of Microscopy Images ➖ IEEE Xplore

Deep Learning for Image and Video Processing

🆔 Title Repo Paper
5397 Spammer Detection on Short Video Applications: A New Challenge and Baselines ➖ IEEE Xplore
814 Weakly- and Semi-Supervised Object Localization ➖ IEEE Xplore
2503 Balanced Mixup Loss for Long-Tailed Visual Recognition ➖ IEEE Xplore
4130 On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks ➖ IEEE Xplore
arXiv
2813 Invariant Adversarial Imitation Learning from Visual Inputs ➖ IEEE Xplore
6423 SPECTRANET-SO(3): Learning Satellite Orientation from Optical Spectra by Implicitly Modeling Mutually Exclusive Probability Distributions on the Rotation Manifold ➖ IEEE Xplore
3097 Structured-Anchor Projected Clustering for Hyperspectral Images ➖ IEEE Xplore
140 Learning Sparse Auto-Encoders for Green AI Image Coding ➖ IEEE Xplore
arXiv
643 Learning to Generate 3D Representations of Building Roofs using Single-View Aerial Imagery ➖ IEEE Xplore
arXiv
4843 Robust Monocular Localization of Drones by Adapting Domain Maps to Depth Prediction Inaccuracies ➖ IEEE Xplore
arXiv
5940 Large Dimensional Analysis of LS-SVM Transfer Learning: Application to PolSAR Classification ➖ IEEE Xplore
Pdf
5062 SMUG: Towards Robust MRI Reconstruction by Smoothed Unrolling GitHub IEEE Xplore
arXiv

Graph based Learning

🆔 Title Repo Paper
715 Graph-Graph Context Dependency Attention for Graph Edit Distance ➖ IEEE Xplore
3882 Topology Uncertainty Modeling for Imbalanced Node Classification on Graphs ➖ IEEE Xplore
589 CPD-GAN: Cascaded Pyramid Deformation GAN for Pose Transfer ➖ IEEE Xplore
5321 Space-Time Graph Neural Networks with Stochastic Graph Perturbations ➖ IEEE Xplore
arXiv
6793 Untrained Graph Neural Networks for Denoising ➖ IEEE Xplore
arXiv
5846 Learning on Graphs under Label Noise ➖ IEEE Xplore
arXiv
2906 Select the Best: Enhancing Graph Representation with Adaptive Negative Sample Selection ➖ IEEE Xplore
2586 Learning with Multigraph Convolutional Filters ➖ IEEE Xplore
arXiv
2164 Self-Supervised Guided Hypergraph Feature Propagation for Semi-Supervised Classification with Missing Node Features ➖ IEEE Xplore
arXiv
3752 Incorporating Reliability in Graph Information Propagation by Fluid Dynamics Diffusion: a Case of Multimodal Semi-Supervised Deep Learning ➖ IEEE Xplore
5159 GraphMAD: Graph Mixup for Data Augmentation using Data-Driven Convex Clustering GitHub IEEE Xplore
arXiv
3724 Time-Varying Signals Recovery via Graph Neural Networks ➖ IEEE Xplore
arXiv

Learning from Multimodal Data

🆔 Title Repo Paper
3546 Multimodal Knowledge Distillation for Arbitrary-Oriented Object Detection in Aerial Images ➖ IEEE Xplore
1234 Hierarchical Spatial-Temporal Transformer with Motion Trajectory for Individual Action and Group Activity Recognition ➖ IEEE Xplore
693 Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-Linked Inputs ➖ IEEE Xplore
arXiv
1571 Towards Robust Audio-based Vehicle Detection via Importance-Aware Audio-Visual Learning ➖ IEEE Xplore
841 Hierarchical Multi-Task Learning for Fabric Component Analysis Based on NIR Spectral Signals ➖ IEEE Xplore
1706 Cross Modality Knowledge Distillation for Robust Pedestrian Detection in Low Light and Adverse Weather Conditions ➖ IEEE Xplore
6375 Data Leakage in Cross-Modal Retrieval Training: A Case Study ➖ IEEE Xplore
arXiv
5825 Difficulty-Aware Data Augmentor for Scene Text Recognition ➖ IEEE Xplore
461 TinyOOD: Effective Out-of-Distribution Detection for TinyML ➖ IEEE Xplore
4211 A Principled Approach to Model Validation in Domain Generalization GitHub IEEE Xplore
arXiv
4220 Scale-Adaptive Tiny Object Detection Enhanced by Across-Scale and Shape-Preserved Semantic Location ➖ IEEE Xplore
3735 Audio-Visual Inpainting: Reconstructing Missing Visual Information with Sound ➖ IEEE Xplore

Matrix/Tensor Factorization and Completion

🆔 Title Repo Paper
507 Learn Topological Representation with Flexible Manifold Layer GitHub IEEE Xplore
1438 Tensorized LSSVMs for Multitask Regression ➖ IEEE Xplore
arXiv
3571 A Bayesian Perspective for Determinant Minimization based Robust Structured Matrix Factorization ➖ IEEE Xplore
arXiv
5045 Volume-Regularized Nonnegative Tucker Decomposition with Identifiability Guarantees ➖ IEEE Xplore
687 Transductive Matrix Completion with Calibration for Multi-Task Learning ➖ IEEE Xplore
arXiv
1668 Projected Hierarchical ALS for Generalized Boolean Matrix Factorization ➖ IEEE Xplore
2934 Robust Binary Component Decompositions ➖ IEEE Xplore
3897 Multi-Resolution Convolutional Dictionary Learning for Riverbed Dynamics Modeling ➖ IEEE Xplore
2388 PARAFAC2-based Coupled Matrix and Tensor Factorizations GitHub IEEE Xplore
ResearchGate
arXiv
6088 Deep Plug-and-Play for Tensor Robust Principal Component Analysis ➖ IEEE Xplore
6125 Geometric Matrix Completion with Collaborative Routing between Capsules ➖ IEEE Xplore
3256 Enrollment Rate Prediction in Clinical Trials based on CDF Sketching and Tensor Factorization Tools ➖ IEEE Xplore

ASR - Improve Latency, Efficiency, and Accuracy

🆔 Title Repo Paper
900 Multi-blank Transducers for Speech Recognition GitHub IEEE Xplore
arXiv
1642 Diagonal State Space Augmented Transformers for Speech Recognition ➖ IEEE Xplore
arXiv
1661 TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty ➖ IEEE Xplore
arXiv
3385 Towards Accurate and Real-Time End-of-Speech Estimation ➖ IEEE Xplore
Amazon Science
3999 Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization ➖ IEEE Xplore
arXiv
4330 Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding GitHub IEEE Xplore
arXiv
5058 Powerful and Extensible WFST Framework for RNN-Transducer Losses ➖ IEEE Xplore
arXiv
5337 Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation ➖ IEEE Xplore
arXiv
5434 Improving Non-Autoregressive Speech Recognition with Autoregressive Pretraining ➖ IEEE Xplore
5558 Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture ➖ IEEE Xplore
arXiv
5607 Using Adapters to Overcome Catastrophic Forgetting in End-to-End Automatic Speech Recognition ➖ IEEE Xplore
arXiv
5824 Fast and Parallel Decoding for Transducer GitHub IEEE Xplore
arXiv

ASR: Domain Adaptation and Robust Training

🆔 Title Repo Paper
505 SAN: A Robust End-to-End ASR Model Architecture ➖ IEEE Xplore
arXiv
1604 Explanations for Automatic Speech Recognition ➖ IEEE Xplore
arXiv
1674 On-the-Fly Text Retrieval for End-to-End ASR Adaptation ➖ IEEE Xplore
Amazon Science
arXiv
2397 Unsupervised Model-based Speaker Adaptation of End-To-End Lattice-Free MMI Model for Speech Recognition ➖ IEEE Xplore
arXiv
3258 Domain Adaptation with External Off-Policy Acoustic Catalogs for Scalable Contextual End-To-End Automated Speech Recognition ➖ IEEE Xplore
Amazon Science
3600 Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR ➖ IEEE Xplore
arXiv
3973 WeavSpeech: Data Augmentation Strategy for Automatic Speech Recognition via Semantic-aware Weaving ➖ IEEE Xplore
4139 Joint Discriminator and Transfer based Fast Domain Adaptation for End-to-End Speech Recognition ➖ IEEE Xplore
5424 Improving Fairness and Robustness in End-to-End Speech Recognition Through Unsupervised Clustering ➖ IEEE Xplore
arXiv
5491 Improving Fast-Slow Encoder based Transducer with Streaming Deliberation ➖ IEEE Xplore
arXiv
5496 Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy ➖ IEEE Xplore
arXiv
5902 Improving Accented Speech Recognition with Multi-Domain Training ➖ IEEE Xplore
arXiv

ASR: New Models

🆔 Title Repo Paper
179 UCONV-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition GitHub Code IEEE Xplore
arXiv
876 A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale ➖ IEEE Xplore
arXiv
1356 Improving Contextual Biasing with Text Injection ➖ IEEE Xplore
1655 Structured State Space Decoder for Speech Recognition and Synthesis ➖ IEEE Xplore
arXiv
3365 JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition ➖ IEEE Xplore
arXiv
3368 Variable Attention Masking for Configurable Transformer Transducer Speech Recognition ➖ IEEE Xplore
arXiv
3499 Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers ➖ IEEE Xplore
arXiv
3926 Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames ➖ IEEE Xplore
arXiv
4365 Understanding Shared Speech-Text Representations ➖ IEEE Xplore
arXiv
4534 Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition ➖ IEEE Xplore
arXiv
2237 Lego-Features: Exporting Modular Encoder Features for Streaming and Deliberation ASR ➖ IEEE Xplore
arXiv
5384 Modular Conformer Training for Flexible End-to-End ASR ➖ IEEE Xplore

ASR: Noise Robustness

🆔 Title Repo Paper
1897 On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems GitHub IEEE Xplore
arXiv
1919 Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition GitHub IEEE Xplore
arXiv
1929 MADI: Inter-Domain Matching and Intra-Domain Discrimination for Cross-Domain Speech Recognition ➖ IEEE Xplore
arXiv
1971 Robust Data2vec: Noise-Robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning ➖ IEEE Xplore
arXiv
2040 Robust Audio-Visual ASR with Unified Cross-Modal Attention ➖ IEEE Xplore
3292 HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit BERT for Robust Speech Recognition ➖ IEEE Xplore
4124 Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition GitHub IEEE Xplore
arXiv
4680 RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness ➖ IEEE Xplore
arXiv
5455 Cleanformer: A Multichannel Array Configuration-Invariant Neural Enhancement Frontend for ASR in Smart Speakers ➖ IEEE Xplore
arXiv
5504 On the Effectiveness of Monoaural Target Source Extraction for Distant End-to-End Automatic Speech Recognition ➖ IEEE Xplore
6389 Noise-aware Target Extension with Self-Distillation for Robust Speech Recognition ➖ IEEE Xplore

Audio Signal Restoration and Editing

🆔 Title Repo Paper
5003 AERO: Audio Super Resolution in the Spectral Domain WEB Page
GitHub
IEEE Xplore
arXiv
1768 UPGLADE: Unplugged Plug-and-Play Audio Declipper based on Consensus Equilibrium of DNN and Sparse Optimization ➖ IEEE Xplore
Pdf
2121 Improving Performance of Real-Time Full-Band Blind Packet-Loss Concealment with Predictive Network GitHub IEEE Xplore
arXiv
4388 Faster than Fast: Accelerating the Griffin-Lim Algorithm ➖ IEEE Xplore
arXiv
3726 Improving Phase-Vocoder-based Time Stretching by Time-Directional Spectrogram Squeezing GitHub Page IEEE Xplore
Pdf
6288 Extreme Audio Time Stretching using Neural Synthesis ➖ IEEE Xplore
arXiv

Epilepsy Detection Grand Challenge

🆔 Title Repo Paper
7015 Lightweight Machine Learning for Seizure Detection on Wearable Devices ➖ IEEE Xplore
Pdf
7021 Pretrained Transformers for Seizure Detection ➖ IEEE Xplore
7022 Towards Interpretable Seizure Detection using Wearables ➖ IEEE Xplore
7033 Optimization of the Deep Neural Networks for Seizure Detection ➖ IEEE Xplore

Deep Learning Theory

🆔 Title Repo Paper
2465 MSFormer: Multi-Scale Transformer with Neighborhood Consensus for Feature Matching ➖ IEEE Xplore
3498 Decoupled Visual Causality for Robust Detection ➖ IEEE Xplore
2500 Semantics-Disentangled Contrastive Embedding for Generalized Zero-Shot Learning ➖ IEEE Xplore
4730 Dynamic Scalable Self-Attention Ensemble for Task-Free Continual Learning ➖ IEEE Xplore
2125 Ultimate Negative Sampling for Contrastive Learning ➖ IEEE Xplore
3936 An Application of Quantum Mechanics to Attention Methods in Computer Vision ➖ IEEE Xplore

Neural Architecture Search

🆔 Title Repo Paper
3492 Search for Efficient Deep Visual-Inertial Odometry Through Neural Architecture Search GitHub IEEE Xplore
4072 Receptive Field Reliant Zero-Cost Proxies for Neural Architecture Search ➖ IEEE Xplore
4346 ZO-DARTS: Differentiable Architecture Search with Zeroth-Order Approximation ➖ IEEE Xplore
2675 Performing Neural Architecture Search without Gradients GitHub IEEE Xplore
796 Neural Architecture of Speech ➖ IEEE Xplore
1461 BHE-DARTS: Bilevel Optimization based on Hypergradient Estimation for Differentiable Architecture Search ➖ IEEE Xplore

Expressive and Controllable TTS

🆔 Title Repo Paper
2625 Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and Textual Contexts GitHub Page IEEE Xplore
arXiv
4768 Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis GitHub Page IEEE Xplore
arXiv
4776 Ensemble Prosody Prediction for Expressive Speech Synthesis WEB Page IEEE Xplore
arXiv
5782 Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features GitHub Page IEEE Xplore
arXiv
5970 High-Acoustic Fidelity Text to Speech Synthesis with Fine-Grained Control of Speech Attributes ➖ IEEE Xplore
6203 Embedding a Differentiable Mel-Cepstral Synthesis Filter to a Neural Speech Synthesis System GitHub IEEE Xplore
arXiv

Keyword Spotting

🆔 Title Repo Paper
1848 Disentangled Training with Adversarial Examples for Robust Small-Footprint Keyword Spotting ➖ IEEE Xplore
Facebook
Pdf
3578 Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition ➖ IEEE Xplore
arXiv
5025 Fixed-Point Quantization Aware Training for On-Device Keyword-Spotting ➖ IEEE Xplore
arXiv
5106 To Wake-Up or Not to Wake-Up: Reducing Keyword False Alarm by Successive Refinement ➖ IEEE Xplore
arXiv
5584 Transcription Free Filler Word Detection with Neural Semi-CRFs GitHub IEEE Xplore
arXiv
6078 The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis GitHub IEEE Xplore
arXiv

Detection and Classification

🆔 Title Repo Paper
657 Passive Detection of Rank-One Gaussian Signals for Known Channel Subspaces and Arbitrary Noise ➖ IEEE Xplore
Pdf
2389 False Alarm Regulation for Off-Grid Target Detection with the Matched Filter ➖ IEEE Xplore
2536 Data-Driven Quickest Change Detection in Markov Models ➖ IEEE Xplore
arXiv
3510 Quickest Change Detection with Leave-one-Out Density Estimation ➖ IEEE Xplore
arXiv
4778 Identifying Coordination in a Cognitive Radar Network - A Multi-Objective Inverse Reinforcement Learning Approach ➖ IEEE Xplore
arXiv
4815 Improved Small Sample Hypothesis Testing using the Uncertain Likelihood Ratio ➖ IEEE Xplore

Advances in Signal Processing and Machine Learning for Non-Intrusive Load Monitoring

🆔 Title Repo Paper
2170 A Wavelet Scattering Approach for Load Identification with Limited Amount of Training Data ➖ IEEE Xplore
2653 Applying Symmetrical Component Transform for Industrial Appliance Classification in Non-Intrusive Load Monitoring ➖ IEEE Xplore
Pdf
3326 ContiNILM: A Continual Learning Scheme for Non-Intrusive Load Monitoring ➖ IEEE Xplore
5853 Improving Knowledge Distillation for Non-Intrusive Load Monitoring through Explainability Guided Learning ➖ IEEE Xplore
Pdf
6414 Improved Appliance Transient Feature Extraction via Template Matching ➖ IEEE Xplore

Machine Learning Applications

🆔 Title Repo Paper
6355 Causal Discovery and Causal Inference based Counterfactual Fairness in Machine Learning ➖ IEEE Xplore
4965 Benchmarking Convolutional Neural Network Inference on Low-Power Edge Devices ➖ IEEE Xplore
1115 Code-Enhanced Fine-Grained Semantic Matching for Tag Recommendation in Software Information Sites ➖ IEEE Xplore
394 Robust Dominant Periodicity Detection for Time Series with Missing Data ➖ IEEE Xplore
arXiv
3994 Dynamic Split Computing for Efficient Deep Edge Intelligence WEB Page IEEE Xplore
arXiv
5723 Dense Adversarial Transfer Learning based on Class-Invariance ➖ IEEE Xplore
4620 VAN-ICP: GPU-Accelerated Approximate Nearest Neighbor Search for ICP Registration via Voxel Dilation GitHub IEEE Xplore
5776 Clustering-based Supervised Contrastive Learning for Identifying Risk Items on Heterogeneous Graph ➖ IEEE Xplore
4052 Multiresolution Signal Processing of Financial Market Objects ➖ IEEE Xplore
arXiv
1752 Hierarchical Multi-Agent Reinforcement Learning with Intrinsic Reward Rectification ➖ IEEE Xplore
3493 An Antispoofing Approach in Biometric Authentication System for a Smartcard ➖ IEEE Xplore
3576 Unsupervised Domain Adaptation via Subspace Interpolating Deep Dictionary Learning: A Case Study in Machine Inspection ➖ IEEE Xplore

Classification

🆔 Title Repo Paper
283 Multi-Modal Domain Generalization for Cross-Scene Hyperspectral Image Classification ➖ IEEE Xplore
1056 Hierarchical Transformer for Multi-Label Trailer Genre Classification ➖ IEEE Xplore
1236 S3I-PointHop: SO(3)-Invariant PointHop for 3D Point Cloud Classification ➖ IEEE Xplore
arXiv
1302 Sample-Aware Knowledge Distillation for Long-Tailed Learning ➖ IEEE Xplore
1562 Laryngeal Leukoplakia Classification via Dense Multiscale Feature Extraction in White Light Endoscopy Images ➖ IEEE Xplore
1904 Long-Tailed Recognition with Causal Invariant Transformation ➖ IEEE Xplore
2199 STACKMAPS: A Visualization Technique for Diabetic Retinopathy Grading ➖ IEEE Xplore
ResearchGate
2904 Gender-Cartoon: Image Cartoonization Method based on Gender Classification ➖ IEEE Xplore
3167 Extracting the Brain-Like Representation by an Improved Self-Organizing Map for Image Classification GitHub IEEE Xplore
arXiv
3888 DDN: Dynamic Aggregation Enhanced Dual-Stream Network for Medical Image Classification ➖ IEEE Xplore
4696 LGViT: Local-Global Vision Transformer for Breast Cancer Histopathological Image Classification ➖ IEEE Xplore
5583 Learning a Weight Map for Weakly-Supervised Localization ➖ IEEE Xplore

Human Posture Estimation

🆔 Title Repo Paper
301 Interweaved Graph and Attention Network for 3D Human Pose Estimation GitHub IEEE Xplore
arXiv
3696 Learning 3D Human Pose and Shape Estimation using Uncertainty-Aware Body Part Segmentation ➖ IEEE Xplore
3841 Monocular 3D Human Pose Estimation based on Global Temporal-Attentive and Joints-Attention in Video GitHub IEEE Xplore
4380 EVOPOSE: A Recursive Transformer for 3D Human Pose Estimation with Kinematic Structure Priors ➖ IEEE Xplore
arXiv
142 HTNet: Human Topology Aware Network for 3D Human Pose Estimation GitHub IEEE Xplore
arXiv
1107 Improving Occluded Human Pose Estimation via Linked Joints ➖ IEEE Xplore
5121 Efficient and Effective Multi-Camera Pose Estimation with Weighted M-Estimate Sample Consensus ➖ IEEE Xplore
5668 AMPose: Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation GitHub IEEE Xplore
arXiv
5750 FlowPose: Conditional Normalizing Flows for 3D Human Pose and Shape Estimation from Monocular Videos ➖ IEEE Xplore
6050 Animal Re-Identification Algorithm for Posture Diversity GitHub IEEE Xplore
6322 Retrieval-based Natural 3D Human Motion Generation ➖ IEEE Xplore
2453 Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-Temporal Masked Transformers ➖ IEEE Xplore
arXiv

Human Reconstruction

🆔 Title Repo Paper
4237 Time-Frequency Awareness Network for Human Mesh Recovery from Videos GitHub IEEE Xplore
2028 Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model ➖ IEEE Xplore
arXiv
4667 GATOR: Graph-Aware Transformer with Motion-Disentangled Regression for Human Mesh Recovery from a 2D Pose GitHub IEEE Xplore
arXiv
5538 Real-Time Human Reconstruction based on Human Pose Prior and Epipolar Refinement ➖ IEEE Xplore
642 Efficient Feature Fusion for Learning-based Photometric Stereo ➖ IEEE Xplore
2442 Volumetric 3D Reconstruction with Window-Wise Global Feature Aggregation ➖ IEEE Xplore
4008 Stereoscopic Video Retargeting based on Camera Motion Classification ➖ IEEE Xplore
4893 Detail-Aware Uncalibrated Photometric Stereo ➖ IEEE Xplore
Pdf
5712 SDRNet: Shape Decoupled Regression Network for 3D Face Reconstruction ➖ IEEE Xplore
1119 Binary Image Fast Perfect Recovery from Sparse 2D-DFT Coefficients ➖ IEEE Xplore
1175 HQP-MVS: High-Quality Plane Priors Assisted Multi-View Stereo for Low-Textured Areas ➖ IEEE Xplore
3183 Dynamic Multi-View Scene Reconstruction using Neural Implicit Surface ➖ IEEE Xplore
arXiv

Face Recognition

🆔 Title Repo Paper
3959 LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition ➖ IEEE Xplore
arXiv
4254 Quaternion Orthogonal Transformer for Facial Expression Recognition in the Wild GitHub IEEE Xplore
arXiv
3490 Privacy Preserving Face Recognition with Lensless Camera ➖ IEEE Xplore
3649 MaskDUL: Data Uncertainty Learning in Masked Face Recognition GitHub IEEE Xplore
4814 Cov Loss: Covariance-based Loss for Deep Face Recognition ➖ IEEE Xplore
5674 Boosting Face Recognition Performance with Synthetic Data and Limited Real Data ➖ IEEE Xplore
2762 A Dual-Branch Adaptive Distribution Fusion Framework for Real-World Facial Expression Recognition GitHub IEEE Xplore
4199 Efficient Practices for Profile-to-Frontal Face Synthesis and Recognition ➖ IEEE Xplore
4208 Learning Causal Representations for Generalizable Face Anti-Spoofing ➖ IEEE Xplore
2767 Self-Paced Partial Domain-aware Learning for Face Anti-Spoofing ➖ IEEE Xplore
746 Context-aware Face Clustering with Graph Convolutional Networks ➖ IEEE Xplore

Source Separation, ICA, and Sparsity

🆔 Title Repo Paper
193 A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments ➖ IEEE Xplore
arXiv
524 On the Minimum Perimeter Criterion for Bounded Component Analysis ➖ IEEE Xplore
4129 Joint Unmixing and Demosaicing Methods for Snapshot Spectral Images ➖ IEEE Xplore
5036 Identifiable Bounded Component Analysis via Minimum Volume Enclosing Parallelotope ➖ IEEE Xplore
5587 Balanced Deep CCA for Bird Vocalization Detection GitHub IEEE Xplore
arXiv
1692 Independent Vector Analysis with Multivariate Gaussian Model: A Scalable Method by Multilinear Regression ➖ IEEE Xplore
3184 Activity-Informed Industrial Audio Anomaly Detection via Source Separation ➖ IEEE Xplore
6717 Double Nonstationarity: Blind Extraction of Independent Nonstationary Vector/Component from Nonstationary Mixtures - Algorithms ➖ IEEE Xplore
arXiv
6798 Towards Flexible Sparsity-Aware Modeling: Automatic Tensor Rank Learning using the Generalized Hyperbolic Prior ➖ IEEE Xplore
arXiv
5426 MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation WEB Page
GitHub
IEEE Xplore
arXiv
674 Hybrid Transformers for Music Source Separation GitHub IEEE Xplore
arXiv
5141 Dictionary Learning on Graph Data with Weisfeiler-Lehman Sub-Tree Kernel and KSVD ➖ IEEE Xplore

Neural Sound Synthesis and Representation

🆔 Title Repo Paper
2678 GANStrument: Adversarial Instrument Sound Synthesis with Pitch-Invariant Instance Conditioning GitHub Page
GitHub
IEEE Xplore
arXiv
2555 I Hear Your True Colors: Image Guided Audio Generation WEB Page
GitHub
IEEE Xplore
arXiv
1261 Grad-StyleSpeech: Any-Speaker Adaptive Text-to-Speech Synthesis with Diffusion Models GitHub Page IEEE Xplore
arXiv
3085 Voice Conversion using Feature Specific Loss Function based Self-Attentive Generative Adversarial Network GitHub IEEE Xplore
1268 TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion GitHub Page
GitHub
IEEE Xplore
arXiv
6748 Decorrelating Feature Spaces for Learning General-Purpose Audio Representations GitHub
GitHub
IEEE Xplore
4904 Continuous Descriptor-based Control for Deep Audio Synthesis GitHub Page
GitHub
IEEE Xplore
arXiv
5786 Rigid-Body Sound Synthesis with Differentiable Modal Resonators GitHub Page
GitHub
IEEE Xplore
arXiv
5349 Exploring Approaches to Multi-Task Automatic Synthesizer Programming ➖ IEEE Xplore
6710 Speech Time-Scale Modification with GANs ➖ IEEE Xplore
4339 Full-Band General Audio Synthesis with Score-based Diffusion GitHub Page IEEE Xplore
arXiv
4443 Is Quality Enough? Integrating Energy Consumption in a Large-Scale Evaluation of Neural Audio Synthesis Models ➖ IEEE Xplore

Deep Learning for Audio and Music Applications

🆔 Title Repo Paper
896 Controllable Music Inpainting with Mixed-Level and Disentangled Representation GitHub IEEE Xplore
1991 HIPI: A Hierarchical Performer Identification Model based on Symbolic Representation of Music ➖ IEEE Xplore
207 Chord-Conditioned Melody Harmonization with Controllable Harmonicity GitHub IEEE Xplore
arXiv
1878 Jazznet: A Dataset of Fundamental Piano Patterns for Music Audio Machine Learning Research GitHub IEEE Xplore
arXiv
5273 Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects GitHub Page
GitHub
IEEE Xplore
arXiv
1442 An Improved Optimal Transport Kernel Embedding Method with Gating Mechanism for Singing Voice Separation and Speaker Identification ➖ IEEE Xplore
3448 Tempo vs. Pitch: Understanding Self-Supervised Tempo Estimation GitHub IEEE Xplore
arXiv
1995 Adversarial Permutation Invariant Training for Universal Sound Separation WEB Page IEEE Xplore
arXiv
1379 Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining ➖ IEEE Xplore
arXiv
4727 Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming GitHub IEEE Xplore
arXiv
1375 SPADE: Self-Supervised Pretraining for Acoustic Disentanglement ➖ IEEE Xplore
arXiv
1615 On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors GitHub Page
GitHub
IEEE Xplore
arXiv

Machine Learning for Image and Video Processing

🆔 Title Repo Paper
1011 IoU-Aware Multi-Expert Cascade Network via Dynamic Ensemble for Long-Tailed Object Detection ➖ IEEE Xplore
Pdf
1622 Efficient Compressed Video Action Recognition via Late Fusion with a Single Network ➖ IEEE Xplore
1649 Amicable Aid: Perturbing Images to Improve Classification Performance ➖ IEEE Xplore
arXiv
3861 Spatial Cross-Attention for Transformer-based Image Captioning ➖ IEEE Xplore
Pdf
3879 Towards Hyperbolic Regularizers for Point Cloud Part Segmentation ➖ IEEE Xplore
5265 Clip4VideoCap: Rethinking CLIP for Video Captioning with Multiscale Temporal Fusion and Commonsense Knowledge ➖ IEEE Xplore
6356 Learning Silhouettes with Group Sparse Autoencoders GitHub IEEE Xplore
Pdf
5042 Deep Learning for Lagrangian Drift Simulation at The Sea Surface GitHub IEEE Xplore
arXiv
2382 Difference Guided VHR Remote Sensing Image Change Detection ➖ IEEE Xplore
2696 Adaptive Submanifold-Preserving Sparse Regression for Feature Selection and Multiclass Classification ➖ IEEE Xplore
6814 Learning Multiscale Convolutional Dictionaries for Image Reconstruction GitHub Page
GitHub
IEEE Xplore
arXiv
7162 Impact of PolSAR Pre-Processing and Balancing Methods on Complex-Valued Neural Networks Segmentation Tasks ➖ IEEE Xplore
arXiv
HAL Science

ASR: Text Adaptation

🆔 Title Repo Paper
209 Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation ➖ IEEE Xplore
arXiv
1007 AdapITN: A Fast, Reliable, and Dynamic Adaptive Inverse Text Normalization GitHub
Hugging Face
IEEE Xplore
ResearchGate
1373 Fast and Accurate Factorized Neural Transducer for Text Adaption of End-to-End Speech Recognition Models ➖ IEEE Xplore
arXiv
1628 Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data ➖ IEEE Xplore
1672 Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis ➖ IEEE Xplore
arXiv
2409 Slot-triggered Contextual Biasing for Personalized Speech Recognition using Neural Transducers ➖ IEEE Xplore
Pdf
3355 Fine-grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding ➖ IEEE Xplore
4612 Gated Contextual Adapters for Selective Contextual Biasing in Neural Transducers ➖ IEEE Xplore
Amazon Science
4830 Internal Language Model Estimation based Adaptive Language Model Fusion for Domain Adaptation ➖ IEEE Xplore
arXiv
4970 Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax ➖ IEEE Xplore
arXiv
5596 Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation ➖ IEEE Xplore
arXiv
6116 Factorized AED: Factorized Attention-based Encoder-Decoder for Text-Only Domain Adaptive ASR ➖ IEEE Xplore

ASR: Training Methods

🆔 Title Repo Paper
3731 Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition ➖ IEEE Xplore
arXiv
112 Reducing the GAP Between Streaming and Non-Streaming Transducer-based ASR by Adaptive Two-Stage Knowledge Distillation ➖ IEEE Xplore
arXiv
164 Alignment Entropy Regularization ➖ IEEE Xplore
arXiv
392 From English to more Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition ➖ IEEE Xplore
arXiv
1499 Neural Transducer Training: Reduced Memory Consumption with Sample-Wise Computation ➖ IEEE Xplore
arXiv
2433 Towards Domain Generalisation in ASR with Elitist Sampling and Ensemble Knowledge Distillation ➖ IEEE Xplore
arXiv
2677 Accelerating RNN-T Training and Inference using CTC Guidance ➖ IEEE Xplore
arXiv
3382 Resource-Efficient Transfer Learning from Speech Foundation Model using Hierarchical Feature Fusion ➖ IEEE Xplore
arXiv
3917 Robust Knowledge Distillation from RNN-T Models with Noisy Training Labels using Full-Sum Loss ➖ IEEE Xplore
arXiv
5520 More Speaking or more Speakers? ➖ IEEE Xplore
arXiv
5845 Federated Learning for ASR based on Wav2Vec 2.0 ➖ IEEE Xplore
arXiv
6343 Estimating Shapley Values of Training Utterances for Automatic Speech Recognition Models ➖ IEEE Xplore

ASR: VAD and Other Topics

🆔 Title Repo Paper
691 Real-Time Speech Interruption Analysis: from Cloud to Client Deployment ➖ IEEE Xplore
arXiv
2005 Audio-to-Intent using Acoustic-Textual Subword Representations from End-to-End ASR ➖ IEEE Xplore
arXiv
2615 Adaptive Endpointing with Deep Contextual Multi-Armed Bandits ➖ IEEE Xplore
arXiv
2616 Dynamic Speech Endpoint Detection with Regression Targets ➖ IEEE Xplore
arXiv
2665 Speaker Change Detection for Transformer Transducer ASR ➖ IEEE Xplore
arXiv
4769 Less is more: A Unified Architecture for Device-Directed Speech Detection with Multiple Invocation Types ➖ IEEE Xplore
4865 SG-VAD: Stochastic Gates based Speech Activity Detection GitHub IEEE Xplore
arXiv
5523 Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss ➖ IEEE Xplore
arXiv
5787 Unsupervised Voice Type Discrimination Score Adaptation using X-Vector Clusters ➖ IEEE Xplore
6269 Multilingual Word Error Rate Estimation: E-WER3 ➖ IEEE Xplore
arXiv
5792 Multilingual Query-by-Example Keyword Spotting with Metric Learning and Phoneme-to-Embedding Mapping ➖ IEEE Xplore
arXiv
7177 Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise ➖ IEEE Xplore
arXiv
836 Keyword-Specific Acoustic Model Pruning for Open Vocabulary Keyword Spotting ➖ IEEE Xplore
5030 Self-Supervised Speech Representation Learning for Keyword-Spotting with Light-Weight Transformers ➖ IEEE Xplore
arXiv
5579 Lightweight Feature Encoder for Wake-Up Word Detection based on Self-Supervised Speech Representation ➖ IEEE Xplore
arXiv
5649 VE-KWS: Visual Modality Enhanced End-to-End Keyword Spotting ➖ arXiv
1378 Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers GitHub IEEE Xplore
arXiv
1518 Continual Learning for On-Device Speech Recognition using Disentangled Conformers ➖ IEEE Xplore
arXiv
1986 Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting ➖ IEEE Xplore
arXiv
3390 Locale Encoding for Scalable Multilingual Keyword Spotting Models ➖ IEEE Xplore
arXiv
3531 Small-Footprint Slimmable Networks for Keyword Spotting ➖ IEEE Xplore
arXiv
3615 Metric Learning for User-Defined Keyword Spotting WEB Page
GitHub
IEEE Xplore
arXiv
3928 WeKws: A Production First Small-Footprint End-to-End Keyword Spotting Toolkit GitHub IEEE Xplore
arXiv
4822 Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting ➖ IEEE Xplore
arXiv

Automatic Audio Captioning and Retrieval

🆔 Title Repo Paper
662 A Novel Metric for Evaluating Audio Caption Similarity ➖ IEEE Xplore
arXiv
5376 On Negative Sampling for Contrastive Audio-Text Retrieval ➖ IEEE Xplore
arXiv
2001 Audio-Text Models do not yet Leverage Natural Language ➖ IEEE Xplore
arXiv
4981 Improving Audio Captioning using Semantic Similarity Metrics ➖ IEEE Xplore
arXiv
4900 SPICE+: Evaluation of Automatic Audio Captioning Systems with Pre-trained Language Models ➖ IEEE Xplore
HAL Science
6766 Local Information Assisted Attention-Free Decoder for Audio Captioning GitHub IEEE Xplore
arXiv

Auditory EEG Decoding Challenge

🆔 Title Repo Paper
6832 HappyQuokka System for ICASSP 2023 Auditory EEG Challenge GitHub IEEE Xplore
arXiv
6855 Relate Auditory Speech to EEG by Shallow-Deep Attention-based Network ➖ IEEE Xplore
arXiv
6859 Multi-Head Attention and GRU for Improved Match-Mismatch Classification of Speech Stimulus and EEG Response ➖ IEEE Xplore
6861 Relating EEG Recordings to Speech using Envelope Tracking and the Speech-FFR ➖ IEEE Xplore
arXiv
6882 Decoding Auditory EEG Responses using an Adapted WaveNet ➖ IEEE Xplore

Image Restoration

🆔 Title Repo Paper
564 MRNet: Multi-Refinement Network for Dual-Pixel Images Defocus Deblurring ➖ IEEE Xplore
5802 Joint Compression and Demosaicking For Satellite Images ➖ IEEE Xplore
HAL Science
1157 Decontamination Transformer for Blind Image Inpainting GitHub Page
GitHub
IEEE Xplore
Pdf
658 Exploration into Translation-Equivariant Image Quantization ➖ IEEE Xplore
arXiv
2562 Tensor Decomposition based Latent Feature Clustering for Hyperspectral Band Selection ➖ IEEE Xplore

Interpretable and Explainable Machine Learning

Will soon be added

Language Modeling

Will soon be added

Language Modeling and Spoken Language Understanding

Will soon be added

Estimation Theory and Methods

Will soon be added

AI Security and Privacy in Speech and Audio Processing

🆔 Title Repo Paper
673 Privacy-Enhanced Federated Learning Against Attribute Inference Attack for Speech Emotion Recognition ➖ IEEE Xplore
2009 Privacy-Preserving Occupancy Estimation ➖ IEEE Xplore
3761 Federated Intelligent Terminals Facilitate Stuttering Monitoring GitHub IEEE Xplore
ResearchGate
4942 Beyond Neural-on-Neural Approaches to Speaker Gender Protection GitHub IEEE Xplore
arXiv
6129 Distinguishable Speaker Anonymization Based on Formant and Fundamental Frequency Scaling ➖ IEEE Xplore
arXiv

Binaural Audio; Multichannel Source Separation

🆔 Title Repo Paper
1755 Spatially Informed Independent Vector Analysis for Source Extraction based on the Convolutive Transfer Function Model ➖ IEEE Xplore
2514 Fast Online Source Steering Algorithm for Tracking Single Moving Source using Online Independent Vector Analysis ➖ IEEE Xplore
Pdf
4589 Online Binaural Speech Separation of Moving Speakers with a Wavesplit Network ➖ IEEE Xplore
arXiv
5759 Convolutive NTF for Ambisonic Source Separation under Reverberant Conditions ➖ IEEE Xplore
4677 On the Relevance of the Differences between HRTF Measurement Setups for Machine Learning ➖ IEEE Xplore
arXiv
6362 Neural Fourier Shift for Binaural Speech Rendering WEB Page
GitHub
IEEE Xplore
arXiv
1620 Global HRTF Interpolation via Learned Affine Transformation of Hyper-Conditioned Features WEB Page
GitHub
IEEE Xplore
arXiv
4790 HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields GitHub IEEE Xplore
arXiv
5041 Learning to Personalize Equalization for High-Fidelity Spatial Audio Reproduction ➖ IEEE Xplore
6719 A Data-Driven Approach to Audio Decorrelation ➖ IEEE Xplore
6777 Switching Independent Vector Analysis and Its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms ➖ IEEE Xplore
arXiv

Image/Video Caption Generation

🆔 Title Repo Paper
6029 End-to-End Non-Autoregressive Image Captioning GitHub IEEE Xplore
337 Enhancing Multimodal Alignment with Momentum Augmentation for Dense Video Captioning ➖ IEEE Xplore
450 I-Tuning: Tuning Frozen Language Models with Image for Lightweight Image Captioning ➖ IEEE Xplore
arXiv
972 Video Captioning via Relation-Aware Graph Learning GitHub IEEE Xplore
1192 Improving Image Captioning with Control Signal of Sentence Quality ➖ IEEE Xplore
arXiv
5827 Background Disturbance Mitigation for Video Captioning via Entity-Action Relocation ➖ IEEE Xplore
5304 Motion-Aware Video Paragraph Captioning via Exploring Object-Centered Internal Knowledge ➖ IEEE Xplore
2203 Associative Learning Network for Coherent Visual Storytelling ➖ IEEE Xplore
6772 Shot Noise Analysis for Differential Sampling in Indirect Time of Flight Cameras ➖ IEEE Xplore

Flow Estimation

Will soon be added

Image/Video Retrieval

Will soon be added

Transfer Learning

Will soon be added

Learning Theory and Algorithms

Will soon be added

Distributed and Federated Learning

Will soon be added

Machine Learning for Telecommunications

Will soon be added

Dialog and Multimodal Processing of Language

Will soon be added

Discourse and Dialog

Will soon be added

Emerging Topics in Speech Synthesis

Will soon be added

Audio and Text Segmentation, Tagging and Parsing

Will soon be added

Diffusion-based Generative Models for Audio and Speech

🆔 Title Repo Paper
5245 Cold Diffusion for Speech Enhancement ➖ IEEE Xplore
arXiv
5709 Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration WEB Page
GitHub
IEEE Xplore
arXiv
2264 Unsupervised Vocal Dereverberation with Diffusion-based Generative Models GitHub Page IEEE Xplore
arXiv
5637 Solving Audio Inverse Problems with a Diffusion Model GitHub IEEE Xplore
arXiv
5778 DiffPhase: Generative Diffusion-based STFT Phase Retrieval WEB Page
GitHub
IEEE Xplore
arXiv
3196 Optimal Transport in Diffusion Modeling for Conversion Tasks in Audio Domain ➖ IEEE Xplore

Multilingual Alzheimer's Dementia Recognition through Spontaneous Speech: a Signal Processing Grand Challenge

Will soon be added

Model Pruning and Compression

Will soon be added

Image Recognition and Detection

🆔 Title Repo Paper
907 Data-Aware Zero-Shot Neural Architecture Search for Image Recognition ➖ IEEE Xplore
3890 CFFMixer: Multi-Dimensional Feature Fusion for Object Detection ➖ IEEE Xplore
1242 SANet: Spatial Attention Network with Global Average Contrast Learning for Infrared Small Target GitHub IEEE Xplore
736 Logovit: Local-Global Vision Transformer for Object Re-Identification GitHub IEEE Xplore
319 ProContEXT: Exploring Progressive Context Transformer for Tracking GitHub IEEE Xplore
arXiv
3268 Pair DETR: Toward Faster Convergent DETR ➖ IEEE Xplore
arXiv

Machine Learning Methods for Language

Will soon be added

Machine Translation and Dialog System

Will soon be added

Radar Waveform Design: Recent Advances and New Emerging Applications

Will soon be added

Conversational Healthcare Interfaces

Will soon be added

Computer Vision Applications

🆔 Title Repo Paper
6551 On the Quantization of Recurrent Neural Networks for Smiles Generation ➖ IEEE Xplore
4821 WIFI-Based Robust Child Presence Detection for Smart Cars ➖ IEEE Xplore
6365 CAN2V: Can-Bus Data-Based Seq2seq Model for Vehicle Velocity Prediction ➖ IEEE Xplore
246 An Evaluation Platform to Scope Performance of Synthetic Environments in Autonomous Ground Vehicles Simulation ➖ IEEE Xplore
3000 PreFallKD: Pre-Impact Fall Detection via CNN-ViT Knowledge Distillation ➖ IEEE Xplore
arXiv
3733 Finding Optimal Numerical Format for Sub-8-Bit Post-Training Quantization of Vision Transformers ➖ IEEE Xplore
3961 A Multi-Channel Aggregation Framework for Object Detection in Large-Scale SAR Image ➖ IEEE Xplore
3136 Tracking Targets in Hyper-Scale Cameras Using Movement Predication ➖ IEEE Xplore
2421 RGB-D Based Pose-Invariant Face Recognition Via Attention Decomposition Module ➖ IEEE Xplore
256 NL-DSE: Non-Local Neural Network with Decoder-Squeeze-and-Excitation for Monocular Depth Estimation ➖ IEEE Xplore
3137 Real-Time Modelling of Observation Filter in the Remote Microphone Technique for an Active Noise Control Application ➖ IEEE Xplore
arXiv
1054 An Adaptive DFE Using Light-Pattern-Protection Algorithm in 12 NM CMOS Technology ➖ IEEE Xplore

Domain-Specific Detection

Will soon be added

Temporal Video Analysis and Detection

🆔 Title Repo Paper
613 One-Shot Action Detection via Attention Zooming In ➖ IEEE Xplore
619 ScaleMix: Intra- And Inter-Layer Multiscale Feature Combination for Change Detection ➖ IEEE Xplore
1470 Semi-Supervised Remote Sensing Image Change Detection Using Mean Teacher Model for Constructing Pseudo-Labels ➖ IEEE Xplore
1575 Modulation-Based Center Alignment and Motion Mining for Spatial Temporal Action Detection ➖ IEEE Xplore
2692 DL-NET: Dilation Location Network for Temporal Action Detection ➖ IEEE Xplore
2873 Semi-Supervised Contrastive Learning with Soft Mask Attention for Facial Action Unit Detection ➖ IEEE Xplore
4046 Local-Global Siamese Network with Efficient Inter-Scale Feature Learning for Change Detection in VHR Remote Sensing Images ➖ IEEE Xplore
4951 Multimodal Facial Action Unit Detection with Physiological Signals ➖ IEEE Xplore
5755 Background-Weakening Consistency Regularization for Semi-Supervised Video Action Detection ➖ IEEE Xplore
5713 Low in Resolution, High in Precision: UAV Detection with Super-Resolution and Motion Information Extraction ➖ IEEE Xplore
3350 Temporal Contrastive Learning with Curriculum ➖ IEEE Xplore
arXiv
320 Longshortnet: Exploring Temporal and Semantic Features Fusion In Streaming Perception GitHub IEEE Xplore
arXiv

Object Detection

Will soon be added

Deep Learning for Speech and Audio Processing

Will soon be added

Deep Learning for Speech and Language Processing

Will soon be added

Language Modeling and Representation Learning

Will soon be added

Lightweight TTS and TTS Analysis

Will soon be added

Machine Translation for Spoken and Written Language

🆔 Title Repo Paper
683 Improving Speech-to-Speech Translation through Unlabeled Text ➖ IEEE Xplore
arXiv
1867 A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation ➖ IEEE Xplore
arXiv
3026 Decoupled Non-Parametric Knowledge Distillation for End-to-End Speech Translation ➖ IEEE Xplore
arXiv
3135 Joint Pre-training with Speech and Bilingual Text for Direct Speech-to-Speech Translation GitHub Page
GitHub
IEEE Xplore
arXiv
3822 LEAPT: Learning Adaptive Prefix-to-Prefix Translation for Simultaneous Machine Translation ➖ IEEE Xplore
arXiv
3889 Enhancing Speech-To-Speech Translation with Multiple TTS Targets ➖ IEEE Xplore
arXiv
4196 Rethinking the Reasonability of the Test Set for Simultaneous Machine Translation GitHub IEEE Xplore
arXiv
4387 Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation ➖ IEEE Xplore
arXiv
4983 Efficient Speech Translation with Dynamic Latent Perceivers GitHub IEEE Xplore
arXiv
5169 Joint Training and Decoding for Multilingual End-to-End Simultaneous Speech Translation GitHub IEEE Xplore
5381 Enhancing Ontology Translation through Cross-Lingual Agreement ➖ IEEE Xplore
6523 M3ST: Mix at Three Levels for Speech Translation ➖ IEEE Xplore
arXiv

Music Audio Synthesis and Modeling

Will soon be added

Spoken Language Understanding Grand Challenge

Will soon be added

Image Segmentation

Will soon be added

Multi-Speaker ASR

Will soon be added

Multimodal Processing of Language and Language Systems

🆔 Title Repo Paper
1158 Prefix Tuning for Automated Audio Captioning GitHub Page
GitHub
IEEE Xplore
arXiv
1648 C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval GitHub IEEE Xplore
arXiv
2096 The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR EdAcc IEEE Xplore
arXiv
2768 Adaptive Knowledge Distillation between Text and Speech Pre-trained Models ➖ IEEE Xplore
arXiv
6140 A Processing Framework to Access Large Quantities of Whispered Speech Found in ASMR GitHub Page
GitHub
IEEE Xplore
arXiv
567 Cross-Modal Mutual Learning for Cued Speech Recognition ➖ IEEE Xplore
arXiv
1886 SLBERT: A Novel Pre-Training Framework for Joint Speech and Language Modeling ➖ IEEE Xplore
2190 Cross-Modal Adversarial Contrastive Learning for Multi-Modal Rumor Detection ➖ IEEE Xplore
arXiv
2884 Multiple Contrastive Learning for Multimodal Sentiment Analysis ➖ IEEE Xplore
3666 Token2vec: A Joint Self-Supervised Pre-Training Framework using Unpaired Speech and Text ➖ IEEE Xplore
arXiv
3714 DAIS: The Delft Database of EEG Recordings of Dutch Articulated and Imagined Speech ➖ IEEE Xplore
4409 A Token-Level Contrastive Framework for Sign Language Translation GitHub IEEE Xplore
arXiv
4801 Sign Language Recognition via Deformable 3D Convolutions and Modulated Graph Convolutional Networks ➖ IEEE Xplore
Pdf
4837 LAST: Scalable Lattice-based Speech Modelling in JAX GitHub IEEE Xplore
arXiv
4989 M-SpeechCLIP: Leveraging Large-Scale, Pre-trained Models for Multilingual Speech to Image Retrieval ➖ IEEE Xplore
arXiv
5014 Using Emotion Embeddings to Transfer Knowledge between Emotions, Languages, and Annotation Formats GitHub IEEE Xplore
arXiv
5146 Speech-Text based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition ➖ IEEE Xplore
arXiv

Tracking

Will soon be added

Radar-Assisted Perception (RAP)

Will soon be added

Data Driven and Machine Learning based Room Acoustic Modeling

Will soon be added

Sensing Applications

Will soon be added

Computational Imaging

Will soon be added

Anomaly Detection

Will soon be added

Deep Neural Network

Will soon be added

Deep Learning

Will soon be added

Deep and Sequential Learning

Will soon be added

Machine Learning for Time Series Analysis

Will soon be added

Multilingual Speech Recognition and Identification

Will soon be added

Quantum Computing for Machine Learning and Signal Processing

Will soon be added

Sound Event Detection

Will soon be added

Brain Connectivity

Will soon be added

Speech Signal Improvement Signal Processing Grand Challenge 2023

Will soon be added

Anonymization and Data Privacy

Will soon be added

Natural Language Processing

Will soon be added

Pronunciation and Fluency Assessment

Will soon be added

Edge Learning for Emerging Wireless Technologies

Will soon be added

Acoustic Sensor Array Processing and Sound Source Localization

Will soon be added

Representation Learning

Will soon be added

Adversarial Machine Learning

🆔 Title Repo Paper
987 Backdoor Defense via Suppressing Model Shortcuts GitHub IEEE Xplore
arXiv

Target Detection and Classification

Will soon be added

Spatial Processing for Audio and Speech

Will soon be added

Brain Computer Interfaces

Will soon be added

Acoustic Echo Cancellation Signal Processing Grand Challenge 2023

Will soon be added

DoA Estimation

Will soon be added

Speaker Recognition: Scoring, Fairness, Privacy

Will soon be added

Speaker Recognition: Verification, Diarization, Anti-Spoofing

🆔 Title Repo Paper
3059 Pushing the Limits of Self-Supervised Speaker Verification using Regularized Distillation Framework GitHub IEEE Xplore
arXiv

Recent Advances in Robust Learning for Modern Computational Imaging

Will soon be added

Signal Processing and Machine Learning for Networked Autonomous Agents

Will soon be added

Active Noise Control, Echo Reduction and Feedback Reduction

Will soon be added

Anomaly Detection and Representation Learning for Audio Classification

Will soon be added

Data Processing

Will soon be added

Perceptual Assessment

Will soon be added

Machine Learning for Recommendation, Search and other Applications

Will soon be added

Reinforcement Learning

Will soon be added

Pattern Recognition and Classification

Will soon be added

Sparsity, Compressed Sensing, and Tensor Decomposition

Will soon be added

Adversarial Machine Learning and Information Theoretic Security

Will soon be added

Resource Constrained ASR

Will soon be added

Singing Voice Synthesis/Conversion and Pretrained TTS

Will soon be added

Medical Image Reconstruction

Will soon be added

L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality

Will soon be added

Multimedia Forensics

Will soon be added

MIMO Radars and Waveform Design

Will soon be added

Speech Dysarthria

Will soon be added

Speech Emotion Recognition: General Topics

🆔 Title Repo Paper
2490 Multi-Scale Receptive Field Graph Model for Emotion Recognition in Conversations GitHub IEEE Xplore
3918 MGAT: Multi-Granularity Attention based Transformers for Multi-Modal Emotion Recognition ➖ IEEE Xplore
4523 Achieving Fair Speech Emotion Recognition via Perceptual Fairness ➖ IEEE Xplore
5023 Personalized Task Load Prediction in Speech Communication WEB Page IEEE Xplore
arXiv
5075 DWFormer: Dynamic Window Transformer for Speech Emotion Recognition GitHub IEEE Xplore
arXiv
5730 Multi-View Learning for Speech Emotion Recognition with Categorical Emotion, Categorical Sentiment, and Dimensional Scores ➖ IEEE Xplore
Microsoft
Pdf
540 Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations GitHub IEEE Xplore
arXiv
563 Emotion Recognition in Conversation from Variable-Length Context ➖ IEEE Xplore
1423 Knowledge-Aware Graph Convolutional Network with Utterance-Specific Window Search for Emotion Recognition in Conversations ➖ IEEE Xplore
1611 Masking Speech Contents by Random Splicing: is Emotional Expression Preserved? ➖ IEEE Xplore
ResearchGate
3129 Multi-Local Attention for Speech-based Depression Detection ➖ IEEE Xplore
Pdf
3130 Daily Mental Health Monitoring from Speech: A Real-World Japanese Dataset and Multitask Learning Analysis ➖ IEEE Xplore
ResearchGate
3830 SDTN: Speaker Dynamics Tracking Network for Emotion Recognition in Conversation ➖ IEEE Xplore
4065 Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition GitHub IEEE Xplore
arXiv
5683 Designing and Evaluating Speech Emotion Recognition Systems: A Reality Check Case Study with IEMOCAP ➖ IEEE Xplore
arXiv
5711 EMix: A Data Augmentation Method for Speech Emotion Recognition ➖ IEEE Xplore
6131 A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition ➖ IEEE Xplore
arXiv
6316 Automatic Classification of Vocal Intensity Category from Speech ➖ IEEE Xplore

Intelligent and Semantic Communications for 5G Mobile Networks and Beyond

Will soon be added

Audio and Speech Quality Measurements

Will soon be added

Acoustic Modeling; Auditory Modeling for Hearing Instruments

Will soon be added

Anonymization, Data Privacy, and Biometrics

Will soon be added

Object Recognition

Will soon be added

Identification Detection

Will soon be added

Tracking, Data Fusion, and Sensor Networks

🆔 Title Repo Paper
268 Deep Fusion of Multi-Object Densities using Transformer GitHub IEEE Xplore
arXiv
6240 Nonnegative Block-Term Decomposition with the β-Divergence: Joint Data Fusion and Blind Spectral Unmixing GitHub IEEE Xplore
2238 Robust Subspace Tracking with Contamination via α-Divergence GitHub IEEE Xplore
ResearchGate
2321 Wireless Location Tracking via Complex-Domain Super MDS with Time Series Self-Localization Information ➖ IEEE Xplore
2463 Angle-of-Arrival Target Tracking using a Mobile UAV in External Signal-Denied Environment ➖ IEEE Xplore
2821 A Distributed Adaptive Algorithm for Non-Smooth Spatial Filtering Problems ➖ IEEE Xplore
arXiv
2937 A Computationally Efficient Algorithm for Distributed Adaptive Signal Fusion based on Fractional Programs ➖ IEEE Xplore
Pdf
3217 Data Driven Joint Sensor Fusion and Regression based on Geometric Mean Squared Error ➖ IEEE Xplore
4043 Sensor Selection for Angle of Arrival Estimation based on the Two-Target Cramér-Rao Bound GitHub IEEE Xplore
4149 Clustered Greedy Algorithm for Large-Scale Sensor Selection ➖ IEEE Xplore

Speaker Recognition: Neural Network Architecture

Will soon be added

Speech Analysis

Will soon be added

Speaker Recognition: Anti-Spoofing and Verification

🆔 Title Repo Paper
5447 SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing GitHub IEEE Xplore
arXiv

Bayesian Signal Processing

Will soon be added

Speaker Recognition: Verification and Diarization

Will soon be added

Learning on Graphs for Biology and Medicine

🆔 Title Repo Paper
2914 Deep Spatio-Temporal Multiplex Graph Learning for Cardiac Imaging Classification ➖ IEEE Xplore
4165 Graph Signal Processing for Neuroimaging to Reveal Dynamics of Brain Structure-Function Coupling ➖ IEEE Xplore
4375 Multiple Signed Graph Learning for Gene Regulatory Network Inference ➖ IEEE Xplore
4599 Predicting Brain Age using Transferable Covariance Neural Networks ➖ IEEE Xplore
arXiv
6456 Spatial Graph Signal Interpolation with an Application for Merging BCI Datasets with Various Dimensionalities GitHub IEEE Xplore
arXiv

Learning from Neuroimaging Data

Will soon be added

Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech

Will soon be added

Quality Assessment and Anomaly Detection

Will soon be added

Human-Centric Multimedia and Human-Machine Interaction

Will soon be added

Speech Emotion Recognition: Transfer Learning

🆔 Title Repo Paper
457 A Generalized Subspace Distribution Adaptation Framework for Cross-Corpus Speech Emotion Recognition ➖ IEEE Xplore
3755 Fast Yet Effective Speech Emotion Recognition with Self-Distillation GitHub IEEE Xplore
arXiv
3954 Domain Adaptation without Catastrophic Forgetting on a Small-Scale Partially-Labeled Corpus for Speech Emotion Recognition ➖ IEEE Xplore
4547 Phonetic Anchor-based Transfer Learning to Facilitate Unsupervised Cross-Lingual Speech Emotion Recognition ➖ IEEE Xplore
Pdf
4559 Zero-Shot Speech Emotion Recognition using Generative Learning with Reconstructed Prototypes ➖ IEEE Xplore
4858 Unsupervised Domain Adaptation for Preference Learning based Speech Emotion Recognition ➖ IEEE Xplore
Pdf

Multi-Antenna Communications and Sensing

Will soon be added

Quantum Machine Learning Algorithms and Applications on NISQ Devices

Will soon be added

Neural Speech and Audio Coding: Emerging Challenges and Opportunities

Will soon be added

Medical and Environmental Acoustics; Audio Security

Will soon be added

Classification of Acoustic Scenes and Events

Will soon be added

Learning from EEG Data

Will soon be added

Physiological Signal Processing

Will soon be added

Speech Production, Perception, and Psychoacoustics

Will soon be added

Watermarking, Data Hiding and Human Factors in Security

Will soon be added

3D Point Cloud/Stereo Video

Will soon be added

Face Processing

Will soon be added

MIMO Radars and MIMO Communications

Will soon be added

Speaker Recognition: Diarization

Will soon be added

Estimation, Detection, and Classification

Will soon be added

Model Lightweight and Video Compression

Will soon be added

Subspace and Manifold Learning

🆔 Title Repo Paper
2651 Generative Modeling based Manifold Learning for Adaptive Filtering Guidance ➖ IEEE Xplore
Amazon Science
684 Tensor Completion for Efficient and Accurate Hyperparameter Optimisation in Large-Scale Statistical Learning ➖ IEEE Xplore
903 CO-NET: Classification-Oriented Point Cloud Sampling via Informative Feature Learning and Non-Overlapped Local Adjustment ➖ IEEE Xplore
2091 Deep Survival Analysis and Counterfactual Inference using Balanced Representations ➖ IEEE Xplore
3045 Feature Space Recovery for Incomplete Multi-View Clustering ➖ IEEE Xplore
HAL Science
4602 Study of Manifold Geometry using Multiscale Non-Negative Kernel Graphs ➖ IEEE Xplore
arXiv

Speech Enhancement - Diffusion and Other Generative Models

🆔 Title Repo Paper
2594 Cross-domain Diffusion based Speech Enhancement for Very Noisy Speech GitHub Page IEEE Xplore
3643 SRTNet: Time Domain Speech Enhancement via Stochastic Refinement GitHub IEEE Xplore
arXiv
4671 Diffusion-based Generative Speech Source Separation GitHub IEEE Xplore
arXiv
4716 SEPDIFF: Speech Separation based on Denoising Diffusion Model ➖ IEEE Xplore
5798 Fast and Efficient Speech Enhancement with Variational Autoencoders ➖ IEEE Xplore
arXiv
6105 Metric-oriented Speech Enhancement using Diffusion Probabilistic Model ➖ IEEE Xplore
arXiv

ICASSP 2023 General Meeting Understanding and Generation (MUG) Challenge

Will soon be added

Signal Processing for Smart City Applications and the Internet of Things

Will soon be added

Symbol-Level Precoding: Recent Advance and New Applications in 6G and Beyond

Will soon be added

Graphical Inference and Modeling in Dynamical Systems

Will soon be added

Deep Learning-based Source Separation

Will soon be added

Medical Image Segmentation

Will soon be added

Bioinformatics

Will soon be added

Cybersecurity, Hardware and Network Security

Will soon be added

Multi-Antenna Communications and Intelligent Reflecting Surfaces

Will soon be added

Multimedia Compression and Quality

Will soon be added

Multimedia Analysis, Synthesis, and Learning

Will soon be added

DoA Estimation and Beamforming

Will soon be added

Speech Emotion Recognition: Multimodality

Will soon be added

Speech Emotion Recognition: Neural Architectures

Will soon be added

Optimization Methods for Signal Processing

Will soon be added

5th DNS Challenge at IEEE ICASSP 2023

Will soon be added

Signal Processing and Learning over Dynamic Graphs

Will soon be added

Human Action Recognition

Will soon be added

Deep Generative Model

🆔 Title Repo Paper
1565 String-based Molecule Generation via Multi-Decoder VAE ➖ IEEE Xplore
arXiv
4161 Graph Contrastive Learning with Learnable Graph Augmentation ➖ IEEE Xplore
3180 Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution GitHub Page
GitHub
IEEE Xplore
arXiv
5068 Evaluation of Categorical Generative Models - Bridging the Gap Between Real and Synthetic Data ➖ IEEE Xplore
arXiv
6053 Diffusion Probabilistic Modeling for Fine-Grained Urban Traffic Flow Inference with Relaxed Structural Constraint ➖ IEEE Xplore
4977 Single-Shot Domain Adaptation via Target-aware Generative Augmentations GitHub IEEE Xplore
arXiv

Multimodal Signal Processing and Analysis

Will soon be added

Speech Enhancement - Self-Supervised Learning

🆔 Title Repo Paper
915 Perceive and Predict: Self-Supervised Speech Representation based Loss Functions for Speech Enhancement ➖ IEEE Xplore
arXiv
2006 DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks ➖ IEEE Xplore
3343 Speech Separation with Large-Scale Self-Supervised Learning ➖ IEEE Xplore
arXiv
3511 Self-Supervised Learning-based Source Separation for Meeting Data WEB Page IEEE Xplore
arXiv
4456 An Adapter based Multi-Label Pre-training for Speech Separation and Enhancement ➖ IEEE Xplore
arXiv
5785 Self-Supervised Learning for Speech Enhancement Through Synthesis GitHub IEEE Xplore
arXiv

Distributed and Reliable Signal Processing and Communications

Will soon be added

Resource-Efficient Real-time Neural Speech Separation

Will soon be added

Multichannel Speech Enhancement, Dereverberation, and System Identification

Will soon be added

Multilabel Acoustic Event Classification

Will soon be added

Deep Learning for Medical Imaging

🆔 Title Repo Paper
1384 Coarse-to-Fine Covid-19 Segmentation via Vision-Language Alignment GitHub IEEE Xplore
arXiv

Machine/Deep Learning Methodologies for Multimedia

Will soon be added

Human-Centric Multimedia

Will soon be added

Source Localization and Separation

Will soon be added

Speech Enhancement - Audio-Visual, Multi-Channel, and Other

Will soon be added

Speech Enhancement - Separation and Target Speech Extraction

🆔 Title Repo Paper
3175 Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation GitHub IEEE Xplore
arXiv

Speech Enhancement - Single Channel

Will soon be added

Machine Learning Applications to Communications

Will soon be added

Aspects in Image Generation/Analysis

Will soon be added

Multi-Antenna and Multi-Carrier Communications

Will soon be added

Signal Filtering, Restoration, Enhancement, and Reconstruction

Will soon be added

ICASSP SP Clarity Challenge: Speech Enhancement for Hearing Aids

Will soon be added

Image and Video Enhancement

Will soon be added

Speech Recognition: Training/Adaptation

Will soon be added

Decentralized Wireless Systems and Energy Harvesting

Will soon be added

Robust Learning and Inference

Will soon be added

Music Classification and Transcription

Will soon be added

Music Information Retrieval

Will soon be added

Deep Learning for Medical Image Segmentation

Will soon be added

Detection and Classification in Medical Imaging

Will soon be added

Image Coding/Compression

Will soon be added

Audio-Visual Signal Processing and Analysis

Will soon be added

Various Aspects in Speech and Language Processing

Will soon be added

Speech Recognition: Modeling and Context

Will soon be added

Speech Recognition: Self-Supervised Models

Will soon be added

Channel State Estimation

Will soon be added

Signal Processing over Graphs and Networks

Will soon be added

Signal Processing over Networks

Will soon be added

Applications to Vision, Speech, and Robotics

🆔 Title Repo Paper
6443 LMBAO: A Landmark Map for Bundle Adjustment Odometry in LiDAR SLAM ➖ IEEE Xplore
arXiv
1069 Residual Squeeze-and-Excitation U-Shaped Network for Minutia Extraction in Contactless Fingerprint Images ➖ IEEE Xplore
1603 TSPTQ-ViT: Two-Scaled Post-Training Quantization for Vision Transformer ➖ IEEE Xplore
arXiv
3925 Low-Complexity Low-Rank Approximation SVD for Massive Matrix in Tensor Train Format ➖ IEEE Xplore
2043 DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech GitHub IEEE Xplore
arXiv
3040 Cooperative Five Degrees of Freedom Motion Estimation for a Swarm of Autonomous Vehicles ➖ IEEE Xplore

Person Identification and Relapse Detection from Continuous Recordings of Biosignals

Will soon be added

Vision and Language Model

Will soon be added

TTS: AM and Vocoder

Will soon be added

Signal Processing Education

Will soon be added

Signal Processing and Systems for Remote Biometrics

Will soon be added

Signal Processing for RIS-Enabled Smart Wireless Environments

Will soon be added

Multimodal Learning

Will soon be added

Video Coding/Compression

Will soon be added

Object Tracking

Will soon be added

Image Generation

Will soon be added

Spoken Language Understanding

Will soon be added

Optimization and Machine Learning for Communications

Will soon be added

Sparse/Low-Dimensional Signal Processing

Will soon be added

Signal Processing Theory and Methods

Will soon be added

Radar/Array Signal Processing, Networks and Communications

Will soon be added

Applications to Communications

Will soon be added

The First Pathloss Radio Map Prediction Challenge

Will soon be added

Human Video Generation and Editing

Will soon be added

Point Cloud Processing

Will soon be added

Multimedia Databases and Information Retrieval

Will soon be added

Voice and Style Conversion

Will soon be added

Synergy between Human and Machine Approaches to Sound/Scene Recognition and Processing

Will soon be added

Topological and Simplicial Data Processing

Will soon be added

Unsupervised Deep Learning of Image Priors for Inverse Problems

Will soon be added

Self-Supervised Learning and Data-Efficiency for Speech and Audio

🆔 Title Repo Paper
5842 Audio Signal Enhancement with Learning from Positive and Unlabelled Data GitHub IEEE Xplore
arXiv

Sound Event Detection and Localization; Bioacoustic Event Detection

Will soon be added

Aspects in Machine Learning

Will soon be added

Aspects in Image/Video Processing and Analysis

🆔 Title Repo Paper
2133 ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal GitHub Page
GitHub
IEEE Xplore
arXiv

Learning Algorithms and Applications

Will soon be added

Optimization Methods in Machine Learning

Will soon be added

Applications of Machine Learning

Will soon be added

Sensing, Computing, and Semantic Communications

Will soon be added

Sparsity and Low-Rank Models

Will soon be added

Signal Processing over Graphs

Will soon be added

Target Source Extraction

Will soon be added

Music Generation and Arrangement

Will soon be added

Multimodal Information based Speech Processing (MISP) 2022 Challenge

Will soon be added

Image Retrieval and Classification

Will soon be added

Variational Inference and Approximate Bayesian Techniques

Will soon be added

Spatial Audio Recording and Reproduction

Will soon be added

Speech Modeling and Audio Coding

Will soon be added

Audio Processing and Analysis

Will soon be added

Image/Video Enhancement

Will soon be added

Zero or Few-Shot Learning

Will soon be added

Acoustic and Microphone Array Processing

Will soon be added

Speech and Language Disorders

Will soon be added

Various Aspects in Speech and Speaker Recognition

Will soon be added

Sampling Theory, Compressed and Non-uniform Sampling

Will soon be added

Show and Tell Demos: Session

🆔 Title Repo Paper
7049 Generating Sound Effects, Music, Speech, and Beyond, with Text ➖ ➖
7059 DisCoHeadTV: Disentangled Control of Head Pose and Facial Expressions for Text-to-Video Synthesis ➖ ➖
7064 Intelligent Dialogue-based Tutoring System for Second Language Reading Comprehension ➖ ➖
7068 Optimize for my Voice with Speaker Identification ➖ Pdf

Rising Stars Workshop

Will soon be added


About

License: MIT License