Liujingxiu23 / TTS-arxiv-daily

Automatically updates the list of Text-to-Speech (TTS) papers daily using GitHub Actions (updated every 12 hours)

Updated on 2024.06.13

Usage instructions: here

This page is modified from here

Table of Contents
  1. TTS

TTS

| Publish Date | Title | Authors | PDF | Code |
|:---|:---|:---|:---|:---|
| 2024-06-11 | Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Qingkai Fang et al. | 2406.07289 | null |
| 2024-06-11 | AudioMarkBench: Benchmarking Robustness of Audio Watermarking | Hongbin Liu et al. | 2406.06979 | link |
| 2024-06-11 | Controlling Emotion in Text-to-Speech with Natural Language Prompts | Thomas Bott et al. | 2406.06406 | link |
| 2024-06-10 | Meta Learning Text-to-Speech Synthesis in over 7000 Languages | Florian Lux et al. | 2406.06403 | link |
| 2024-06-10 | MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance | Semin Kim et al. | 2406.05965 | null |
| 2024-06-11 | WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark | Linhan Ma et al. | 2406.05763 | null |
| 2024-06-09 | An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS | Xiaofei Wang et al. | 2406.05699 | null |
| 2024-06-11 | Text-aware and Context-aware Expressive Audiobook Speech Synthesis | Dake Guo et al. | 2406.05672 | null |
| 2024-06-08 | Autoregressive Diffusion Transformer for Text-to-Speech Synthesis | Zhijun Liu et al. | 2406.05551 | null |
| 2024-06-08 | VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers | Sanyuan Chen et al. | 2406.05370 | null |
| 2024-06-07 | Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis | Ryan Langman et al. | 2406.05298 | null |
| 2024-06-07 | XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model | Edresson Casanova et al. | 2406.04904 | null |
| 2024-06-07 | TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking | Junzuo Zhou et al. | 2406.04840 | null |
| 2024-06-07 | Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study | Chong Zhang et al. | 2406.04633 | null |
| 2024-06-06 | Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis | Théodor Lemerle et al. | 2406.04467 | null |
| 2024-06-06 | Total-Duration-Aware Duration Modeling for Text-to-Speech Systems | Sefik Emre Eskimez et al. | 2406.04281 | null |
| 2024-06-06 | Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining | Jinlong Xue et al. | 2406.03714 | null |
| 2024-06-06 | Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model | Jinlong Xue et al. | 2406.03706 | null |
| 2024-06-05 | Style Mixture of Experts for Expressive Text-To-Speech Synthesis | Ahad Jawaid et al. | 2406.03637 | null |
| 2024-06-07 | Harder or Different? Understanding Generalization of Audio Deepfake Detection | Nicolas M. Müller et al. | 2406.03512 | null |
| 2024-06-05 | LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes | Trung Dang et al. | 2406.02897 | null |
| 2024-06-04 | Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Philip Anastassiou et al. | 2406.02430 | null |
| 2024-06-05 | SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models | Dongchao Yang et al. | 2406.02328 | null |
| 2024-06-04 | BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation | Hui-Peng Du et al. | 2406.02162 | null |
| 2024-06-04 | Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis | Kun Zhou et al. | 2406.02009 | null |
| 2024-06-03 | ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec | Shengpeng Ji et al. | 2406.01205 | link |
| 2024-06-03 | Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training | Jan Melechovsky et al. | 2406.01018 | null |
| 2024-06-02 | Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback | Chen Chen et al. | 2406.00654 | null |
| 2024-05-31 | Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities | Vicky Zayats et al. | 2405.18669 | null |
| 2024-05-28 | TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation | Chenyang Le et al. | 2405.17809 | null |
| 2024-05-27 | RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis | Haoxiang Shi et al. | 2405.17028 | null |
| 2024-05-24 | Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition | Zijin Gu et al. | 2405.15216 | null |
| 2024-05-23 | Reinforcement Learning for Fine-tuning Text-to-speech Diffusion Models | Jingyi Chen et al. | 2405.14632 | null |
| 2024-05-22 | A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction | Yue Li et al. | 2405.13477 | null |
| 2024-05-20 | Multi-speaker Text-to-speech Training with Speaker Anonymized Data | Wen-Chin Huang et al. | 2405.11767 | null |
| 2024-05-19 | VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications | Mikhail Konenkov et al. | 2405.11537 | null |
| 2024-05-18 | Exploring speech style spaces with language models: Emotional TTS without emotion labels | Shreeram Suresh Chandra et al. | 2405.11413 | null |
| 2024-05-16 | Faces that Speak: Jointly Synthesising Talking Face and Speech from Text | Youngjoon Jang et al. | 2405.10272 | null |
| 2024-05-16 | Building a Luganda Text-to-Speech Model From Crowdsourced Data | Sulaiman Kagumire et al. | 2405.10211 | null |
| 2024-05-16 | Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model | Siyang Wang et al. | 2405.09768 | null |
| 2024-05-15 | Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer | Weifei Jin et al. | 2405.09470 | null |
| 2024-05-15 | Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis | Sho Inoue et al. | 2405.09171 | null |
| 2024-05-14 | PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset | Yang Hou et al. | 2405.08838 | link |
| 2024-04-30 | Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech | Hankun Wang et al. | 2404.19723 | null |
| 2024-04-29 | MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis | Xiang Li et al. | 2404.18398 | null |
| 2024-04-28 | USAT: A Universal Speaker-Adaptive Text-to-Speech Approach | Wenbin Wang et al. | 2404.18094 | link |
| 2024-04-27 | TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality | Tiantian Feng et al. | 2404.17983 | null |
| 2024-04-26 | An RFP dataset for Real, Fake, and Partially fake audio detection | Abdulazeez AlAli et al. | 2404.17721 | null |
| 2024-04-23 | StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations | Sen Liu et al. | 2404.14946 | null |
| 2024-04-23 | Retrieval-Augmented Audio Deepfake Detection | Zuheng Kang et al. | 2404.13892 | null |
| 2024-04-14 | Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling | Quanxiu Wang et al. | 2404.09192 | null |
| 2024-04-11 | Voice-Assisted Real-Time Traffic Sign Recognition System Using Convolutional Neural Network | Mayura Manawadu et al. | 2404.07807 | null |
| 2024-04-18 | Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Xincan Feng et al. | 2404.06714 | null |
| 2024-04-10 | CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations | Leying Zhang et al. | 2404.06690 | null |
| 2024-04-10 | The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge | Yiwei Guo et al. | 2404.06079 | null |
| 2024-04-07 | Cross-Domain Audio Deepfake Detection: Dataset and Analysis | Yuang Li et al. | 2404.04904 | null |
| 2024-04-06 | HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks | Yingting Li et al. | 2404.04645 | link |
| 2024-04-18 | Open vocabulary keyword spotting through transfer learning from speech synthesis | Kesavaraj V et al. | 2404.03914 | null |
| 2024-04-06 | RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | Detai Xin et al. | 2404.03204 | null |
| 2024-04-03 | CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech | Jaehyeon Kim et al. | 2404.02781 | null |
| 2024-04-13 | PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders | Yu Pan et al. | 2404.02702 | null |
| 2024-03-31 | Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation | Rohan Chaudhury et al. | 2404.01339 | link |
| 2024-03-28 | A Review of Multi-Modal Large Language and Vision Models | Kilian Carolan et al. | 2404.01322 | null |
| 2024-04-09 | KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis | Adal Abilbekov et al. | 2404.01033 | link |
| 2024-03-31 | CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models | Xiang Li et al. | 2404.00569 | link |
| 2024-03-25 | VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild | Puyuan Peng et al. | 2403.16973 | link |
| 2024-03-20 | Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning | Shivam Ratnakant Mhaskar et al. | 2403.15469 | null |
| 2024-03-20 | UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge | Wataru Nakata et al. | 2403.13720 | null |
| 2024-03-20 | Building speech corpus with diverse voice characteristics for its prompt-based representation | Aya Watanabe et al. | 2403.13353 | null |
| 2024-03-17 | Creating an African American-Sounding TTS: Guidelines, Technical Challenges,and Surprising Evaluations | Claudio Pinhanez et al. | 2403.11209 | null |
| 2024-03-17 | EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech | Ziqi Liang et al. | 2403.08164 | null |
| 2024-03-09 | HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling | Chunhui Wang et al. | 2403.05989 | null |
| 2024-03-05 | AttentionStitch: How Attention Solves the Speech Editing Problem | Antonios Alexos et al. | 2403.04804 | null |
| 2024-03-07 | Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation | Sai Akarsh et al. | 2403.04178 | null |
| 2024-03-27 | NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | Zeqian Ju et al. | 2403.03100 | null |
| 2024-03-04 | Brilla AI: AI Contestant for the National Science and Maths Quiz | George Boateng et al. | 2403.01699 | link |
| 2024-03-02 | Towards Accurate Lip-to-Speech Synthesis in-the-Wild | Sindhu Hegde et al. | 2403.01087 | null |
| 2024-02-29 | Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data | Takaaki Saeki et al. | 2402.18932 | null |
| 2024-02-26 | An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation | Ahmet Gunduz et al. | 2402.16380 | link |
| 2024-02-22 | Efficient data selection employing Semantic Similarity-based Graph Structures for model training | Roxana Petcu et al. | 2402.14888 | null |
| 2024-02-22 | Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition | Rendi Chevi et al. | 2402.14523 | null |
| 2024-02-19 | On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models | Miri Varshavsky-Hassid et al. | 2402.12423 | null |
| 2024-02-19 | Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting | Haolin Chen et al. | 2402.12220 | null |
| 2024-02-18 | Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru | Zining Wang et al. | 2402.11571 | null |
| 2024-02-14 | MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech | Shengpeng Ji et al. | 2402.09378 | null |
| 2024-02-15 | BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data | Mateusz Łajszczak et al. | 2402.08093 | null |
| 2024-03-04 | Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like | Naoyuki Kanda et al. | 2402.07383 | null |
| 2024-02-09 | A New Approach to Voice Authenticity | Nicolas M. Müller et al. | 2402.06304 | null |
| 2024-02-08 | Unified Speech-Text Pretraining for Spoken Dialog Modeling | Heeseung Kim et al. | 2402.05706 | null |
| 2024-02-05 | Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations | Álvaro Martín-Cortinas et al. | 2402.03407 | null |
| 2024-02-02 | Natural language guidance of high-fidelity text-to-speech with synthetic annotations | Dan Lyth et al. | 2402.01912 | null |
| 2024-01-23 | Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization | Wei-Ping Huang et al. | 2402.01692 | null |
| 2024-02-01 | Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech | Dong Yang et al. | 2402.00288 | null |
| 2024-02-01 | PAM: Prompting Audio-Language Models for Audio Quality Assessment | Soham Deshmukh et al. | 2402.00282 | link |
| 2024-01-31 | Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2 | Jiatong Shi et al. | 2401.17619 | link |
| 2024-01-28 | MunTTS: A Text-to-Speech System for Mundari | Varun Gumma et al. | 2401.15579 | null |
| 2024-01-30 | VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech | Chenpeng Du et al. | 2401.14321 | null |
| 2024-01-25 | Text to speech synthesis | Harini s et al. | 2401.13891 | null |
| 2024-01-25 | SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation | Dong Zhang et al. | 2401.13527 | link |
| 2024-01-22 | Benchmarking Large Multimodal Models against Common Corruptions | Jiawei Zhang et al. | 2401.11943 | link |
| 2024-01-22 | Adversarial speech for voice privacy protection from Personalized Speech generation | Shihao Chen et al. | 2401.11857 | null |
| 2024-02-16 | Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis | Vinotha R et al. | 2401.11771 | null |
| 2024-01-19 | Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech | Abhinav Garg et al. | 2401.10465 | null |
| 2024-02-28 | MLAAD: The Multi-Language Audio Anti-Spoofing Dataset | Nicolas M. Müller et al. | 2401.09512 | null |
| 2024-01-15 | MCMChaos: Improvising Rap Music with MCMC Methods and Chaos Theory | Robert G. Kimelman et al. | 2401.07967 | null |
| 2024-01-14 | ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering | Yakun Song et al. | 2401.07333 | null |
| 2024-01-12 | Multi-Task Learning for Front-End Text Processing in TTS | Wonjune Kang et al. | 2401.06321 | link |
| 2024-01-11 | End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2 | Aniket Tathe et al. | 2401.06183 | null |
| 2024-01-11 | Self-Attention and Hybrid Features for Replay and Deep-Fake Audio Detection | Lian Huang et al. | 2401.05614 | null |
| 2024-01-10 | Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters | Kenichi Fujita et al. | 2401.05111 | null |
| 2024-01-07 | Evaluating and Personalizing User-Perceived Quality of Text-to-Speech Voices for Delivering Mindfulness Meditation with Different Physical Embodiments | Zhonghao Shi et al. | 2401.03581 | null |
| 2024-01-07 | Transfer the linguistic representations from TTS to accent conversion with non-parallel data | Xi Chen et al. | 2401.03538 | null |
| 2024-01-03 | Incremental FastPitch: Chunk-based High Quality Text to Speech | Muyang Du et al. | 2401.01755 | null |
| 2024-01-03 | Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction | Minchan Kim et al. | 2401.01498 | null |
| 2023-12-18 | Assisting Blind People Using Object Detection with Vocal Feedback | Heba Najm et al. | 2401.01362 | null |
| 2023-12-30 | Boosting Large Language Model for Speech Synthesis: An Empirical Study | Hongkun Hao et al. | 2401.00246 | null |
| 2024-01-01 | Normalization of Lithuanian Text Using Regular Expressions | Pijus Kasparaitis et al. | 2312.17660 | null |
| 2023-12-27 | AE-Flow: AutoEncoder Normalizing Flow | Jakub Mosiński et al. | 2312.16552 | null |
| 2023-12-22 | Creating New Voices using Normalizing Flows | Piotr Bilinski et al. | 2312.14569 | null |
| 2023-12-22 | ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations | Cheng Gong et al. | 2312.14398 | null |
| 2023-12-19 | External Knowledge Augmented Polyphone Disambiguation Using Large Language Model | Chen Li et al. | 2312.11920 | null |
| 2023-12-17 | A review-based study on different Text-to-Speech technologies | Md. Jalal Uddin Chowdhury et al. | 2312.11563 | null |
| 2024-01-31 | MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis | Wenhao Guan et al. | 2312.10687 | null |
| 2024-02-22 | Amphion: An Open-Source Audio, Music and Speech Generation Toolkit | Xueyao Zhang et al. | 2312.09911 | link |
| 2023-12-11 | Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism | Georgios Milis et al. | 2312.06613 | link |
| 2023-12-08 | An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis | Via Nielson et al. | 2312.05415 | null |
| 2023-12-06 | Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis | Zehua Chen et al. | 2312.03491 | null |
| 2023-12-02 | Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning | Raviraj Joshi et al. | 2312.01107 | null |
| 2023-12-02 | Code-Mixed Text to Speech Synthesis under Low-Resource Constraints | Raviraj Joshi et al. | 2312.01103 | null |
| 2023-11-29 | Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes | Pavel Korshunov et al. | 2311.17655 | null |
| 2024-02-06 | Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech | Enting Zhou et al. | 2311.14816 | link |
| 2023-12-07 | Guided Flows for Generative Modeling and Decision Making | Qinqing Zheng et al. | 2311.13443 | null |
| 2023-11-27 | HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis | Sang-Hoon Lee et al. | 2311.12454 | link |
| 2023-11-18 | Utilizing Speech Emotion Recognition and Recommender Systems for Negative Emotion Handling in Therapy Chatbots | Farideh Majidi et al. | 2311.11116 | null |
| 2023-11-18 | Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI Platform for Smart-Toys | Gabriel Cosache et al. | 2311.11030 | null |
| 2023-11-17 | A Study on Altering the Latent Space of Pretrained Text to Speech Models for Improved Expressiveness | Mathias Vogel et al. | 2311.10804 | null |
| 2023-11-16 | Improving fairness for spoken language understanding in atypical speech with Text-to-Speech | Helin Wang et al. | 2311.10149 | link |
| 2024-02-02 | DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation | Jianzong Wang et al. | 2311.07965 | null |
| 2023-11-12 | ChatAnything: Facetime Chat with LLM-Enhanced Personas | Yilin Zhao et al. | 2311.06772 | null |
| 2023-11-11 | NewsGPT: ChatGPT Integration for Robot-Reporter | Abdelhadi Hireche et al. | 2311.06640 | link |
| 2023-11-08 | Synthetic Speaking Children -- Why We Need Them and How to Make Them | Muhammad Ali Farooq et al. | 2311.06307 | null |
| 2023-09-25 | Face-StyleSpeech: Improved Face-to-Voice latent mapping for Natural Zero-shot Speech Synthesis from a Face Image | Minki Kang et al. | 2311.05844 | null |
| 2023-11-07 | Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning | Rishabh Jain et al. | 2311.04313 | link |
| 2023-11-07 | Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment | Jakir Hasan et al. | 2311.03792 | null |
| 2023-11-08 | Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction | Minchan Kim et al. | 2311.02898 | null |
| 2023-11-02 | Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations | Hanglei Zhang et al. | 2311.01260 | null |
| 2023-11-02 | E3 TTS: Easy End-to-End Diffusion-based Text to Speech | Yuan Gao et al. | 2311.00945 | null |
| 2023-10-31 | An Implementation of Multimodal Fusion System for Intelligent Digital Human Generation | Yingjie Zhou et al. | 2310.20251 | link |
| 2023-10-27 | Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN | Neeraj Kumar et al. | 2310.18169 | null |
| 2023-10-25 | ArTST: Arabic Text and Speech Transformer | Hawau Olamide Toyin et al. | 2310.16621 | link |
| 2023-10-25 | Generative Pre-training for Speech with Flow Matching | Alexander H. Liu et al. | 2310.16338 | null |
| 2023-10-23 | DPP-TTS: Diversifying prosodic features of speech via determinantal point processes | Seongho Joo et al. | 2310.14663 | null |
| 2023-10-22 | An overview of text-to-speech systems and media applications | Mohammad Reza Hasanabadi et al. | 2310.14301 | null |
| 2023-10-14 | Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling | Tiberiu Boros et al. | 2310.09636 | link |
| 2023-10-14 | Attentive Multi-Layer Perceptron for Non-autoregressive Generation | Shuyang Jiang et al. | 2310.09512 | link |
| 2023-12-22 | Crowdsourced and Automatic Speech Prominence Estimation | Max Morrison et al. | 2310.08464 | link |
| 2023-10-12 | On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition | Nick Rossenbach et al. | 2310.08132 | null |
| 2023-10-12 | Vec-Tok Speech: speech vectorization and tokenization for neural speech generation | Xinfa Zhu et al. | 2310.07246 | link |
| 2023-10-10 | Prosody Analysis of Audiobooks | Charuta Pethe et al. | 2310.06930 | null |
| 2023-10-09 | JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions | Detai Xin et al. | 2310.06072 | null |
| 2024-01-09 | Unified speech and gesture synthesis using flow matching | Shivam Mehta et al. | 2310.05181 | null |
| 2023-10-08 | Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset | Ze Liu et al. | 2310.04982 | null |
| 2023-10-11 | LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Jiaming Wang et al. | 2310.04673 | null |
| 2024-01-22 | Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis | Jae-Sung Bae et al. | 2310.03538 | null |
| 2023-10-07 | The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains | Erica Cooper et al. | 2310.02640 | null |
| 2023-10-02 | Towards human-like spoken dialogue generation between AI agents from written dialogue | Kentaro Mitsui et al. | 2310.01088 | null |
| 2023-10-01 | Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech | Dareen Alharthi et al. | 2310.00706 | null |
| 2024-03-11 | Fewer-token Neural Speech Codec with Time-invariant Codes | Yong Ren et al. | 2310.00014 | link |
| 2024-01-31 | ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech | Wenhao Guan et al. | 2309.17056 | null |
| 2023-09-29 | Low-Resource Self-Supervised Learning with SSL-Enhanced TTS | Po-chun Hsu et al. | 2309.17020 | null |
| 2023-09-29 | Synthetic Speech Detection Based on Temporal Consistency and Distribution of Speaker Features | Yuxiang Zhang et al. | 2309.16954 | null |
| 2023-12-18 | High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models | Chunyu Qiang et al. | 2309.15512 | null |
| 2024-01-09 | BiSinger: Bilingual Singing Voice Synthesis | Huali Zhou et al. | 2309.14089 | link |
| 2023-10-07 | HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS | Dake Guo et al. | 2309.13907 | null |
| 2023-09-24 | VoiceLDM: Text-to-Speech with Environmental Context | Yeonghyeon Lee et al. | 2309.13664 | null |
| 2023-09-24 | Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control | Aya Watanabe et al. | 2309.13509 | null |
| 2023-09-22 | DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis | Yu Gu et al. | 2309.12792 | null |
| 2023-09-22 | Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts | Shun Lei et al. | 2309.11977 | null |
| 2023-09-21 | The Impact of Silence on Speech Anti-Spoofing | Yuxiang Zhang et al. | 2309.11827 | null |
| 2023-09-21 | Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | Rui Liu et al. | 2309.11724 | link |
| 2023-09-20 | Speak While You Think: Streaming Speech Synthesis During Text Generation | Avihu Dekel et al. | 2309.11210 | null |
| 2023-09-20 | Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model | Xinyu Zhou et al. | 2309.11000 | link |
| 2023-09-19 | Exploring Speech Enhancement for Low-resource Speech Synthesis | Zhaoheng Ni et al. | 2309.10795 | null |
| 2023-09-19 | Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition | Ziyang Ma et al. | 2309.10294 | null |
| 2023-09-17 | Augmenting text for spoken language understanding with Large Language Models | Roshan Sharma et al. | 2309.09390 | null |
| 2023-09-16 | FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework | Jianzong Wang et al. | 2309.08837 | null |
| 2023-09-15 | Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech | Dariusz Piotrowski et al. | 2309.08255 | null |
| 2023-09-15 | HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods | Hyun-seo Shin et al. | 2309.08208 | link |
| 2023-12-27 | PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions | Reo Shimizu et al. | 2309.08140 | null |
| 2023-09-15 | Diversity-based core-set selection for text-to-speech with linguistic and acoustic features | Kentaro Seki et al. | 2309.08127 | null |
| 2023-09-14 | Direct Text to Speech Translation System using Acoustic Units | Victoria Mingote et al. | 2309.07478 | null |
| 2023-10-07 | FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec | Zhihao Du et al. | 2309.07405 | link |
| 2023-09-13 | DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation | Zhichao Wu et al. | 2309.06787 | null |
| 2023-09-11 | Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP | Jinzuomu Zhong et al. | 2309.05423 | link |
| 2024-01-16 | VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching | Yiwei Guo et al. | 2309.05027 | null |
| 2023-09-08 | Cross-Utterance Conditioned VAE for Speech Generation | Yang Li et al. | 2309.04156 | null |
| 2023-09-07 | Large-Scale Automatic Audiobook Creation | Brendan Walsh et al. | 2309.03926 | null |
| 2023-09-11 | GRASS: Unified Generation Model for Speech-to-Semantic Tasks | Aobo Xia et al. | 2309.02780 | null |
| 2023-09-12 | MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023 | Zhihang Xu et al. | 2309.02743 | null |
| 2023-10-12 | PromptTTS 2: Describing and Generating Voices with Text Prompt | Yichong Leng et al. | 2309.02285 | null |
| 2023-09-04 | A Comparative Analysis of Pretrained Language Models for Text-to-Speech | Marcel Granero-Moya et al. | 2309.01576 | null |
| 2023-09-02 | DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin | Tao Li et al. | 2309.00883 | null |
| 2023-12-18 | Learning Speech Representation From Contrastive Token-Acoustic Pretraining | Chunyu Qiang et al. | 2309.00424 | null |
| 2023-09-01 | The FruitShell French synthesis system at the Blizzard 2023 Challenge | Xin Qi et al. | 2309.00223 | null |
| 2023-08-31 | QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning | Haohan Guo et al. | 2309.00126 | null |
| 2024-01-23 | SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models | Xin Zhang et al. | 2308.16692 | link |
| 2023-08-31 | Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis | Weiqin Li et al. | 2308.16593 | null |
| 2023-08-31 | Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information | Jie Chen et al. | 2308.16577 | null |
| 2023-08-31 | LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech | Jie Chen et al. | 2308.16569 | null |
| 2023-08-30 | CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis | Yi Meng et al. | 2308.16021 | null |
| 2023-09-01 | The DeepZen Speech Synthesis System for Blizzard Challenge 2023 | Christophe Veaux et al. | 2308.15945 | null |
| 2023-08-28 | Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech | Hyungchan Yoon et al. | 2308.14909 | null |
| 2023-09-04 | Rep2wav: Noise Robust text-to-speech Using self-supervised representations | Qiushi Zhu et al. | 2308.14553 | null |
| 2023-08-28 | TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models | Shengpeng Ji et al. | 2308.14430 | link |
| 2023-09-02 | Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder | Xuyuan Li et al. | 2308.13365 | null |
| 2023-08-24 | Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations | Wenbin Wang et al. | 2308.13007 | null |
| 2023-09-22 | Sparks of Large Audio Models: A Survey and Outlook | Siddique Latif et al. | 2308.12792 | null |
| 2023-10-25 | SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Seamless Communication et al. | 2308.11596 | link |
| 2023-08-31 | Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models | Heyang Xue et al. | 2308.10428 | null |
| 2023-08-16 | AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis | Hrishikesh Viswanath et al. | 2308.08577 | null |
| 2023-08-14 | SpeechX: Neural Codec Language Model as a Versatile Speech Transformer | Xiaofei Wang et al. | 2308.06873 | null |
| 2023-08-12 | Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation | Zhichao Wang et al. | 2308.06457 | link |
| 2023-09-09 | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | Haohe Liu et al. | 2308.05734 | link |
| 2023-08-09 | Data Player: Automatic Generation of Data Videos with Narration-Animation Interplay | Leixian Shen et al. | 2308.04703 | null |
| 2023-08-08 | Towards an AI to Win Ghana's National Science and Maths Quiz | George Boateng et al. | 2308.04333 | link |
| 2023-08-08 | WonderFlow: Narration-Centric Design of Animated Data Videos | Yun Wang et al. | 2308.04040 | null |
| 2023-08-04 | Let's Give a Voice to Conversational Agents in Virtual Reality | Michele Yin et al. | 2308.02665 | link |
| 2023-08-03 | Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation | Minsu Kim et al. | 2308.01831 | link |
| 2023-08-02 | SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis | Ramanan Sivaguru et al. | 2308.01018 | null |
| 2023-07-07 | Artificial Eye for the Blind | Abhinav Benagi et al. | 2308.00801 | null |
| 2023-07-31 | Multilingual context-based pronunciation learning for Text-to-Speech | Giulia Comini et al. | 2307.16709 | null |
| 2023-07-31 | Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech | Guangyan Zhang et al. | 2307.16679 | null |
| 2023-07-31 | Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings | Manuel Sam Ribeiro et al. | 2307.16643 | null |
| 2023-07-31 | DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training | Hyung-Seok Oh et al. | 2307.16549 | link |
| 2023-07-31 | VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design | Jungil Kong et al. | 2307.16430 | null |
| 2023-07-30 | Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation | Yuanhao Chen et al. | 2307.16199 | link |
| 2023-07-29 | METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer | Xinfa Zhu et al. | 2307.15951 | null |
| 2023-12-18 | Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding | Chunyu Qiang et al. | 2307.15484 | null |
| 2023-07-20 | SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer | Daegyeom Kim et al. | 2307.10550 | link |
| 2023-07-18 | SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs | Yinghao Aaron Li et al. | 2307.09435 | null |
| 2023-09-28 | Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts | Ziyue Jiang et al. | 2307.07218 | null |
| 2023-07-13 | Controllable Emphasis with zero data for text-to-speech | Arnaud Joly et al. | 2307.07062 | null |
| 2023-07-11 | On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis | Siyang Wang et al. | 2307.05132 | null |
| 2023-07-10 | The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task | Kun Song et al. | 2307.04630 | null |
| 2023-10-07 | ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading | Yujia Xiao et al. | 2307.00782 | null |
| 2023-06-28 | EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech | Daria Diatlova et al. | 2307.00024 | link |
| 2023-06-29 | High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units | Junchen Lu et al. | 2306.17005 | null |
| 2023-06-28 | UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data | Heeseung Kim et al. | 2306.16083 | link |
| 2023-10-19 | Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale | Matthew Le et al. | 2306.15687 | null |
| 2023-06-27 | GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech | Yahuan Cong et al. | 2306.15304 | null |
| 2023-06-25 | DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech | Sen Liu et al. | 2306.14145 | null |
| 2023-06-21 | Visual-Aware Text-to-Speech | Mohan Zhou et al. | 2306.12020 | null |
| 2023-06-21 | Expressive Machine Dubbing Through Phrase-level Cross-lingual Prosody Transfer | Jakub Swiatkowski et al. | 2306.11662 | null |
| 2023-06-16 | Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation | Kishor Kayyar Lakshminarayana et al. | 2306.10152 | null |
| 2023-06-16 | CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages | Frederico S. Oliveira et al. | 2306.10097 | null |
| 2023-06-14 | Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation | Zheng Liang et al. | 2306.08588 | null |
| 2023-06-14 | Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions, and Prospects | Xinghua Qu et al. | 2306.08219 | link |
| 2023-11-20 | StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models | Yinghao Aaron Li et al. | 2306.07691 | null |
| 2024-01-18 | UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding | Chenpeng Du et al. | 2306.07547 | null |
| 2023-06-13 | PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling | Ji-Sang Hwang et al. | 2306.07489 | null |
2023-06-09 Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech Shijun Wang et.al. 2306.05709 null
2023-06-08 VIFS: An End-to-End Variational Inference for Foley Sound Synthesis Junhyeok Lee et.al. 2306.05004 link
2023-07-11 Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge Wenhao Guan et.al. 2306.04301 null
2023-06-06 Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias Ziyue Jiang et.al. 2306.03509 null
2023-08-02 Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis Zhenhui Ye et.al. 2306.03504 null
2023-06-05 Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis Dengfeng Ke et.al. 2306.02593 null
2023-06-05 Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model Hoyeon Lee et.al. 2306.02579 null
2023-06-05 Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming Xinlei Niu et.al. 2306.02568 link
2023-06-02 Towards Robust FastSpeech 2 by Modelling Residual Multimodality Fabian Kögel et.al. 2306.01442 link
2023-05-30 Towards Selection of Text-to-speech Data to Augment ASR Training Shuo Liu et.al. 2306.00998 null
2023-06-01 EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis Haobin Tang et.al. 2306.00648 null
2023-06-01 The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech Phat Do et.al. 2306.00535 null
2023-05-31 Text-to-Speech Pipeline for Swiss German -- A comparison Tobias Bollinger et.al. 2305.19750 null
2023-05-31 XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech Linh The Nguyen et.al. 2305.19709 link
2023-06-01 PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions Guanghou Liu et.al. 2305.19522 null
2023-05-30 Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages Phat Do et.al. 2305.19396 null
2023-05-30 Make-A-Voice: Unified Voice Synthesis With Discrete Representation Rongjie Huang et.al. 2305.19269 null
2023-05-30 STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions Michel Plüss et.al. 2305.18855 null
2023-05-30 LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus Yuma Koizumi et.al. 2305.18802 null
2023-10-09 An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization Fei Kong et.al. 2305.18355 link
2023-05-29 ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation Ambuj Mehrish et.al. 2305.18028 link
2023-05-29 Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis Erik Ekstedt et.al. 2305.17971 null
2023-07-25 StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation Kun Song et.al. 2305.17732 null
2023-05-28 Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS Sewade Ogun et.al. 2305.17724 link
2023-07-19 Synthesizing Speech Test Cases with Text-to-Speech? An Empirical Study on the False Alarms in Automated Speech Recognition Testing Julia Kaiwen Lau et.al. 2305.17445 link
2023-05-26 DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction Vineet Bhat et.al. 2305.16957 null
2023-05-25 Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion Rui Liu et.al. 2305.16353 link
2023-05-22 Text Generation with Speech Synthesis for ASR Data Augmentation Zhuangqun Huang et.al. 2305.16333 null
2023-05-25 VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation Tianrui Wang et.al. 2305.16107 null
2023-05-25 Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration Rustem Yeshpanov et.al. 2305.15749 link
2024-02-05 LAraBench: Benchmarking Arabic AI with Large Language Models Ahmed Abdelali et.al. 2305.14982 null
2023-05-23 EfficientSpeech: An On-Device Text to Speech Model Rowel Atienza et.al. 2305.13905 link
2023-05-23 ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models Minki Kang et.al. 2305.13831 null
2023-05-22 U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech Xin Jing et.al. 2305.13195 null
2023-05-25 EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels Kari Ali Noriy et.al. 2305.13137 link
2023-05-22 ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer Huadai Liu et.al. 2305.12708 null
2023-05-21 VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages Shivam Mhaskar et.al. 2305.12518 null
2023-05-26 Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus Detai Xin et.al. 2305.12442 link
2023-05-20 ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios Yuyue Wang et.al. 2305.12200 null
2023-05-19 MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting Neil Shah et.al. 2305.11926 null
2024-02-20 Data Redaction from Conditional Generative Models Zhifeng Kong et.al. 2305.11351 null
2023-05-18 Parameter-Efficient Learning for Text-to-Speech Accent Adaptation Li-Jen Yang et.al. 2305.11320 link
2023-05-19 Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation Martijn Bartelds et.al. 2305.10951 link
2023-09-30 Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data Yusheng Tian et.al. 2305.10891 link
2023-05-18 FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs Won Jang et.al. 2305.10823 null
2023-05-18 CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training Zhenhui Ye et.al. 2305.10763 null
2023-08-29 a unified front-end framework for english text-to-speech synthesis Zelin Ying et.al. 2305.10666 null
2023-09-19 Controllable Speaking Styles Using a Large Language Model Atli Thor Sigurgeirsson et.al. 2305.10321 null
2023-05-23 Better speech synthesis through scaling James Betker et.al. 2305.07243 link
2023-10-29 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model Zhen Ye et.al. 2305.06908 link
2023-05-08 Accented Text-to-Speech Synthesis with Limited Data Xuehao Zhou et.al. 2305.04816 null
2023-05-03 M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis Jinlong Xue et.al. 2305.02269 null
2023-05-30 A Review of Deep Learning Techniques for Speech Processing Ambuj Mehrish et.al. 2305.00359 null
2023-04-26 Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis Ye-Xin Lu et.al. 2304.13270 null
2023-04-25 Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge Chenpeng Du et.al. 2304.13121 null
2023-04-24 Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model Kenichi Fujita et.al. 2304.11976 null
2023-04-23 DiffVoice: Text-to-Speech with Latent Diffusion Zhijun Liu et.al. 2304.11750 null
2023-04-23 SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model Jianzong Wang et.al. 2304.11547 null
2023-05-30 NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers Kai Shen et.al. 2304.09116 null
2023-04-16 A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers Juan Zuluaga-Gomez et.al. 2304.07842 null
2023-04-13 Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis Shun Lei et.al. 2304.06359 null
2023-04-10 Enhancing Speech-to-Speech Translation with Multiple TTS Targets Jiatong Shi et.al. 2304.04618 null
2023-04-07 ArmanTTS single-speaker Persian dataset Mohammd Hasan Shamgholi et.al. 2304.03585 null
2023-04-03 Ensemble prosody prediction for expressive speech synthesis Tian Huey Teh et.al. 2304.00714 null
2023-03-29 AraSpot: Arabic Spoken Command Spotting Mahmoud Salhab et.al. 2303.16621 link
2023-03-28 Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages Seongyeon Park et.al. 2303.15669 link
2023-03-27 Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis Karren Yang et.al. 2303.14885 null
2023-03-24 Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis Takuhiro Kaneko et.al. 2303.13909 null
2023-04-02 A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI Chenshuang Zhang et.al. 2303.13336 null
2023-03-20 Code-Switching Text Generation and Injection in Mandarin-English ASR Haibin Yu et.al. 2303.10949 null
2023-03-14 Controlling High-Dimensional Data With Sparse Input Dan Andrei Iliescu et.al. 2303.09446 null
2023-03-09 Text-to-ECG: 12-Lead Electrocardiogram Synthesis conditioned on Clinical Text Reports Hyunseung Chung et.al. 2303.09395 link
2023-03-15 Cross-speaker Emotion Transfer by Manipulating Speech Style Latents Suhee Jo et.al. 2303.08329 null
2023-03-14 QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis Haobin Tang et.al. 2303.07682 null
2023-03-10 An End-to-End Neural Network for Image-to-Audio Transformation Liu Chen et.al. 2303.06078 null
2023-03-09 Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation Qi Chen et.al. 2303.05322 link
2023-03-07 Do Prosody Transfer Models Transfer Prosody? Atli Thor Sigurgeirsson et.al. 2303.04289 null
2023-03-07 Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling Ziqiang Zhang et.al. 2303.03926 null
2023-03-02 Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding Yingting Li et.al. 2303.03267 link
2023-03-08 FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model Ruiqing Xue et.al. 2303.02939 null
2023-08-14 Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations Yuma Koizumi et.al. 2303.01664 null
2023-03-11 Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities Shijun Wang et.al. 2303.01508 null
2023-12-17 ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations Neil Shah et.al. 2303.01261 null
2023-03-02 LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion Chunfeng Wang et.al. 2303.01086 null
2023-03-02 Leveraging Large Text Corpora for End-to-End Speech Summarization Kohei Matsuura et.al. 2303.00978 null
2023-03-01 DTW-SiameseNet: Dynamic Time Warped Siamese Network for Mispronunciation Detection and Correction Raviteja Anantha et.al. 2303.00171 null
2023-02-28 ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus Ajinkya Kulkarni et.al. 2303.00069 null
2023-02-28 Automatic Heteronym Resolution Pipeline Using RAD-TTS Aligners Jocelyn Huang et.al. 2302.14523 null
2023-06-12 CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis Ji-Hoon Kim et.al. 2302.14370 null
2023-05-19 UniFLG: Unified Facial Landmark Generator from Text or Speech Kentaro Mitsui et.al. 2302.14337 null
2023-02-27 Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech Jiyoung Lee et.al. 2302.13700 link
2023-02-27 Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech Dong Yang et.al. 2302.13652 null
2023-02-27 Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow Yoonhyung Lee et.al. 2302.13458 null
2023-06-06 PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS Junhyeok Lee et.al. 2302.12391 link
2023-02-21 Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition Leyuan Qu et.al. 2302.09723 null
2023-02-23 QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion Houjian Guo et.al. 2302.08296 link
2023-02-13 Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages Sudhanshu Srivastava et.al. 2302.06227 null
2023-02-08 A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech Li-Wei Chen et.al. 2302.04215 link
2023-02-07 Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision Eugene Kharitonov et.al. 2302.03540 null
2023-02-15 MAC: A unified framework boosting low resource automatic speech recognition Zeping Min et.al. 2302.03498 null
2023-06-25 InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt Dongchao Yang et.al. 2301.13662 link
2023-03-01 UzbekTagger: The rule-based POS tagger for Uzbek language Maksud Sharipov et.al. 2301.12711 null
2023-05-27 Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining Takaaki Saeki et.al. 2301.12596 link
2023-01-31 Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker Navjot Kaur et.al. 2301.12331 link
2023-01-26 On granularity of prosodic representations in expressive text-to-speech Mikolaj Babianski et.al. 2301.11446 null
2023-01-26 Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study Massa Baali et.al. 2301.09099 link
2023-01-20 Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions Yinghao Aaron Li et.al. 2301.08810 null
2023-01-11 Modelling low-resource accents without accent-specific TTS frontend Georgi Tinchev et.al. 2301.04606 null
2022-12-11 BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm Yu-Wen Chen et.al. 2301.04120 link
2023-01-10 UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion Haogeng Liu et.al. 2301.03801 null
2023-01-10 Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation Abdullah Shahid et.al. 2301.03751 null
2023-09-19 Applying Automated Machine Translation to Educational Video Courses Linden Wang et.al. 2301.03141 null
2023-01-06 Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition David M. Chan et.al. 2301.02736 null
2023-01-05 Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers Chengyi Wang et.al. 2301.02111 link
2022-12-11 MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset Kailin Liang et.al. 2301.00657 link
2022-12-30 ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech Zehua Chen et.al. 2212.14518 null
2022-12-29 StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models Yinghao Aaron Li et.al. 2212.14227 link
2022-12-22 HMM-based data augmentation for E2E systems for building conversational speech synthesis systems Ishika Gupta et.al. 2212.11982 null
2022-12-21 ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement Wei-Ning Hsu et.al. 2212.11377 null
2022-12-20 TTS-Guided Training for Accent Conversion Without Parallel Data Yi Zhou et.al. 2212.10204 null
2023-06-28 Improving the quality of neural TTS using long-form content and multi-speaker multi-style modeling Tuomo Raitio et.al. 2212.10075 null
2022-12-16 Speech Aware Dialog System Technology Challenge (DSTC11) Hagen Soltau et.al. 2212.08704 null
2022-12-16 Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder Yusuke Yasuda et.al. 2212.08329 null
2022-12-16 Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language Yusuke Yasuda et.al. 2212.08321 null
2022-12-15 RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis Shinhyeok Oh et.al. 2212.07939 link
2022-12-14 Probing Deep Speaker Embeddings for Speaker-related Tasks Zifeng Zhao et.al. 2212.07068 null
2022-12-08 SpeechLMScore: Evaluating speech generation using speech language model Soumi Maiti et.al. 2212.04559 link
2023-04-04 Learning to Dub Movies via Hierarchical Prosody Models Gaoxiang Cong et.al. 2212.04054 link
2022-12-07 Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning Ankur Debnath et.al. 2212.03558 null
2022-12-07 Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue Daxin Tan et.al. 2212.03398 null
2022-12-06 UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis Yi Lei et.al. 2212.01546 null
2022-11-30 SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech Byoung Jin Choi et.al. 2211.16866 null
2022-11-29 Controllable speech synthesis by learning discrete phoneme-level prosodic representations Nikolaos Ellinas et.al. 2211.16307 null
2023-05-25 Evaluating and reducing the distance between synthetic and real speech distributions Christoph Minixhofer et.al. 2211.16049 null
2022-11-26 Contextual Expressive Text-to-Speech Jianhong Tu et.al. 2211.14548 null
2022-12-05 Efficient Incremental Text-to-Speech on GPUs Muyang Du et.al. 2211.13939 null
2023-03-21 Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems? Xuan Shi et.al. 2211.13868 link
2022-11-23 IMaSC -- ICFOSS Malayalam Speech Corpus Deepa P Gopinath et.al. 2211.12796 null
2022-11-22 PromptTTS: Controllable Text-to-Speech with Text Descriptions Zhifang Guo et.al. 2211.12171 null
2022-11-04 Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech Xin Zhang et.al. 2211.09731 null
2023-02-17 Towards Building Text-To-Speech Systems for the Next Billion Users Gokul Karthik Kumar et.al. 2211.09536 link
2023-02-16 EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance Yiwei Guo et.al. 2211.09496 null
2022-11-17 Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation Chunyu Qiang et.al. 2211.09495 null
2022-11-17 NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis Hyeong-Seok Choi et.al. 2211.09407 null
2023-03-14 Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models Minki Kang et.al. 2211.09383 null
2023-01-04 Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation Xin Yuan et.al. 2211.09365 null
2022-11-14 SNIPER Training: Variable Sparsity Rate Training For Text-To-Speech Perry Lam et.al. 2211.07283 null
2023-05-24 Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing Jacob J Webber et.al. 2211.06989 null
2023-05-29 OverFlow: Putting flows on top of neural transducers for better TTS Shivam Mehta et.al. 2211.06892 link
2023-05-29 Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations Yoori Oh et.al. 2211.06160 null
2022-12-04 ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech Xiaoran Fan et.al. 2211.03545 link
2022-11-07 Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder Jan Melechovsky et.al. 2211.03316 link
2022-11-06 Parallel Attention Forcing for Machine Translation Qingyun Dou et.al. 2211.03237 null
2022-11-06 An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space Jihwan Lee et.al. 2211.03078 null
2022-11-04 NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS Dongchao Yang et.al. 2211.02448 null
2022-11-04 Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and Textual Contexts Detai Xin et.al. 2211.02336 null
2023-04-16 Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS Ziqi Liang et.al. 2211.01948 null
2022-11-01 Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages Anusha Prakash et.al. 2211.01338 null
2023-05-28 DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP Kun Song et.al. 2211.01087 null
2022-11-22 Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement Wei Song et.al. 2211.00967 null
2022-11-01 Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers Cheng-Ping Hsieh et.al. 2211.00585 link
2023-06-11 Generating Multilingual Gender-Ambiguous Text-to-Speech Voices Konstantinos Markopoulos et.al. 2211.00375 null
2023-05-07 Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features Alexandra Vioni et.al. 2211.00342 null
2022-11-02 Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS Kun Song et.al. 2210.17349 null
2024-02-27 Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation Nikolaos Ellinas et.al. 2210.17264 null
2022-10-31 Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection Luigi Attorresi et.al. 2210.17222 null
2022-10-31 Structured State Space Decoder for Speech Recognition and Synthesis Koichi Miyazaki et.al. 2210.17098 null
2022-10-28 Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders Jason Fong et.al. 2210.16045 null
2023-02-21 Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform Masaya Kawamura et.al. 2210.15975 link
2023-02-22 Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis Yuma Shirahata et.al. 2210.15964 null
2022-10-28 Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation Nobuyuki Morioka et.al. 2210.15868 null
2023-03-15 Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech Takaaki Saeki et.al. 2210.15447 null
2022-10-27 Explicit Intensity Control for Accented Text-to-speech Rui Liu et.al. 2210.15364 null
2022-10-27 FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis Yifan Hu et.al. 2210.15360 link
2022-10-26 Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection Kentaro Seki et.al. 2210.14850 null
2022-10-25 Semi-Supervised Learning Based on Reference Model for Low-resource TTS Xulong Zhang et.al. 2210.14723 null
2022-10-26 Cover Reproducible Steganography via Deep Generative Models Kejiang Chen et.al. 2210.14632 null
2022-10-26 Improving Speech-to-Speech Translation Through Unlabeled Text Xuan-Phi Nguyen et.al. 2210.14514 null
2022-10-26 The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge Yuhao Liang et.al. 2210.14448 null
2022-10-25 Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data Xulong Zhang et.al. 2210.13803 null
2023-09-17 HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation Chunhui Wang et.al. 2210.12740 null
2022-10-21 Low-Resource Multilingual and Zero-Shot Multispeaker TTS Florian Lux et.al. 2210.12223 link
2022-10-21 Adaptive re-calibration of channel-wise features for Adversarial Audio Classification Vardhan Dongre et.al. 2210.11722 null
2022-10-20 Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS Chunyu Qiang et.al. 2210.11429 null
2022-10-17 Towards Relation Extraction From Speech Tongtong Wu et.al. 2210.08759 link
2023-02-08 Generating Synthetic Speech from SpokenVocab for Speech Translation Jinming Zhao et.al. 2210.08174 link
2022-10-17 LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge Yan Jia et.al. 2210.07749 null
2022-10-20 Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy Sarina Meyer et.al. 2210.07002 link
2022-10-13 Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar Aolan Sun et.al. 2210.06877 null
2022-10-12 Can we use Common Voice to train a Multi-Speaker TTS system? Sewade Ogun et.al. 2210.06370 null
2023-06-01 SQuId: Measuring Speech Naturalness in Many Languages Thibault Sellam et.al. 2210.06324 null
2022-11-22 Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech Byoung Jin Choi et.al. 2210.05979 null
2022-10-06 An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era Andreas Triantafyllopoulos et.al. 2210.03538 null
2022-09-29 Facial Landmark Predictions with Applications to Metaverse Qiao Han et.al. 2209.14698 link
2022-09-26 Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech Yusuke Nakai et.al. 2209.12549 null
2022-09-22 EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models Perry Lam et.al. 2209.10890 null
2022-09-22 MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline Yifan Hu et.al. 2209.10848 link
2022-09-22 Controllable Accented Text-to-Speech Synthesis Rui Liu et.al. 2209.10804 null
2022-09-16 TIMIT-TTS: a Text-to-Speech Dataset for Multimodal Synthetic Media Detection Davide Salvi et.al. 2209.08000 null
2022-09-14 Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset Michael Chinen et.al. 2209.06358 null
2022-09-08 SANIP: Shopping Assistant and Navigation for the visually impaired Shubham Deshmukh et.al. 2209.03570 null
2022-09-07 Non-Standard Vietnamese Word Detection and Normalization for Text-to-Speech Huu-Tien Dang et.al. 2209.02971 null
2022-09-02 Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model Jennifer Drexler Fox et.al. 2209.01250 null
2022-08-28 Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks Lev Finkelstein et.al. 2208.13183 null
2022-10-04 Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale Aditya Agarwal et.al. 2208.09796 null
2022-08-21 Visualising Model Training via Vowel Space for Text-To-Speech Systems Binu Abeysinghe et.al. 2208.09775 link
2022-08-15 Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0 Mohammed Salah Al-Radhi et.al. 2208.07122 null
2022-12-28 Speech Synthesis with Mixed Emotions Kun Zhou et.al. 2208.05890 null
2022-08-03 A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis Qibing Bai et.al. 2208.02189 null
2022-07-29 Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation Giulia Comini et.al. 2207.14607 null
2022-07-25 Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis Raul Fernandez et.al. 2207.12262 null
2022-07-01 A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese Song Zhang et.al. 2207.12089 null
2022-07-20 When Is TTS Augmentation Through a Pivot Language Useful? Nathaniel Robinson et.al. 2207.09889 link
2022-07-11 LIP: Lightweight Intelligent Preprocessor for meaningful text-to-speech Harshvardhan Anand et.al. 2207.07118 null
2022-07-13 ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech Rongjie Huang et.al. 2207.06389 link
2022-07-13 Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu et.al. 2207.06088 null
2022-07-13 SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate Nabarun Goswami et.al. 2207.06011 null
2022-07-13 Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS Yookyung Shin et.al. 2207.06000 null
2022-07-13 A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System Yi-Chiao Wu et.al. 2207.05913 null
2022-07-12 Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition Rodolfo Zevallos et.al. 2207.05498 null
2022-07-12 End-to-end speech recognition modeling from de-identified data Martin Flechl et.al. 2207.05469 null
2022-07-11 Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data Naoki Makishima et.al. 2207.04659 null
2022-07-11 DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders Yanqing Liu et.al. 2207.04646 null
2023-01-02 Dreamento: an open-source dream engineering toolbox for sleep EEG wearables Mahdad Jafarzadeh Esfahani et.al. 2207.03977 link
2022-07-07 BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus Josh Meyer et.al. 2207.03546 link
2022-07-05 Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion Yi Lei et.al. 2207.01832 null
2022-07-04 BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model Brooke Stephenson et.al. 2207.01718 null
2022-07-04 Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS) Ariadna Sanchez et.al. 2207.01547 null
2022-07-04 Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS) Ziyao Zhang et.al. 2207.01507 null
2023-03-13 DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech Keon Lee et.al. 2207.01063 link
2022-07-02 Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need Daniel Korzekwa et.al. 2207.00774 null
2022-07-01 Building African Voices Perez Ogayo et.al. 2207.00688 link
2022-07-01 Automatic Evaluation of Speaker Similarity Deja Kamil et.al. 2207.00344 null
2022-08-03 Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding Wei-Ping Huang et.al. 2206.15427 null
2022-06-30 R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS Kyle Kastner et.al. 2206.15276 null
2022-07-01 Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems Hyun-Wook Yoon et.al. 2206.15067 null
2022-06-30 TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder Eunwoo Song et.al. 2206.14984 null
2022-06-29 Improving Deliberation by Text-Only and Semi-Supervised Training Ke Hu et.al. 2206.14716 null
2022-06-29 Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody Peter Makarov et.al. 2206.14643 null
2022-06-28 Expressive, Variable, and Controllable Duration Modelling in TTS Ammar Abbas et.al. 2206.14165 null
2022-06-28 Comparison of Speech Representations for the MOS Prediction System Aki Kunikoshi et.al. 2206.13817 null
2022-06-22 A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data Raviraj Joshi et.al. 2206.13240 null
2022-06-25 Synthesizing Personalized Non-speech Vocalization from Discrete Speech Representations Chin-Cheng Hsu et.al. 2206.12662 null
2022-10-21 Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech Florian Lux et.al. 2206.12229 link
2022-06-24 SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech Hyunjae Cho et.al. 2206.12132 null
2022-06-24 End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue Kentaro Mitsui et.al. 2206.12040 null
2022-05-29 Exploiting Transliterated Words for Finding Similarity in Inter-Language News Articles using Machine Learning Sameea Naeem et.al. 2206.11860 null
2022-06-21 Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS Kenta Udagawa et.al. 2206.10256 null
2022-06-24 Towards Optimizing OCR for Accessibility Peya Mowar et.al. 2206.10254 null
2022-06-16 Automatic Prosody Annotation with Pre-Trained Text-Speech Model Ziqian Dai et.al. 2206.07956 link
2022-11-16 NatiQ: An End-to-end Text-to-Speech System for Arabic Ahmed Abdelali et.al. 2206.07373 null
2022-06-15 Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning Rui Liu et.al. 2206.07229 link
2022-12-12 A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation Junhui Zhang et.al. 2206.04922 null
2022-06-09 Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos Alexander Waibel et.al. 2206.04523 null
2022-06-07 FlexLip: A Controllable Text-to-Lip System Dan Oneata et.al. 2206.03206 null
2022-10-11 UTTS: Unsupervised TTS with Conditional Disentangled Sequential Variational Auto-encoder Jiachen Lian et.al. 2206.02512 null
2023-10-19 Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech Ziyue Jiang et.al. 2206.02147 link
2022-11-02 AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation Kun Song et.al. 2206.00208 null
2022-05-31 Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish Alp Öktem et.al. 2205.15599 link
2023-11-20 StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis Yinghao Aaron Li et.al. 2205.15439 link
2022-05-30 Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data Sungwon Kim et.al. 2205.15370 null
2022-05-26 QSpeech: Low-Qubit Quantum Speech Application Toolkit Zhenhou Hong et.al. 2205.13221 link
2022-11-10 T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation Paul-Ambroise Duquenne et.al. 2205.12216 null
2022-05-20 PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit Hui Zhang et.al. 2205.12007 link
2022-05-24 TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS Xulong Zhang et.al. 2205.11824 null
2022-10-12 GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Rongjie Huang et.al. 2205.07211 link
2022-05-13 Talking Face Generation with Multilingual TTS Hyoung-Kyu Song et.al. 2205.06421 null
2022-05-10 NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality Xu Tan et.al. 2205.04421 link
2022-05-09 Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech Yang Li et.al. 2205.04120 link
2022-05-09 ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence Sangshin Oh et.al. 2205.04104 null
2022-07-14 Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss Efthymios Georgiou et.al. 2204.13437 null
2022-04-25 SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech Zhenhui Ye et.al. 2204.11792 null
2022-04-22 LibriS2S: A German-English Speech-to-Speech Translation Corpus Pedro Jeuris et.al. 2204.10593 link
2022-07-05 Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation Ryo Terashima et.al. 2204.10020 null
2022-04-21 FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis Rongjie Huang et.al. 2204.09934 link
2022-04-20 Audio Deep Fake Detection System with Neural Stitching for ADD 2022 Rui Yan et.al. 2204.08720 null
2022-04-14 Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech Cong Zhang et.al. 2204.07228 null
2022-12-09 Study of Indian English Pronunciation Variabilities relative to Received Pronunciation Priyanshi Pal et.al. 2204.06502 null
2022-04-12 Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch Hanbin Bae et.al. 2204.05753 null
2023-01-30 The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance Lin Zhang et.al. 2204.05177 null
2022-10-27 Fine-grained Noise Control for Multispeaker Speech Synthesis Karolos Nikitaras et.al. 2204.05070 null
2022-08-31 Karaoker: Alignment-free singing voice synthesis with speech training data Panos Kakoulidis et.al. 2204.04127 null
2022-08-15 Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech Jae-Sung Bae et.al. 2204.04004 null
2022-04-07 Arabic Text-To-Speech (TTS) Data Preparation Hala Al Masri et.al. 2204.03255 null
2022-04-07 Unsupervised Quantized Prosody Representation for Controllable Speech Synthesis Yutian Wang et.al. 2204.03238 null
2022-08-24 SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis Georgia Maniati et.al. 2204.03040 null
2022-09-13 Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation Sravya Popuri et.al. 2204.02967 null
2022-07-02 Representation Selective Self-distillation and wav2vec 2.0 Feature Exploration for Spoof-aware Speaker Verification Jin Woo Lee et.al. 2204.02639 null
2023-08-28 Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech Hyungchan Yoon et.al. 2204.02172 null
2022-09-07 Deliberation Model for On-Device Spoken Language Understanding Duc Le et.al. 2204.01893 null
2022-12-14 Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck Youngsik Eom et.al. 2204.01387 null
2022-11-11 Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis Yixuan Zhou et.al. 2204.00990 null
2022-06-30 VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature Chenpeng Du et.al. 2204.00768 null
2022-04-01 AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios Yihan Wu et.al. 2204.00436 null
2022-04-01 Text-To-Speech Data Augmentation for Low Resource Speech Recognition Rodolfo Zevallos et.al. 2204.00291 null
2022-07-19 Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech Guangyan Zhang et.al. 2203.17190 null
2022-03-31 An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer Wenlin Dai et.al. 2203.16954 link
2022-07-11 WavThruVec: Latent speech representation as intermediate features for neural speech synthesis Hubert Siuzdak et.al. 2203.16930 null
2022-03-31 A Character-level Span-based Model for Mandarin Prosodic Structure Prediction Xueyuan Chen et.al. 2203.16922 link
2022-07-01 JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech Dan Lim et.al. 2203.16852 link
2022-03-31 Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset Zehui Yang et.al. 2203.16844 null
2022-03-31 NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism Jingbei Li et.al. 2203.16838 link
2022-03-31 Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition Anirudh Gupta et.al. 2203.16823 null
2022-04-21 Does Audio Deepfake Detection Generalize? Nicolas M. Müller et.al. 2203.16263 null
2022-03-30 End to End Lip Synchronization with a Temporal AutoEncoder Yoav Shalev et.al. 2203.16224 link
2022-08-15 Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition Junrui Ni et.al. 2203.15796 link
2022-06-29 DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning Takaaki Saeki et.al. 2203.15683 null
2022-11-05 Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation Rendi Chevi et.al. 2203.15643 link
2022-10-06 Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus Minchan Kim et.al. 2203.15447 null
2022-07-11 VoiceMe: Personalized voice generation in TTS Pol van Rijn et.al. 2203.15379 link


About


License: Apache License 2.0


Languages

Language: Python 100.0%