PRML Lab. Speech Team

Pattern Recognition & Machine Learning Lab

Korea University

2025

Under review, [Paper] [Demo]

FillerSpeech: Towards Human-Like Text-to-Speech Synthesis with Filler Injection and Filler Style Control


Under review, [Paper] [Demo]

Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech


Under review, [Paper] [Demo]

EmoSphere-SER: Enhancing Speech Emotion Recognition through Spherical Representation with Auxiliary Classification


Under review, [Paper] [Demo]

DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech


Under review, [Paper] [Demo]

VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion


Under review, [Paper] [Demo]

Hierarchical Diffusion Model for Zero-Shot Singing Voice Synthesis with MIDI Priors


ICASSP, [Paper] [Demo]

FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching


ICASSP, [Paper] [Demo]

JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis


IEEE Transactions on Affective Computing, [Paper] [Demo]

DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations Without Text Alignment


IEEE/ACM Transactions on Audio, Speech, and Language Processing, [Paper] [Demo]

UnitCorrect: Unit-based Mispronunciation Correcting System with a DTW-based Detection


IEEE Transactions on Affective Computing, [Paper] [Demo]

EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector


ICLR, [Paper] [Demo]

PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation


2024

SMC, [Paper] [Demo]

PromotiCon: Prompt-based Emotion Controllable Text-to-Speech via Prompt Generation and Matching


INTERSPEECH, [Paper] [Demo]

EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech


TASLP, [Paper] [Demo]

DiffProsody: Diffusion-Based Latent Prosody Generation for Expressive Speech Synthesis With Prosody Conditional Adversarial Training


TASLP, [Paper] [Demo]

Audio Super-Resolution With Robust Speech Representation Learning of Masked Autoencoder


ICASSP, [Paper] [Demo]

TranSentence: Speech-to-Speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data


ICASSP, [Paper] [Demo]

MIDI-Voice: Expressive Zero-shot Singing Voice Synthesis via MIDI-driven Priors


AAAI, [Paper] [Demo]

DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion


2023

INTERSPEECH, [Paper] [Demo]

HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer


INTERSPEECH, [Paper] [Demo]

Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation


ACPR, [Paper] [Demo]

PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling


2022

NeurIPS, [Paper] [Demo]

HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-supervised Representation for Speech Synthesis


TASLP, [Paper] [Demo]

Duration Controllable Voice Conversion via Phoneme-Based Information Bottleneck


ICPR, [Paper] [Demo]

StyleVC: Non-parallel Voice Conversion with Adversarial Style Generalization


ICASSP, [Paper] [Demo]

EmoQ-TTS: Emotion Intensity Quantization for Fine-grained Controllable Emotional Text-to-Speech


ICASSP, [Paper] [Demo]

Fre-GAN 2: Fast and Efficient Frequency-consistent Audio Synthesis


ICASSP, [Paper] [Demo]

PVAE-TTS: High-Quality Adaptive Text-to-Speech via Progressive Variational Autoencoder


2021

NeurIPS, [Paper] [Demo]

VoiceMixer: Adversarial Voice Style Mixup


AAAI, [Paper] [Demo]

Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Recombination for Speech Synthesis


SMC, [Paper] [Demo]

GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints


INTERSPEECH, [Paper] [Demo]

Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech


INTERSPEECH, [Paper] [Demo]

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis


2020

INTERSPEECH, [Paper] [Demo]

Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder