EMNLP, [Paper] [Demo]
FillerSpeech: Towards Human-Like Text-to-Speech Synthesis with Filler Injection and Filler Style Control
TNNLS, [Paper] [Demo]
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation by Hierarchical Variational Inference for Zero-shot Speech Synthesis
INTERSPEECH, [Paper] [Demo]
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
INTERSPEECH, [Paper]
EmoSphere-SER: Enhancing Speech Emotion Recognition through Spherical Representation with Auxiliary Classification
INTERSPEECH, [Paper] [Demo]
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
INTERSPEECH, [Paper] [Demo]
VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion
TASLP, [Paper] [Demo]
Hierarchical Diffusion Model for Zero-Shot Singing Voice Synthesis with MIDI Priors
ICASSP, [Paper] [Demo]
FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching
ICASSP, [Paper] [Demo]
JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis
TAFFC, [Paper] [Demo]
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations Without Text Alignment
TASLP, [Paper] [Demo]
UnitCorrect: Unit-based Mispronunciation Correcting System with a DTW-based Detection
TAFFC, [Paper] [Demo]
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
ICLR, [Paper] [Demo]
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation