Under review,
[Paper]
[Demo]
FillerSpeech: Towards Human-Like Text-to-Speech Synthesis with Filler Injection and Filler Style Control
Under review,
[Paper]
[Demo]
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
Under review,
[Paper]
[Demo]
EmoSphere-SER: Enhancing Speech Emotion Recognition through Spherical Representation with Auxiliary Classification
Under review,
[Paper]
[Demo]
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
Under review,
[Paper]
[Demo]
VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion
Under review,
[Paper]
[Demo]
Hierarchical Diffusion Model for Zero-Shot Singing Voice Synthesis with MIDI Priors
ICASSP,
[Paper]
[Demo]
FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching
ICASSP,
[Paper]
[Demo]
JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis
IEEE Transactions on Affective Computing,
[Paper]
[Demo]
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations Without Text Alignment
IEEE/ACM Transactions on Audio, Speech, and Language Processing,
[Paper]
[Demo]
UnitCorrect: Unit-based Mispronunciation Correcting System with a DTW-based Detection
IEEE Transactions on Affective Computing,
[Paper]
[Demo]
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
ICLR,
[Paper]
[Demo]
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation