About me
Liumeng Xue is a Postdoctoral Researcher at the Hong Kong University of Science and Technology, working with Prof. Yike Guo and Prof. Wei Xue. My current work focuses on audio, music, and speech generation and understanding. Before that, I was a postdoctoral researcher at The Chinese University of Hong Kong, Shenzhen (CUHK-SZ), working with Prof. Haizhou Li and Prof. Zhizheng Wu. I was a co-founder of Amphion, an open-source platform for audio, music, and speech generation. I received my Ph.D. degree from the Audio, Speech and Language Processing Laboratory at Northwestern Polytechnical University (ASLP@NWPU), supervised by Prof. Lei Xie. During my studies, I did research internships at JD AI Lab (2018-2019), Tencent AI Lab (2021), and Microsoft (2019-2020, 2021-2022). My research interests include audio, speech, and language processing, as well as audio, music, and speech generation.
News
- 🎉 Aug 30, 2024: Amphion: An Open-Source Audio, Music and Speech Generation Toolkit has been accepted by IEEE SLT 2024.
- 🎉 Aug 30, 2024: Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion has been accepted by IEEE SLT 2024.
- 🎉 Aug 20, 2024: The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings has been accepted by ISCSLP 2024.
- 🎉 Aug 20, 2024: SingVisio has been accepted by Computers & Graphics.
- 🎉 Jun 13, 2024: Multi-Level Temporal-Channel Speaker Retrieval for Zero-Shot Voice Conversion has been accepted and published by TASLP.
- 🎉 Jun 4, 2024: WenetSpeech4TTS, Single-Codec, and TACA-Audiobook have been accepted by INTERSPEECH 2024.
- 🎉 May 31, 2024: The Conversational Voice Clone Challenge (CoVoC) has been launched.
- 🎉 Feb 28, 2024: The IEEE SLT 2024 workshop has been announced.
- 🎉 Dec 18, 2023: Amphion, an open-source platform for Audio, Music, and Speech Generation, has been released.
  - I built the foundational framework that integrates various generative tasks into a unified pipeline, covering data pre-processing, model building, training, and inference. Github
  - I led the reproduction of several representative TTS models and released the pre-trained models. HuggingFace
  - I developed and led the SingVisio project, an interactive visualization platform that makes the inner workings of diffusion models easily understandable in the context of singing voice conversion. Try it.
Research Experience
- 2021.11 - 2022.10, Research Intern, Microsoft.
- 2021.06 - 2021.11, Research Intern, Tencent AI Lab.
- 2019.04 - 2020.06, Research Intern, Microsoft.
- 2018.10 - 2019.04, Research Intern, JD.COM AI Lab.
Selected Publications
- Amphion: An Open-Source Audio, Music and Speech Generation Toolkit, Xueyao Zhang*, Liumeng Xue*, Yicheng Gu*, Yuancheng Wang*, Jiaqi Li*, Haorui He, Chaoren Wang, Songting Liu, Xi Chen, Junan Zhang, Zihao Fang, Haopeng Chen, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu. IEEE SLT, 2024.
- Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion, Xueyao Zhang, Zihao Fang, Yicheng Gu, Haopeng Chen, Lexiao Zou, Junan Zhang, Liumeng Xue, Zhizheng Wu. IEEE SLT, 2024.
- SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion, Liumeng Xue, Chaoren Wang, Mingxuan Wang, Xueyao Zhang, Jun Han, Zhizheng Wu. Computers & Graphics, 2024.
- Multi-Level Temporal-Channel Speaker Retrieval for Zero-Shot Voice Conversion, Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang. TASLP, 2024.
- Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation, Hanzhao Li, Liumeng Xue, Haohan Guo, Xinfa Zhu, Yuanjun Lv, Lei Xie, Yunlin Chen, Hao Yin, Zhifei Li. INTERSPEECH, 2024.
- Text-aware and Context-aware Expressive Audiobook Speech Synthesis, Dake Guo, Xinfa Zhu, Liumeng Xue, Yongmao Zhang, Wenjie Tian, Lei Xie. INTERSPEECH, 2024.
- WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark, Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, Binbin Zhang, Lei Xie. INTERSPEECH, 2024.
- An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder, Yicheng Gu, Xueyao Zhang, Liumeng Xue, Haizhou Li, Zhizheng Wu. Under review, 2024.
- SponTTS: Modeling and Transferring Spontaneous Style for TTS, Hanzhao Li, Xinfa Zhu, Liumeng Xue, Yang Song, Yunlin Chen, Lei Xie. ICASSP, 2024.
- Transfer the Linguistic Representations from TTS to Accent Conversion with Non-parallel Data, Xi Chen, Jiakun Pei, Liumeng Xue, Mingyang Zhang. ICASSP, 2024.
- Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder, Yicheng Gu, Xueyao Zhang, Liumeng Xue, Zhizheng Wu. ICASSP, 2024.
- An Initial Investigation of Neural Replay Simulator for Over-the-Air Adversarial Perturbations to Automatic Speaker Verification, Jiaqi Li, Li Wang, Liumeng Xue, Lei Wang, Zhizheng Wu. ICASSP, 2024.
- HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS, Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie. ASRU, 2023.
- Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion, Xueyao Zhang, Yicheng Gu, Haopeng Chen, Zihao Fang, Lexiao Zou, Liumeng Xue, Zhizheng Wu. ML4Audio @ NeurIPS, 2023.
- Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features, Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi. ICASSP, 2023.
- Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers, Liumeng Xue, Shan Yang, Na Hu, Dan Su, Lei Xie. INTERSPEECH, 2022.
- ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS, Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie. TASLP, 2022.
- Cycle Consistent Network for End-to-End Style Transfer TTS Training, Liumeng Xue, Shifeng Pan, Lei He, Lei Xie, Frank K. Soong. Neural Networks, 2021.
- Controllable Emotion Transfer for End-to-End Speech Synthesis, Tao Li, Shan Yang, Liumeng Xue, Lei Xie. ISCSLP, 2021.
- On the Localness Modeling for the Self-Attention Based End-to-End Speech Synthesis, Shan Yang, Heng Lu, Shiyin Kang, Liumeng Xue, Jinba Xiao, Dan Su, Lei Xie, Dong Yu. Neural Networks, 2020.
- Building a Mixed-Lingual Neural TTS System with Only Monolingual Data, Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu. INTERSPEECH, 2019.
Awards & Services
- Conference Co-organizer: IEEE Spoken Language Technology Workshop 2024 (IEEE SLT 2024)
- Challenge Co-organizer: Conversational Voice Clone Challenge (CoVoC)
- Invited Reviewer: IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), IEEE Signal Processing Letters, Speech Communication, ICASSP, INTERSPEECH, ASRU, SLT, ISCSLP, etc.