About me

I am Liumeng Xue, a Postdoctoral Researcher at the Hong Kong University of Science and Technology, working with Prof. Yike Guo and Prof. Wei Xue. I currently work on audio, music, and speech generation and understanding. Before that, I was a postdoctoral researcher at The Chinese University of Hong Kong, Shenzhen (CUHK-SZ), working with Prof. Haizhou Li and Prof. Zhizheng Wu. I co-founded Amphion, an open-source platform for Audio, Music, and Speech Generation. I received my Ph.D. degree from the Audio, Speech and Language Processing Laboratory at Northwestern Polytechnical University (ASLP@NWPU), supervised by Prof. Lei Xie. During my studies, I conducted research at JD AI Lab (2018-2019), Tencent AI Lab (2021-2022), and Microsoft (2019-2020, 2021-2022). My research interests include audio, speech, and language processing, as well as audio, music, and speech generation.

News

🎉 Mar 19, 2025: Invited talk about Audio-FLAN by Josh Gardner from Apple team.
🎉 Mar 3, 2025: Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens is released. GitHub Demo
🎉 Feb 23, 2025: Audio-FLAN is released. GitHub HuggingFace
🎉 Feb 6, 2025: Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis is released. GitHub Demo
🎉 Jan 18, 2025: YuE is released. GitHub Demo
🎉 Aug 30, 2024: Amphion: An Open-Source Audio, Music and Speech Generation Toolkit is accepted by IEEE SLT 2024.
🎉 Aug 30, 2024: Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion is accepted by IEEE SLT 2024.
🎉 Aug 20, 2024: The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings is accepted by ISCSLP 2024.
🎉 Aug 20, 2024: SingVisio is accepted by Computers & Graphics.
🎉 Jun 13, 2024: Multi-Level Temporal-Channel Speaker Retrieval for Zero-Shot Voice Conversion is accepted and published by TASLP.
🎉 Jun 4, 2024: WenetSpeech4TTS, Single-Codec, and TACA-Audiobook are accepted by INTERSPEECH 2024.
🎉 May 31, 2024: Conversational Voice Clone Challenge (CoVoC) is launched.
🎉 Feb 28, 2024: IEEE SLT workshop 2024 is announced.
🎉 Dec 18, 2023: Amphion is released. Amphion is an open-source platform for Audio, Music, and Speech Generation. As a co-founder of Amphion,
- I built the fundamental framework that integrates various generative tasks into a unified pipeline, covering data pre-processing, model building, training, and inference. GitHub
- I led the reproduction of several typical TTS models and released the pre-trained models. HuggingFace
- I developed and led the SingVisio project, an interactive visualization platform that makes the inner working mechanism of diffusion models easy to understand in the context of singing voice conversion. Try it.

Research Experience

2021.11 - 2022.10, Research Intern, Microsoft.
2021.06 - 2021.11, Research Intern, Tencent AI Lab.
2019.04 - 2020.06, Research Intern, Microsoft.
2018.10 - 2019.04, Research Intern, JD.COM AI Lab.

Selected Publications

Audio-FLAN: A Preliminary Release, Liumeng Xue, Ziya Zhou, Jiahao Pan, Zixuan Li, Shuai Fan, Yinghao Ma, Sitong Cheng, Dongchao Yang, Haohan Guo, Yujia Xiao, Xinsheng Wang, Zixuan Shen, Chuanbo Zhu, Xinshen Zhang, Tianchi Liu, Ruibin Yuan, Zeyue Tian, Haohe Liu, Emmanouil Benetos, Ge Zhang, Yike Guo, Wei Xue. 2025.
YuE: Scaling Open Foundation Models for Long-Form Music Generation, Ruibin Yuan, Hanfeng Lin, Shuyue Guo, Ge Zhang, …, Liumeng Xue, Xingwei Qu, Yizhi Li, Shangda Wu, Tianhao Shen, Ziyang Ma, …, Wei Xue, Xu Tan, Yike Guo. 2025.
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens, Xinsheng Wang, Mingqi Jiang, Ziyang Ma, …, Liumeng Xue, …, Xie Chen, Lei Xie, Yike Guo, Wei Xue. 2025.
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis, Zhen Ye, Xinfa Zhu, Chi-Min Chan, Xinsheng Wang, Xu Tan, …, Liumeng Xue, …, Yike Guo, Wei Xue. 2025.
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings, Kangxiang Xia, Dake Guo, Jixun Yao, Liumeng Xue, Hanzhao Li, Shuai Wang, Zhao Guo, Lei Xie, Qingqing Zhang, Lei Luo, Minghui Dong, Peng Sun. ISCSLP, 2024.
SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion, Liumeng Xue, Chaoren Wang, Mingxuan Wang, Xueyao Zhang, Jun Han, Zhizheng Wu. Computers & Graphics, 2024.
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit, Xueyao Zhang^*, Liumeng Xue^*, Yicheng Gu^*, Yuancheng Wang^*, Jiaqi Li^*, Haorui He, Chaoren Wang, Songting Liu, Xi Chen, Junan Zhang, Zihao Fang, Haopeng Chen, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu. IEEE SLT, 2024.
Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion, Xueyao Zhang, Zihao Fang, Yicheng Gu, Haopeng Chen, Lexiao Zou, Junan Zhang, Liumeng Xue, Zhizheng Wu. IEEE SLT, 2024.
Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion, Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang. TASLP, 2024.
Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation, Hanzhao Li, Liumeng Xue, Haohan Guo, Xinfa Zhu, Yuanjun Lv, Lei Xie, Yunlin Chen, Hao Yin, Zhifei Li. INTERSPEECH, 2024.
Text-aware and Context-aware Expressive Audiobook Speech Synthesis, Dake Guo, Xinfa Zhu, Liumeng Xue, Yongmao Zhang, Wenjie Tian, Lei Xie. INTERSPEECH, 2024.
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark, Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, Binbin Zhang, Lei Xie. INTERSPEECH, 2024.
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder, Yicheng Gu, Xueyao Zhang, Liumeng Xue, Haizhou Li, Zhizheng Wu, under review, 2024.
SponTTS: modeling and transferring spontaneous style for TTS, Hanzhao Li, Xinfa Zhu, Liumeng Xue, Yang Song, Yunlin Chen, Lei Xie. ICASSP, 2024.
Transfer the linguistic representations from TTS to accent conversion with non-parallel data, Xi Chen, Jiakun Pei, Liumeng Xue, Mingyang Zhang, ICASSP, 2024
Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder, Yicheng Gu, Xueyao Zhang, Liumeng Xue, Zhizheng Wu, ICASSP, 2024.
An Initial Investigation of Neural Replay Simulator for Over-the-Air Adversarial Perturbations to Automatic Speaker Verification, Jiaqi Li, Li Wang, Liumeng Xue, Lei Wang, Zhizheng Wu, ICASSP, 2024.
HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS, Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie. ASRU, 2023.
Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion, Xueyao Zhang, Yicheng Gu, Haopeng Chen, Zihao Fang, Lexiao Zou, Liumeng Xue, Zhizheng Wu, ML4Audio @ NeurIPS 2023.
Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features, Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi. ICASSP, 2023.
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers, Liumeng Xue, Shan Yang, Na Hu, Dan Su, Lei Xie. INTERSPEECH, 2022.
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS, Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie. TASLP, 2022.
Cycle Consistent Network for End-to-End Style Transfer TTS Training, Liumeng Xue, Shifeng Pan, Lei He, Lei Xie, Frank K. Soong. Neural Networks, 2021.
Controllable Emotion Transfer for End-to-End Speech Synthesis, Tao Li, Shan Yang, Liumeng Xue, Lei Xie. ISCSLP, 2021.
On the Localness Modeling for the Self-Attention Based End-to-End Speech Synthesis, Shan Yang, Heng Lu, Shiyin Kang, Liumeng Xue, Jinba Xiao, Dan Su, Lei Xie, Dong Yu. Neural Networks, 2020.
Building a Mixed-Lingual Neural TTS System with Only Monolingual Data, Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu. INTERSPEECH, 2019.

Awards & Services

Conference Co-organizer: IEEE Spoken Language Technology Workshop 2024 (IEEE SLT workshop 2024)
Challenge Co-organizer: Conversational Voice Clone Challenge (CoVoC)
Invited Reviewer: IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), Speech Processing Letters, Speech Communication, ACMMM, ICASSP, INTERSPEECH, ASRU, SLT, ISCSLP, etc.
