Hi there 👋
I am Rongjie Huang, a second-year Master's student (expected to graduate in March 2024) in the College of Computer Science and Software at Zhejiang University, supervised by Prof. Zhou Zhao. I have long-term collaborations with Yi Ren (ByteDance AI Lab) and Jinglin Liu (Zhejiang University). I am a research intern at Tencent AI Lab (Seattle Lab), where I work with Chunlei Zhang and Dong Yu. I have published 10+ papers at top international AI conferences such as NeurIPS, ICLR, IJCAI, and ACM-MM.
I am actively looking for academic collaborations; feel free to drop me an email.
📎 Homepages
- Personal Page: https://rongjiehuang.github.io (updated recently 🔥)
- LinkedIn: https://www.linkedin.com/in/rongjie-huang-a362541b2
- Google Scholar: https://scholar.google.com/citations?user=iRHBUsgAAAAJ
🔥 News
- 2023.04: AudioGPT and AcademiCodec come out!
- 2023.04: One paper is accepted by ICML 2023!
- 2023.02: Make-An-Audio comes out! Media coverage: Heart of Machine, ByteDance, and Twitter.
- 2023.01: One paper is accepted by ICLR 2023!
- 2022.09: Two papers are accepted by NeurIPS 2022!
- 2022.02: We release a diffusion text-to-speech pipeline using ProDiff and FastDiff. Stars and forks are welcome!
- 2022.06: Two papers are accepted by ACM-MM 2022!
- 2022.04: One paper is accepted by IJCAI 2022!
💻 Selected Research Papers
Generative AI for Speech, Singing, and Audio: Text-to-Speech Synthesis, Singing Voice Synthesis, General Audio Synthesis
Spoken Language Processing: Speech-to-Speech Translation, Speech-to-SQL Parsing, Self-Supervised Learning
My full paper list is available on my personal homepage.
Text-to-Speech Synthesis
- GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech. Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, and Zhou Zhao. NeurIPS, 2022
- FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis. Rongjie Huang, Max W.Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, and Zhou Zhao. IJCAI, 2022 (oral)
- ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech. Rongjie Huang, Zhou Zhao, Huadai Liu, Jinglin Liu, and Yi Ren. ACM MM, 2022
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model. Chenye Cui, Yi Ren, Jinglin Liu, Feiyang Chen, Rongjie Huang, Mei Li, and Zhou Zhao. Interspeech, 2021
Text-to-Audio Synthesis
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models. Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao. ICML, 2023
- VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement. Chenye Cui, Yi Ren, Jinglin Liu, Rongjie Huang, Zhou Zhao. ICASSP, 2023
Singing Voice Synthesis
- Multi-Singer: Fast multi-singer singing voice vocoder with a large-scale corpus. Rongjie Huang, Feiyang Chen, Yi Ren, Jinglin Liu, Chenye Cui, and Zhou Zhao. ACM MM, 2021 (oral)
- SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation. Rongjie Huang, Chenye Cui, Feiyang Chen, Yi Ren, Jinglin Liu, and Zhou Zhao. ACM MM, 2022
- M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus. Lichao Zhang, Ruiqi Li, Shoutong Wang, Liqun Deng, Jinglin Liu, Yi Ren, Jinzheng He, Rongjie Huang, Jieming Zhu, Xiao Chen, and Zhou Zhao. NeurIPS, 2022
Spoken Language Processing
- TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation. Rongjie Huang, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He, and Zhou Zhao. ICLR, 2023
- Bilateral Denoising Diffusion Models. Max W.Y. Lam, Jun Wang, Rongjie Huang, Dan Su, Dong Yu. Preprint



