Rongjiehuang

Hi there 👋

I am Rongjie Huang, a second-year Master's student (expected to graduate in 2024.03) in the College of Computer Science and Software at Zhejiang University, supervised by Prof. Zhou Zhao. I have long-term collaborations with Yi Ren (ByteDance AI Lab) and Jinglin Liu (Zhejiang University). I am a research intern at Tencent AI Lab (Seattle Lab), where I work with Chunlei Zhang and Dong Yu. I have published 10+ papers at top international AI conferences such as NeurIPS, ICLR, IJCAI, and ACM-MM.

I am actively looking for academic collaboration; feel free to drop me an email.

📎 Homepages

Personal Pages: https://rongjiehuang.github.io (updated recently🔥)
LinkedIn: https://www.linkedin.com/in/rongjie-huang-a362541b2
Google Scholar: https://scholar.google.com/citations?user=iRHBUsgAAAAJ

🔥 News

2023.04: AudioGPT and AcademiCodec come out!
2023.04: One paper is accepted by ICML 2023!
2023.02: Make-An-Audio comes out! Media coverage: Heart of Machine, ByteDance, and Twitter.
2023.01: One paper is accepted by ICLR 2023!
2022.09: Two papers are accepted by NeurIPS 2022!
2022.02: We release a diffusion text-to-speech pipeline using ProDiff and FastDiff. Welcome to STAR and FORK!
2022.06: Two papers are accepted by ACM-MM 2022!
2022.04: One paper is accepted by IJCAI 2022!

💻 Selected Research Papers

Generative AI for Speech, Singing, and Audio: Text-to-Speech Synthesis, Singing Voice Synthesis, General Audio Synthesis

Spoken Language Processing: Speech-to-Speech Translation, Speech-to-SQL Parsing, Self-Supervised Learning

My full paper list is available on my personal homepage.

Text-to-Speech Synthesis

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech. Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, and Zhou Zhao. NeurIPS, 2022
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis. Rongjie Huang, Max W.Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, and Zhou Zhao. IJCAI, 2022 (oral)
ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech. Rongjie Huang, Zhou Zhao, Huadai Liu, Jinglin Liu, and Yi Ren. ACM MM, 2022
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model. Chenye Cui, Yi Ren, Jinglin Liu, Feiyang Chen, Rongjie Huang, Mei Li, and Zhou Zhao. Interspeech, 2021

Text-to-Audio Synthesis

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models. Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, and Zhou Zhao. ICML, 2023
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement. Chenye Cui, Yi Ren, Jinglin Liu, Rongjie Huang, and Zhou Zhao. ICASSP, 2023

Singing Voice Synthesis

Multi-Singer: Fast multi-singer singing voice vocoder with a large-scale corpus. Rongjie Huang, Feiyang Chen, Yi Ren, Jinglin Liu, Chenye Cui, and Zhou Zhao. ACM MM, 2021 (oral)
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation. Rongjie Huang, Chenye Cui, Feiyang Chen, Yi Ren, Jinglin Liu, and Zhou Zhao. ACM MM, 2022
M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus. Lichao Zhang, Ruiqi Li, Shoutong Wang, Liqun Deng, Jinglin Liu, Yi Ren, Jinzheng He, Rongjie Huang, Jieming Zhu, Xiao Chen, and Zhou Zhao. NeurIPS, 2022

Spoken Language Processing

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation. Rongjie Huang, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He, and Zhou Zhao. ICLR, 2023
Bilateral Denoising Diffusion Models. Max W.Y. Lam, Jun Wang, Rongjie Huang, Dan Su, and Dong Yu. Preprint
