AIGC-Audio/AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

We provide our implementation and pretrained models as open source in this repository.

Get Started

Please refer to run.md for instructions on getting started.

Capabilities

The tables in this section list the current capabilities of AudioGPT. Support for more models and tasks is coming soon. For prompt examples, refer to asset.

Currently, not every model listed here has its own repository.

Speech

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | Yes (WIP) |
| Speech Separation | TF-GridNet | Yes (WIP) |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural | NeuralWarp | Yes |

Sing

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |

Audio

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | Yes |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes |
| Target Sound Detection | TSDNet | Yes |
| Sound Extraction | LASSNet | Yes |

Talking Head

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Talking Head Synthesis | GeneFace | Yes (WIP) |
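
The capability tables above can also be read as a mapping from each task to its supported foundation models and status. The following is a minimal Python sketch of that mapping, useful for filtering tasks by status. It is illustrative only: the names `Capability`, `CAPABILITIES`, `models_for`, and `tasks_with_status` are hypothetical and are not part of the AudioGPT codebase. The entries are copied from the tables as listed at this time.

```python
from typing import NamedTuple


class Capability(NamedTuple):
    """One row of the capability tables (hypothetical structure)."""
    domain: str
    models: tuple[str, ...]
    status: str


# Task name -> capability row, transcribed from the tables above.
CAPABILITIES: dict[str, Capability] = {
    "Text-to-Speech": Capability("Speech", ("FastSpeech", "SyntaSpeech", "VITS"), "Yes (WIP)"),
    "Style Transfer": Capability("Speech", ("GenerSpeech",), "Yes"),
    "Speech Recognition": Capability("Speech", ("whisper", "Conformer"), "Yes"),
    "Speech Enhancement": Capability("Speech", ("ConvTasNet",), "Yes (WIP)"),
    "Speech Separation": Capability("Speech", ("TF-GridNet",), "Yes (WIP)"),
    "Speech Translation": Capability("Speech", ("Multi-decoder",), "WIP"),
    "Mono-to-Binaural": Capability("Speech", ("NeuralWarp",), "Yes"),
    "Text-to-Sing": Capability("Sing", ("DiffSinger", "VISinger"), "Yes (WIP)"),
    "Text-to-Audio": Capability("Audio", ("Make-An-Audio",), "Yes"),
    "Audio Inpainting": Capability("Audio", ("Make-An-Audio",), "Yes"),
    "Image-to-Audio": Capability("Audio", ("Make-An-Audio",), "Yes"),
    "Sound Detection": Capability("Audio", ("Audio-transformer",), "Yes"),
    "Target Sound Detection": Capability("Audio", ("TSDNet",), "Yes"),
    "Sound Extraction": Capability("Audio", ("LASSNet",), "Yes"),
    "Talking Head Synthesis": Capability("Talking Head", ("GeneFace",), "Yes (WIP)"),
}


def models_for(task: str) -> tuple[str, ...]:
    """Return the foundation models listed for a task (KeyError if unknown)."""
    return CAPABILITIES[task].models


def tasks_with_status(status: str) -> list[str]:
    """Return the tasks whose status matches exactly, in table order."""
    return [task for task, cap in CAPABILITIES.items() if cap.status == status]


if __name__ == "__main__":
    print(models_for("Text-to-Speech"))
    print(tasks_with_status("WIP"))
```

Matching on the exact status string keeps "Yes", "Yes (WIP)", and "WIP" distinct, mirroring how the tables separate fully supported tasks from those still in progress.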

Acknowledgement

We thank the authors of the following open-source projects:

ESPNet, NATSpeech, Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion