🚀 LocalAI 3.0 – A New Era Begins
Say hello to LocalAI 3.0 — our most ambitious release yet!
We’ve taken huge strides toward making LocalAI not just local, but limitless. Whether you're building LLM-powered agents, experimenting with audio pipelines, or deploying multimodal backends at scale — this release is for you.
Let’s walk you through what’s new. (And yes, there’s a lot to love.)
TL;DR – What’s New in LocalAI 3.0.0 🎉
- 🧩 Backend Gallery: Install/remove backends on the fly, powered by OCI images — fully customizable and API-driven.
- 🎙️ Audio Support: Upload audio, PDFs, or text in the UI — plus new audio understanding models like Qwen Omni.
- 🌐 Realtime API: WebSocket support compatible with OpenAI clients, great for chat apps and agents.
- 🧠 Reasoning UI Boosts: Thinking indicators now show in chat for smart models.
- 📊 Dynamic VRAM Handling: Smarter GPU usage with automatic offloading.
- 🦙 Llama.cpp Upgrades: Now with reranking + multimodal via libmtmd.
- 📦 50+ New Models: Huge model gallery update with fresh LLMs across categories.
- 🐞 Bug Fixes: Streamed runes, template stability, better backend gallery UX.
- ❌ Deprecated: Extras images — replaced by the new backend system.
👉 Dive into the full changelog and docs below to explore more!
🧩 Introducing the Backend Gallery — Plug, Play, Power Up
No more hunting for dependencies or custom hacks.
With the new Backend Gallery, you can now:
- Install & remove backends at runtime or startup via API or directly from the WebUI
- Use custom galleries, just like you do for models
- Enjoy zero-config access to the default LocalAI gallery
Backends are standard OCI images — portable, composable, and totally DIY-friendly. Goodbye to "extras images" — hello to full backend modularity, even with Python-based dependencies.
📖 Explore the Backend Gallery Docs
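As a rough sketch of what the API-driven flow could look like — note that the `/backends/apply` route and the `localai@bark-cpp` identifier below are assumptions modelled on the existing model-gallery endpoint (`/models/apply`), so double-check both against the Backend Gallery docs:

```python
import json
from urllib import request

BASE_URL = "http://localhost:8080"  # assumed default LocalAI address

def backend_install_request(backend_id: str, base_url: str = BASE_URL) -> request.Request:
    """Build a POST request asking LocalAI to install a backend from the gallery.

    The endpoint name mirrors the model-gallery API and may differ in the
    final docs; treat it as a placeholder.
    """
    body = json.dumps({"id": backend_id}).encode()
    return request.Request(
        f"{base_url}/backends/apply",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Requires a running LocalAI instance.
    req = backend_install_request("localai@bark-cpp")
    with request.urlopen(req) as resp:
        print(resp.read().decode())
```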
⚠️ Important: Breaking Changes
Starting with this release, we will no longer publish -extras images containing Python backends. You can now use the standard images and simply pick the one suited to your GPU; additional backends can be installed via the Backend Gallery.
Some examples are below. Note that the CI is still publishing the images, so they won't be available until the jobs have finished, and the installation scripts will be updated once the images are publicly available.
CPU only image:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
NVIDIA GPU Images:
# CUDA 12
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# CUDA 11
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11
# NVIDIA Jetson (L4T) ARM64
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64
AMD GPU Images (ROCm):
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
Intel GPU Images (oneAPI):
# Intel GPU with FP16 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f16
# Intel GPU with FP32 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f32
Vulkan GPU Images:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
AIO Images (pre-downloaded models):
# CPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# NVIDIA CUDA 12 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# NVIDIA CUDA 11 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-11
# Intel GPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel-f16
# AMD GPU version
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
For more information about the AIO images and pre-downloaded models, see Container Documentation.
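Once a container is up, you can sanity-check it through the OpenAI-compatible API. A minimal sketch, assuming the default port mapping from the commands above:

```python
import json
from urllib import request

def parse_model_ids(payload: dict) -> list:
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url: str = "http://localhost:8080") -> list:
    """Query a running LocalAI instance for its installed models."""
    with request.urlopen(f"{base_url}/v1/models") as resp:
        return parse_model_ids(json.load(resp))

if __name__ == "__main__":
    # Requires one of the containers above to be running.
    print(list_models())
```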
🧠 Smarter Reasoning, Smoother Chat
- Realtime WebSocket API: OpenAI-style streaming support via WebSocket is here. Ideal for agents and chat apps.
- "Thinking" Tags: Reasoning models now show a visual "thinking" box during inference in the UI. Intuitive and satisfying.
🧠 Model Power-Up: VRAM Savvy + Multimodal Brains
Dynamic VRAM Estimation: LocalAI now adapts and offloads layers depending on your GPU’s capabilities. Optimal performance, no guesswork.
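If you previously pinned the number of offloaded layers by hand in a model's YAML config, that override still works but is now optional. A hedged sketch (field names follow common LocalAI model-config conventions; treat the file name and values as placeholders):

```yaml
# models/my-model.yaml (hypothetical example)
name: my-model
backend: llama-cpp
parameters:
  model: my-model.Q4_K_M.gguf
# Optional manual override; when omitted, LocalAI 3.0 estimates how many
# layers fit in your GPU's VRAM and offloads accordingly.
gpu_layers: 35
```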
Llama.cpp upgrades also include:
- Reranking support
- Enhanced multimodal support via libmtmd
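To get a feel for the new reranking capability, here is a sketch of a client request — the `/v1/rerank` route and payload shape follow the common Jina-style rerank API convention, and the model name is a placeholder, so verify both against the LocalAI docs:

```python
import json
from urllib import request

def rerank_request(query: str, documents: list, model: str,
                   base_url: str = "http://localhost:8080") -> request.Request:
    """Build a Jina-style rerank request for a LocalAI instance.

    Given a query and candidate documents, the server scores each document
    by relevance; endpoint and payload shape are assumptions to verify.
    """
    body = json.dumps({
        "model": model,
        "query": query,
        "documents": documents,
    }).encode()
    return request.Request(
        f"{base_url}/v1/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Requires a running LocalAI instance with a reranker model installed.
    req = rerank_request(
        "What is LocalAI?",
        ["LocalAI is a local inference server.", "Bananas are yellow."],
        model="my-reranker",  # hypothetical model name
    )
    with request.urlopen(req) as resp:
        print(json.load(resp))
```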
🧪 New Models!
More than 50 new models joined the gallery, including:
- 🧠 skywork-or1-32b, rivermind-lux-12b, qwen3-embedding-*, llama3-24b-mullein, ultravox-v0_5, and more
- 🧬 Multimodal, reasoning, and domain-specific LLMs for every need
- 📦 Browse the latest additions in the Model Gallery
🐞 Bugfixes & Polish
- Rune streaming is now buttery smooth
- Countless fixes across templates, inputs, CI, and realtime session updates
- Backend gallery UI is more stable and informative
The Complete Local Stack for Privacy-First AI
With LocalAGI rejoining LocalAI alongside LocalRecall, our ecosystem provides a complete, open-source stack for private, secure, and intelligent AI operations:
- **LocalAI**: The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.
- **LocalAGI**: A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.
- **LocalRecall**: A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI.
Join the Movement! ❤️
A massive THANK YOU to our incredible community and our sponsors! LocalAI has over 33,300 stars, and LocalAGI has already rocketed past 750+ stars!
As a reminder, LocalAI is real FOSS (Free and Open Source Software), and it and its sibling projects are community-driven, not backed by VCs or a company. We rely on contributors who donate their spare time and on sponsors who provide the hardware! If you love open-source, privacy-first AI, please consider starring the repos, contributing code, reporting bugs, or spreading the word!
👉 Check out the reborn LocalAGI v2 today: https://github.com/mudler/LocalAGI
LocalAI 3.0.0 is here. What will you build next?
Full changelog 👇
What's Changed
Breaking Changes 🛠
- feat: Add backend gallery by @mudler in #5607
- chore(backends): move `bark-cpp` to the backend gallery by @mudler in #5682
Bug fixes 🐛
- fix(ci): tag latest against cpu-only image by @mudler in #5362
- fix(flux): Set CFG=1 so that prompts are followed by @richiejp in #5378
- fix(template): we do not always have .Name by @mudler in #5508
- fix(input): handle correctly case where we pass by string list as inputs by @mudler in #5521
- fix(streaming): stream complete runes by @mudler in #5539
- fix(install.sh): vulkan docker tag by @halkeye in #5589
- fix(realtime): Use updated model on session update by @richiejp in #5604
- fix(backends gallery): propagate p2p settings to correctly draw menu by @mudler in #5684
Exciting New Features 🎉
- feat(llama.cpp): upgrade and use libmtmd by @mudler in #5379
- feat(ui): add error page to display errors by @mudler in #5418
- feat(llama.cpp): add reranking by @mudler in #5396
- feat: Realtime API support reboot by @richiejp in #5392
- feat(llama.cpp): add support for audio input by @mudler in #5466
- feat(ui): add audio upload button in chat view by @mudler in #5526
- feat(ui): allow to upload PDF and text files, also add support to multiple input files by @mudler in #5538
- feat(ui): display thinking tags appropriately by @mudler in #5540
- feat: improve RAM estimation by using values from summary by @mudler in #5525
- feat(backend gallery): display download progress by @mudler in #5687
🧠 Models
- chore(model gallery): add skywork_skywork-or1-32b by @mudler in #5369
- chore(model gallery): add skywork_skywork-or1-7b by @mudler in #5370
- chore(model gallery): add thedrummer_snowpiercer-15b-v1 by @mudler in #5371
- chore(model gallery): add thedrummer_rivermind-lux-12b-v1 by @mudler in #5372
- chore(model gallery): add primeintellect_intellect-2 by @mudler in #5373
- fix: typos by @omahs in #5376
- chore(model gallery): add soob3123_grayline-qwen3-14b by @mudler in #5393
- chore(model gallery): add soob3123_grayline-qwen3-8b by @mudler in #5394
- chore(model gallery): add a-m-team_am-thinking-v1 by @mudler in #5395
- chore(model gallery): add thedrummer_valkyrie-49b-v1 by @mudler in #5410
- chore(model gallery): add facebook_kernelllm by @mudler in #5411
- chore(model gallery): add smolvlm-256m-instruct by @mudler in #5412
- chore(model gallery): add smolvlm-500m-instruct by @mudler in #5413
- chore(model gallery): add smolvlm-instruct by @mudler in #5414
- chore(model gallery): add smolvlm2-2.2b-instruct by @mudler in #5415
- chore(model gallery): add smolvlm2-500m-video-instruct by @mudler in #5416
- chore(model gallery): add smolvlm2-256m-video-instruct by @mudler in #5417
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #5422
- chore(model gallery): add nvidia_llama-3.1-nemotron-nano-4b-v1.1 by @mudler in #5427
- chore(model gallery): add mistralai_devstral-small-2505 by @mudler in #5428
- chore(model gallery): add delta-vector_archaeo-12b-v2 by @mudler in #5429
- chore(model gallery): add arliai_qwq-32b-arliai-rpr-v4 by @mudler in #5443
- chore(model gallery): add whiterabbitneo_whiterabbitneo-v3-7b by @mudler in #5444
- chore(model gallery): add vulpecula-4b by @mudler in #5445
- chore(model gallery): add medgemma-4b-it by @mudler in #5460
- chore(model gallery): add medgemma-27b-text-it by @mudler in #5461
- chore(model gallery): add allura-org_q3-30b-a3b-pentiment by @mudler in #5462
- chore(model gallery): add allura-org_q3-30b-a3b-designant by @mudler in #5502
- chore(model gallery): add luckyrp-24b by @mudler in #5503
- chore(model gallery): add mrm8488_qwen3-14b-ft-limo by @mudler in #5504
- chore(model gallery): add llama3-24b-mullein-v1 by @mudler in #5505
- chore(model gallery): add ms-24b-mullein-v0 by @mudler in #5506
- chore(model gallery): add qwen2.5-omni-7b by @mudler in #5513
- chore(model gallery): add pku-ds-lab_fairyr1-14b-preview by @mudler in #5516
- chore(model gallery): add pku-ds-lab_fairyr1-32b by @mudler in #5517
- chore(model gallery): add moondream2-20250414 by @mudler in #5518
- chore(model gallery): add arcee-ai_homunculus by @mudler in #5577
- chore(model gallery): add nvidia_nemotron-research-reasoning-qwen-1.5b by @mudler in #5578
- chore(model gallery): add e-n-v-y_legion-v2.1-llama-70b-elarablated-v0.8-hf by @mudler in #5579
- chore(model gallery): add deepseek-ai_deepseek-r1-0528-qwen3-8b by @mudler in #5580
- chore(model gallery): add goekdeniz-guelmez_josiefied-qwen3-14b-abliterated-v3 by @mudler in #5590
- chore(model gallery): add ultravox-v0_5-llama-3_2-1b by @mudler in #5591
- chore(model gallery): add ultravox-v0_5-llama-3_1-8b by @mudler in #5592
- chore(model gallery): add open-thoughts_openthinker3-7b by @mudler in #5595
- chore(model gallery): add nbeerbower_qwen3-gutenberg-encore-14b by @mudler in #5596
- chore(model gallery): add akhil-theerthala_kuvera-8b-v0.1.0 by @mudler in #5600
- chore(model gallery): add qwen2.5-omni-3b by @mudler in #5606
- chore(model gallery): add kwaipilot_kwaicoder-autothink-preview by @mudler in #5627
- chore(model gallery): add sophosympatheia_strawberrylemonade-l3-70b-v1.0 by @mudler in #5628
- chore(model gallery): add mistralai_magistral-small-2506 by @mudler in #5629
- chore(model gallery): add baai_robobrain2.0-7b by @mudler in #5630
- chore(model gallery): add openbuddy_openbuddy-r1-0528-distill-qwen3-32b-preview0-qat by @mudler in #5631
- chore(model gallery): add qwen3-embedding-4b by @mudler in #5632
- chore(model gallery): add qwen3-embedding-8b by @mudler in #5633
- chore(model gallery): add qwen3-embedding-0.6b by @mudler in #5634
- chore(model gallery): add yanfei-v2-qwen3-32b by @mudler in #5639
📖 Documentation and examples
- chore(docs/install.sh): image changes by @mudler in #5354
- updating the documentation on fine tuning and advanced guide. by @TheDarkTrumpet in #5420
👒 Dependencies
- chore: ⬆️ Update ggml-org/whisper.cpp to `e41bc5c61ae66af6be2bd7011769bb821a83e8ae` by @localai-bot in #5357
- chore: ⬆️ Update ggml-org/llama.cpp to `de4c07f93783a1a96456a44dc16b9db538ee1618` by @localai-bot in #5358
- chore: ⬆️ Update ggml-org/whisper.cpp to `f89056057511a1657af90bb28ef3f21e5b1f33cd` by @localai-bot in #5364
- chore: ⬆️ Update ggml-org/whisper.cpp to `f389d7e3e56bbbfec49fd333551927a0fcbb7213` by @localai-bot in #5367
- chore: ⬆️ Update ggml-org/whisper.cpp to `20a20decd94badfd519a07ea91f0bba8b8fc4dea` by @localai-bot in #5374
- chore: ⬆️ Update ggml-org/whisper.cpp to `d1f114da61b1ae1e70b03104fad42c9dd666feeb` by @localai-bot in #5381
- chore: ⬆️ Update ggml-org/llama.cpp to `e3a7cf6c5bf6a0a24217f88607b06e4405a2b5d9` by @localai-bot in #5384
- chore: ⬆️ Update ggml-org/llama.cpp to `6a2bc8bfb7cd502e5ebc72e36c97a6f848c21c2c` by @localai-bot in #5390
- chore: ⬆️ Update ggml-org/whisper.cpp to `62dc8f7d7b72ca8e75c57cd6a100712c631fa5d5` by @localai-bot in #5398
- chore: ⬆️ Update ggml-org/llama.cpp to `b7a17463ec190aeee7b9077c606c910fb4688b84` by @localai-bot in #5399
- chore: ⬆️ Update ggml-org/llama.cpp to `8e186ef0e764c7a620e402d1f76ebad60bf31c49` by @localai-bot in #5423
- chore: ⬆️ Update ggml-org/whisper.cpp to `bd1cb0c8e3a04baa411dc12c1325b6a9f12ee7f4` by @localai-bot in #5424
- chore: ⬆️ Update ggml-org/whisper.cpp to `78b31ca7824500e429ba026c1a9b48e0b41c50cb` by @localai-bot in #5439
- chore: ⬆️ Update ggml-org/llama.cpp to `8a1d206f1d2b4e45918b589f3165b4be232f7ba8` by @localai-bot in #5440
- chore: ⬆️ Update ggml-org/whisper.cpp to `13d92d08ae26031545921243256aaaf0ee057943` by @localai-bot in #5449
- chore: ⬆️ Update ggml-org/llama.cpp to `d13d0f6135803822ec1cd7e3efb49360b88a1bdf` by @localai-bot in #5448
- chore(deps): bump llama.cpp to `fef693dc6b959a8e8ba11558fbeaad0b264dd457` by @mudler in #5467
- chore: ⬆️ Update ggml-org/whisper.cpp to `ea9f206f18d86c4eb357db9fdc52e4d9dc24435e` by @localai-bot in #5464
- chore: ⬆️ Update ggml-org/llama.cpp to `a26c4cc11ec7c6574e3691e90ecdbd67deeea35b` by @localai-bot in #5500
- chore: ⬆️ Update ggml-org/llama.cpp to `a3c30846e410c91c11d7bf80978795a03bb03dee` by @localai-bot in #5509
- chore: ⬆️ Update ggml-org/whisper.cpp to `0ed00d9d30e8c984936ff9ed9a4fcd475d6d82e5` by @localai-bot in #5510
- chore: ⬆️ Update ggml-org/llama.cpp to `d98f2a35fcf4a8d3e660ad48cd19e2a1f3d5b2ef` by @localai-bot in #5514
- chore: ⬆️ Update ggml-org/whisper.cpp to `1f5fdbecb411a61b8576242e5170c5ecef24b05a` by @localai-bot in #5515
- chore: ⬆️ Update ggml-org/whisper.cpp to `e5e900dd00747f747143ad30a697c8f21ddcd59e` by @localai-bot in #5522
- chore(deps): bump llama.cpp to `e83ba3e460651b20a594e9f2f0f0bffb998d3ce1` by @mudler in #5527
- chore: ⬆️ Update ggml-org/whisper.cpp to `98dfe8dc264b7d0d1daccfff9a9c043bcc2ece4b` by @localai-bot in #5542
- chore(deps): bump llama.cpp to `e562eece7cb476276bfc4cbb18deb7c0369b2233` by @mudler in #5552
- chore: ⬆️ Update ggml-org/whisper.cpp to `7fd6fa809749078aa00edf945e959c898f2bd1af` by @localai-bot in #5556
- chore: ⬆️ Update ggml-org/whisper.cpp to `e05af2457b7b4134ee626dc044294a19b096e62f` by @localai-bot in #5569
- chore(deps): bump llama.cpp to `363757628848a27a435bbf22ff9476e9aeda5f40` by @mudler in #5571
- chore: ⬆️ Update ggml-org/llama.cpp to `7e00e60ef86645a01fda738fef85b74afa016a34` by @localai-bot in #5574
- chore: ⬆️ Update ggml-org/whisper.cpp to `82f461eaa4e6a1ba29fc0dbdaa415a9934ee8a1d` by @localai-bot in #5575
- chore(deps): bump GrantBirki/git-diff-action from 2.8.0 to 2.8.1 by @dependabot in #5564
- chore: ⬆️ Update ggml-org/llama.cpp to `0d3984424f2973c49c4bcabe4cc0153b4f90c601` by @localai-bot in #5585
- chore: ⬆️ Update ggml-org/whisper.cpp to `799eacdde40b3c562cfce1508da1354b90567f8f` by @localai-bot in #5586
- chore: ⬆️ Update ggml-org/llama.cpp to `1caae7fc6c77551cb1066515e0f414713eebb367` by @localai-bot in #5593
- chore: ⬆️ Update ggml-org/whisper.cpp to `b175baa665bc35f97a2ca774174f07dfffb84e19` by @localai-bot in #5597
- chore: ⬆️ Update ggml-org/llama.cpp to `745aa5319b9930068aff5e87cf5e9eef7227339b` by @localai-bot in #5598
- chore: ⬆️ Update ggml-org/llama.cpp to `5787b5da57e54dba760c2deeac1edf892e8fc450` by @localai-bot in #5601
- chore: ⬆️ Update ggml-org/llama.cpp to `247e5c6e447707bb4539bdf1913d206088a8fc69` by @localai-bot in #5605
- chore: ⬆️ Update ggml-org/whisper.cpp to `d78f08142381c1460604713e2f2ddf3331c7d816` by @localai-bot in #5619
- chore: ⬆️ Update ggml-org/llama.cpp to `3678b838bb71eaccbaeb479ff38c2e12bfd2f960` by @localai-bot in #5620
- chore: ⬆️ Update ggml-org/whisper.cpp to `2679bec6e09231c6fd59715fcba3eebc9e2f6076` by @localai-bot in #5625
- chore: ⬆️ Update ggml-org/whisper.cpp to `ebbc874e85b518f963a87612f6d79f5c71a55e84` by @localai-bot in #5635
- chore: ⬆️ Update ggml-org/llama.cpp to `ed52f3668e633423054a4eab61bb7efee47025ab` by @localai-bot in #5636
- chore: ⬆️ Update ggml-org/whisper.cpp to `705db0f728310c32bc96f4e355e2b18076932f75` by @localai-bot in #5643
- chore: ⬆️ Update ggml-org/llama.cpp to `3cb203c89f60483e349f841684173446ed23c28f` by @localai-bot in #5644
- chore: ⬆️ Update ggml-org/llama.cpp to `30e5b01de2a0bcddc7c063c8ef0802703a958417` by @localai-bot in #5659
- chore(deps): bump securego/gosec from 2.22.4 to 2.22.5 by @dependabot in #5663
- chore: ⬆️ Update ggml-org/whisper.cpp to `2a4d6db7d90899aff3d58d70996916968e4e0d27` by @localai-bot in #5661
- chore(deps): bump llama.cpp to `e434e69183fd9e1031f4445002083178c331a28b` by @mudler in #5665
- chore: ⬆️ Update ggml-org/whisper.cpp to `f3ff80ea8da044e5b8833e7ba54ee174504c518d` by @localai-bot in #5677
- chore: ⬆️ Update ggml-org/llama.cpp to `860a9e4eeff3eb2e7bd1cc38f65787cc6c8177af` by @localai-bot in #5678
- chore: ⬆️ Update ggml-org/llama.cpp to `8d947136546773f6410756f37fcc5d3e65b8135d` by @localai-bot in #5685
- chore: ⬆️ Update ggml-org/whisper.cpp to `ecb8f3c2b4e282d5ef416516bcbfb92821f06bf6` by @localai-bot in #5686
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5363
- chore: memoize detected GPUs by @mudler in #5385
- fix(transformers): pin protobuf by @mudler in #5421
- chore(scripts): allow to specify quants by @mudler in #5430
- fix(transformers): try to pin to working release by @mudler in #5426
- chore(model gallery): add nvidia_acereason-nemotron-14b by @mudler in #5463
- chore(deps): remove pin on transformers by @mudler in #5501
- feat(chatterbox): add new backend by @mudler in #5524
- fix(ci): try to add different mirrors to avoid 403 issues by @mudler in #5554
- Revert "fix(ci): try to add different mirrors to avoid 403 issues" by @mudler in #5555
- chore(deps): bump grpcio from 1.72.0 to 1.72.1 by @mudler in #5570
- fix(chatterbox): install only with cuda 12 by @mudler in #5573
- chore(deps): bump pytorch to 2.7 in vllm by @mudler in #5576
- fix(deps): pin grpcio by @mudler in #5621
- Improve Comments and Documentation for MixedMode and ParseJSON Functions by @leopardracer in #5626
- Fix Typos in Comments and Error Messages by @kilavvy in #5637
- docs: Update docs metadata headers so when mentioned on slack it doesn't say hugo by @halkeye in #5642
- Minor Documentation Updates: Clarified Comments in Python and Go Files by @vtjl10 in #5641
- chore: improve tests by @mudler in #5646
- Fix Typos and Improve Documentation Clarity by @zeevick10 in #5648
- chore(ci): use public runner for extra backends by @mudler in #5657
- chore: Add python3 to images by @mudler in #5660
- fix: add python symlink, use absolute python env path when running backends by @mudler in #5664
- chore(backend gallery): re-order and add description for vLLM by @mudler in #5676
- chore(backend gallery): add description for remaining backends by @mudler in #5679
- chore(ci): switch to public runners for base images by @mudler in #5680
- chore(ci): try to use public runners also for release builds by @mudler in #5681
- chore(ci): move also other jobs to public runner by @mudler in #5683
- Fix Typos in Documentation and Python Comments by @maximevtush in #5658
- Fix Typos and Improve Clarity in GPU Acceleration Documentation by @leopardracer in #5688
New Contributors
- @omahs made their first contribution in #5376
- @TheDarkTrumpet made their first contribution in #5420
- @halkeye made their first contribution in #5589
- @leopardracer made their first contribution in #5626
- @kilavvy made their first contribution in #5637
- @vtjl10 made their first contribution in #5641
- @zeevick10 made their first contribution in #5648
- @maximevtush made their first contribution in #5658
Full Changelog: v2.29.0...v3.0.0