Insights: ggml-org/llama.cpp
Overview
44 Releases published by 1 person
- b5701 published Jun 19, 2025
- b5702 published Jun 19, 2025
- b5703 published Jun 19, 2025
- b5704 published Jun 19, 2025
- b5706 published Jun 19, 2025
- b5707 published Jun 19, 2025
- b5708 published Jun 19, 2025
- b5709 published Jun 19, 2025
- b5711 published Jun 19, 2025
- b5712 published Jun 20, 2025
- b5713 published Jun 20, 2025
- b5714 published Jun 20, 2025
- b5715 published Jun 20, 2025
- b5716 published Jun 20, 2025
- b5717 published Jun 20, 2025
- b5718 published Jun 20, 2025
- b5720 published Jun 20, 2025
- b5719 published Jun 20, 2025
- b5721 published Jun 20, 2025
- b5723 published Jun 20, 2025
- b5722 published Jun 20, 2025
- b5726 published Jun 20, 2025
- b5728 published Jun 21, 2025
- b5729 published Jun 21, 2025
- b5731 published Jun 21, 2025
- b5733 published Jun 22, 2025
- b5734 published Jun 22, 2025
- b5735 published Jun 22, 2025
- b5736 published Jun 22, 2025
- b5737 published Jun 22, 2025
- b5738 published Jun 22, 2025
- b5740 published Jun 22, 2025
- b5742 published Jun 23, 2025
- b5743 published Jun 23, 2025
- b5744 published Jun 23, 2025
- b5745 published Jun 23, 2025
- b5747 published Jun 24, 2025
- b5749 published Jun 24, 2025
- b5751 published Jun 24, 2025
- b5752 published Jun 24, 2025
- b5753 published Jun 24, 2025
- b5754 published Jun 25, 2025
- b5755 published Jun 25, 2025
- b5756 published Jun 25, 2025
55 Pull requests merged by 31 people
- ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317) merged Jun 25, 2025
- ggml : do not output unprintable characters on GGUF load failure (#14381) merged Jun 25, 2025
- sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973) merged Jun 25, 2025
- opencl: ref count ggml_backend_opencl_context and refactor profiling (#14254) merged Jun 24, 2025
- batch : fix check for empty sequences in memory (#14364) merged Jun 24, 2025
- cmake : use LLAMA_BUILD_NUMBER when defining LLAMA_INSTALL_VERSION (#14362) merged Jun 24, 2025
- docs: Fix server API key doc for /props (move it to /health) (#14352) merged Jun 24, 2025
- main : honor --verbose-prompt on interactive prompts (#14350) merged Jun 24, 2025
- Add Mistral-Small-3.2-24B-Instruct-2506.jinja (#14349) merged Jun 24, 2025
- CUDA/HIP: optimize mmv paths taken for HIP/CDNA (#14324) merged Jun 23, 2025
- ci: add workflow for relocatable cmake package (#14346) merged Jun 23, 2025
- vulkan: update windows SDK in release.yml (#14344) merged Jun 23, 2025
- Fixes for rwkv-world template and the missing inputs.use_jinja in llama-cli (#14336) merged Jun 23, 2025
- CUDA: mul_mat_v support for batch sizes > 1 (#14262) merged Jun 23, 2025
- kv-cells : fix tracking of seq_pos during cache reuse (#14339) merged Jun 23, 2025
- vulkan: update windows SDK in CI (#14334) merged Jun 23, 2025
- quantize: Handle user-defined pruning of whole layers (blocks) (#13037) merged Jun 22, 2025
- gguf-py : fix SpecialVocab parsing when post_processor is null (#14330) merged Jun 22, 2025
- run : avoid double tokenization (#14327) merged Jun 22, 2025
- examples : fix is_first logic for tokenization (#14329) merged Jun 22, 2025
- HIP: enable vec fattn on RDNA4 (#14323) merged Jun 22, 2025
- mtmd: fix Pixtral OOM with large images by capping image_size to 1024 (#14326) merged Jun 22, 2025
- common : use std::string_view now that we target c++17 (#14319) merged Jun 22, 2025
- CUDA: add mean operation (#14313) merged Jun 22, 2025
- gguf-py : fix Qwen3-Embedding eos token (#14314) merged Jun 21, 2025
- Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792) merged Jun 21, 2025
- gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312) merged Jun 21, 2025
- metal : fix thread-safety (#14300) merged Jun 21, 2025
- memory : rename interface to llama_memory_context_i (#14296) merged Jun 21, 2025
- Fix Llama 4 conversion (#14311) merged Jun 21, 2025
- sync : ggml (#14308) merged Jun 20, 2025
- docs : fix the link to llama.h (#14293) merged Jun 20, 2025
- CUDA: add conv_2d_transpose (#14287) merged Jun 20, 2025
- lint : remove trailing whitepace (#14304) merged Jun 20, 2025
- vocab : prevent tokenizer overflow (#14301) merged Jun 20, 2025
- sycl: add usage of enqueue_functions extension (#14244) merged Jun 20, 2025
- Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286) merged Jun 20, 2025
- llama : improve sep token handling (#14272) merged Jun 20, 2025
- cuda : synchronize graph capture and cublas handle destruction (#14288) merged Jun 20, 2025
- ggml : fix repack work size for mul_mat_id (#14292) merged Jun 20, 2025
- ggml: Update KleidiAI to v1.9.0 (#14277) merged Jun 20, 2025
- model : more uniform output id handling (#14275) merged Jun 20, 2025
- ubatch : new splitting logic (#14217) merged Jun 20, 2025
- CUDA: add conv_2d_dw (#14265) merged Jun 20, 2025
- ggml-cpu : remove unnecesary arm feature detection (#14281) merged Jun 19, 2025
- gguf-py: Make sentencepiece optional (#14200) merged Jun 19, 2025
- server: args for draft model cache types (#11200) (#13782) merged Jun 19, 2025
- fix: resolve gcc compile warnings (#14261) merged Jun 19, 2025
- sycl: Cleanup codepaths in Get Rows in sycl backend (#14215) merged Jun 19, 2025
- llama-bench : add --no-warmup flag (#14224) (#14270) merged Jun 19, 2025
- scripts: Fix remote option in Windows (#14102) (#14100) merged Jun 19, 2025
- llamafile: support s390x SIMD instruction set (#14273) merged Jun 19, 2025
- Vulkan: Fix host-pinned memory for large allocations (#14249) merged Jun 19, 2025
- Hybrid recurrent cache (#13979) merged Jun 19, 2025
- metal : add mean kernel (#14267) merged Jun 19, 2025
26 Pull requests opened by 20 people
- ggml : add ggml_set_rows (#14274) opened Jun 19, 2025
- kv-cache : use ggml_set_rows (#14285) opened Jun 19, 2025
- Fix Windows Null Pointer Bug and Enhance Memory Operations in ggml-sycl (#14290) opened Jun 20, 2025
- ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316) opened Jun 21, 2025
- Conv2D: Add CPU version (#14320) opened Jun 21, 2025
- Fix appearance of the chats list context menu for the browser Safari (#14322) opened Jun 22, 2025
- make "serve" library (#14331) opened Jun 22, 2025
- vulkan: lock accesses of pinned_memory vector (#14333) opened Jun 22, 2025
- Make the shell scripts cross-platform (#14341) opened Jun 23, 2025
- vulkan: Increase workgroup size for GLU, for performance (#14345) opened Jun 23, 2025
- [server] webui DB import and export (#14347) opened Jun 23, 2025
- Add script to test op perf and compare (#14354) opened Jun 24, 2025
- build: refine toplevel .gitignore (#14355) opened Jun 24, 2025
- llama : expose C API to get layer device type (#14358) opened Jun 24, 2025
- server : fix assistant prefilling when content is an array (#14360) opened Jun 24, 2025
- CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361) opened Jun 24, 2025
- llama : add high-throughput mode (#14363) opened Jun 24, 2025
- ggml : add pointer to attach user data (#14365) opened Jun 24, 2025
- vulkan: Add fusion support for RMS_NORM+MUL (#14366) opened Jun 24, 2025
- test-backend-ops: add support for specifying output format (#14368) opened Jun 25, 2025
- docs: fix broken url in main readme (#14371) opened Jun 25, 2025
- Q2k interleaving implementation - x86/x64 SIMD (#14373) opened Jun 25, 2025
- webui: preserve partial content when streaming errors occur (#14374) opened Jun 25, 2025
- vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipeline (#14378) opened Jun 25, 2025
- ggml-cpu: Build variant targeting Neoverse-V2 (#14380) opened Jun 25, 2025
47 Issues closed by 10 people
- Feature Request: Qwen 2.5 VL (#11483) closed Jun 26, 2025
- Misc. bug: Model not loaded on Android with NDK (#13399) closed Jun 26, 2025
- Eval bug: I cannot run llama 405b on CPU (#13475) closed Jun 26, 2025
- web UI either doesn't scroll or jumps to the wrong element (#13479) closed Jun 26, 2025
- Partial offload support for training (#13486) closed Jun 26, 2025
- Misc. bug: ggml_cuda_compute_forward: MUL failed ROCm error: invalid device function (#14370) closed Jun 25, 2025
- Misc. bug: llama-server slower on 4bit quantized model with f470bc36bed (#14235) closed Jun 25, 2025
- Misc. bug: Completions hang after CUDA error, but health endpoint reports all OK (#13281) closed Jun 25, 2025
- Misc. bug: The web UI of llama-server is not displaying correctly. (#13428) closed Jun 25, 2025
- Compile bug: ld returned 1 exit status (file bigger than 2gb) (#13446) closed Jun 25, 2025
- Drop support for sentencepiece (#13448) closed Jun 25, 2025
- Misc. bug: Illegal CUDA memory access in ggml_backend_cuda_cpy_tensor_async (#13449) closed Jun 25, 2025
- Feature Request: add draft model in llama-bench and more. (#13456) closed Jun 25, 2025
- Misc. bug: llama-server webui overriding command line parameters (#13277) closed Jun 24, 2025
- Eval bug: Regex (#13347) closed Jun 24, 2025
- Differential mode for llama-bench + plotting code (#13408) closed Jun 24, 2025
- Eval bug: Qwen3-30B-A3B-Q4_K_M: Slows down when using the \no_think mode. (#13427) closed Jun 24, 2025
- Eval bug: llama-speculative core dump with Qwen3, GGML_ASSERT(batch.n_tokens > 0) failed (#13433) closed Jun 24, 2025
- Misc. bug: Completion fails with error 500 (#14298) closed Jun 23, 2025
- Eval bug: exmaple llama-simple-chat run failed in Android (#14253) closed Jun 23, 2025
- Eval bug: llama-cli, Qwen3 jinja template will break CLI multiturn conversation (#13404) closed Jun 23, 2025
- Misc. bug: convert: AttributeError: 'NoneType' object has no attribute 'get' (#14328) closed Jun 22, 2025
- mtmd: Eval bug: Mistral small 2506 needs 1024x1024 image size cap (#14310) closed Jun 22, 2025
- Feature Request: Add Support for ModernBert (#11282) closed Jun 22, 2025
- Feature Request: allow setting jinja chat template from server webui (#11689) closed Jun 22, 2025
- Misc. bug: Qwen 3.0 "enable_thinking" parameter not working (#13160) closed Jun 22, 2025
- Token Generation Speed Decline with GGUF Models on M3 Ultra (#13373) closed Jun 22, 2025
- Compile bug: ninja: build stopped: subcommand failed. (#13375) closed Jun 22, 2025
- Misc. bug: invalid regex grammar causes segment violation (#13390) closed Jun 22, 2025
- Feature Request: fix handling of Qwen3-Embedding-0.6B input to add EOS token (#14252) closed Jun 21, 2025
- Not able to convert hf models to gguf anymore (#14315) closed Jun 21, 2025
- Llama 4 mmproj fails `unable to find tensor mm.model.fc.weight` (#14237) closed Jun 21, 2025
- Eval bug: IQ2_M broken for mradermacher / Llama-4-Maverick-17B-128E-Instruct-GGUF (#12913) closed Jun 21, 2025
- Feature Request: tensor split needs control over where CPU layers go (#13314) closed Jun 21, 2025
- Misc. bug: error in remote conversion for the new ServiceNow Nemotron 15B model (#13354) closed Jun 21, 2025
- Compile bug: clang-18.1.3 compile fail (vsetivli) (#13358) closed Jun 21, 2025
- Misc. bug: Extended swap/unswap times when loading large models on Apple Silicon (#13361) closed Jun 21, 2025
- Misc. bug: `ggml_backend_reg_count()` returns 0 on linux from b5587 and onwards (#14302) closed Jun 20, 2025
- Misc. bug: RPC immediate closing of the connection (#14307) closed Jun 20, 2025
- Compile bug: [blas] choose blas backend to run llama2-7b model, but system info doesn't have the blas flag. (#14259) closed Jun 20, 2025
- Eval bug: Abort is called in a thread from a custom thread pool during a llama_decode call (#13990) closed Jun 20, 2025
- Eval bug: Qwen3 30B A3B Q4_0 failed to run (#13168) closed Jun 20, 2025
- Misc. bug: batch in the mtmd-cli.cpp not freed (#13620) closed Jun 19, 2025
- Feature Request: Add --no-warmup to llama-bench (#14224) closed Jun 19, 2025
- Misc. bug: option --remote of convert_hf_to_gguf.py does not work in Windows (#14102) closed Jun 19, 2025
15 Issues opened by 15 people
- Feature Request: allow running llama with an idle (lowest) priority as well (#14382) opened Jun 25, 2025
- Feature Request: Exclude thinking tokens from server cache for reasoning models (#14379) opened Jun 25, 2025
- Request for Official Support of AMD Ryzen AI Platform NPU (#14377) opened Jun 25, 2025
- Compile error for ggml_gemv_q4_K_8x8_q8_K on Intel x86_64 MacOS (AVX2) (#14372) opened Jun 25, 2025
- Misc. bug: Inconsistent Gemma3 implementation in rope factor (#14367) opened Jun 24, 2025
- Feature Request: Suggest to provide armv7l version to run on Raspberry Pi devices. (#14348) opened Jun 23, 2025
- OpenCL backend with Qualcomm Adreno GPUs load time is too long (#14337) opened Jun 23, 2025
- Eval bug: Program crashes during long input inference when batch size is set to 16384 (#14325) opened Jun 22, 2025
- Feature Request: Add support for moonshotai/Kimi-VL-A3B-Instruct (#14318) opened Jun 21, 2025
- mtmd: Any plan for mtmd to support video input and audio output? (#14295) opened Jun 20, 2025
- Eval bug: [CANN]AutoDL Ascend 910B instance running DeepSeek-r1 32B_Q8 error (#14291) opened Jun 20, 2025
- Eval bug: Inconsistent Embedding Similarity between llama-server and LlamaCppEmbeddings for BGE-M3 Model (#14280) opened Jun 19, 2025
69 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- imatrix: add option to display importance score statistics for a given imatrix file (#12718) commented on Jun 25, 2025 · 11 new comments
- imatrix : use GGUF to store importance matrices (#9400) commented on Jun 25, 2025 · 6 new comments
- llama : initial Mamba-2 support (#9126) commented on Jun 25, 2025 · 0 new comments
- ggml: avoid rebuild of GGML graph for each token (#7456) (#8366) commented on Jun 19, 2025 · 0 new comments
- Error while converting peft finetuned merged model to gguf (#12494) commented on Jun 26, 2025 · 0 new comments
- Feature Request: Add support of convert.py for model Qwen2.5-Omni-7B (#12641) commented on Jun 26, 2025 · 0 new comments
- Compile bug: Vulkan shaders build fails due to missing vulkan-shaders directory during ExternalProject_Add configure step (#13753) commented on Jun 26, 2025 · 0 new comments
- Compile bug: Vulkan Build Fails in Termux/Proot Due to Missing Cooperative Matrix Shader Variables (#13801) commented on Jun 26, 2025 · 0 new comments
- Eval bug: Output garbled on DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf from unsloth using musa backend with VMM off (#13788) commented on Jun 26, 2025 · 0 new comments
- ERROR:hf-to-gguf:Model Qwen2_5_VLModel is not supported (#13802) commented on Jun 26, 2025 · 0 new comments
- ERROR:hf-to-gguf:Model MllamaForConditionalGeneration is not supported (#13805) commented on Jun 26, 2025 · 0 new comments
- Misc. bug: ROCm images cannot be found (#11913) commented on Jun 26, 2025 · 0 new comments
- Feature Request: (webui) do not throw away message if there is error in stream (#13709) commented on Jun 25, 2025 · 0 new comments
- something with llama_server? slow vs llama_cli (#13560) commented on Jun 25, 2025 · 0 new comments
- Misc. bug: RUNPATH properties are not properly set (#13740) commented on Jun 25, 2025 · 0 new comments
- Eval bug: terminate called after throwing an instance of 'std::runtime_error' what(): Unexpected empty grammar stack after accepting piece: [control_36] (#13690) commented on Jun 25, 2025 · 0 new comments
- Eval bug: Command-A generates a single repeating token when using split mode row on P40 (#14228) commented on Jun 24, 2025 · 0 new comments
- open source dataset for low bit quantization? (#13736) commented on Jun 24, 2025 · 0 new comments
- Add SmolLM3 (#14240) commented on Jun 19, 2025 · 0 new comments
- MODEL: Falcon-H1 support (#14238) commented on Jun 25, 2025 · 0 new comments
- ggml: introduce GGML_NUMA_MIGRATE to optimize cross NUMA op computation (#14232) commented on Jun 25, 2025 · 0 new comments
- ggml : implement op fusion, starting with REGLU/GEGLU/SWIGLU (#14158) commented on Jun 24, 2025 · 0 new comments
- ggml: aarch64: Implement SVE Kernels for Int 8 Quantization (#14117) commented on Jun 25, 2025 · 0 new comments
- llama : support qwen3 rerank and embeddings (#14029) commented on Jun 26, 2025 · 0 new comments
- Add plamo2 (#13930) commented on Jun 25, 2025 · 0 new comments
- convert: add eagle2 draft arch (#13908) commented on Jun 24, 2025 · 0 new comments
- finetune.cpp command-line arg (#13873) commented on Jun 25, 2025 · 0 new comments
- musa: enable fp16 mma (all) and cublas on qy2 (#13842) commented on Jun 25, 2025 · 0 new comments
- cmake : set `RPATH` to `$ORIGIN` on Linux (#13740) (#13741) commented on Jun 25, 2025 · 0 new comments
- Granite Four (#13550) commented on Jun 20, 2025 · 0 new comments
- cuda: set cuda compiler path (#13527) (#13528) commented on Jun 20, 2025 · 0 new comments
- Added dynamic context size. This is perfect for servers running llama models as a service. (#13295) commented on Jun 22, 2025 · 0 new comments
- WIP: Add support for CogAgent (#12679) commented on Jun 24, 2025 · 0 new comments
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326) commented on Jun 23, 2025 · 0 new comments
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063) commented on Jun 23, 2025 · 0 new comments
- Eval bug: A100 GPU not working with CUDA 12.8 in llama.cpp (#13609) commented on Jun 21, 2025 · 0 new comments
- Eval bug: Output garbled in dual-GPU environment (#13673) commented on Jun 21, 2025 · 0 new comments
- Misc. bug: AMX is not ready to be used! (#13678) commented on Jun 21, 2025 · 0 new comments
- Eval bug: SYCL branch produces mul_mat bug when trying to run. (#13674) commented on Jun 21, 2025 · 0 new comments
- devops/nix: `flake.lock` is very obsolete (#13679) commented on Jun 21, 2025 · 0 new comments
- Eval bug: std::regex to split the text (#13691) commented on Jun 21, 2025 · 0 new comments
- Feature Request: Support Jina V3 arch (#9585) commented on Jun 20, 2025 · 0 new comments
- llama_eval removed, no deprecation info, still referenced in comments (#14271) commented on Jun 20, 2025 · 0 new comments
- Eval bug: Cannot load Qwen3 ranking models (#13820) commented on Jun 20, 2025 · 0 new comments
- Compile bug: llama.cpp-master/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:80:54: error '_mm256_set_m128i' was not declared in this scope (#11385) commented on Jun 20, 2025 · 0 new comments
- Eval bug: repeated output for llama-server (#12782) commented on Jun 20, 2025 · 0 new comments
- How to start gemma3 multimodal model service using llama_server (#13465) commented on Jun 20, 2025 · 0 new comments
- Eval bug: Not splitting model across rows correctly (#13661) commented on Jun 20, 2025 · 0 new comments
- Feature Request: Procedure for reproducing test models (#13662) commented on Jun 20, 2025 · 0 new comments
- Feature Request: Llama-bench improvement (#13671) commented on Jun 20, 2025 · 0 new comments
- Feature Request: Save Model Name in Conversation Chats (WebUI) (#13570) commented on Jun 19, 2025 · 0 new comments
- Feature Request: (#13989) commented on Jun 19, 2025 · 0 new comments
- Feature Request: Add keep_alive function for llama-server (#13748) commented on Jun 24, 2025 · 0 new comments
- Misc. bug: segfault in test-gbnf-validator (#13762) commented on Jun 24, 2025 · 0 new comments
- changelog : `libllama` API (#9289) commented on Jun 23, 2025 · 0 new comments
- Eval bug: Server and mtmd both crashing when starting Ultravox (#13727) commented on Jun 23, 2025 · 0 new comments
- Eval bug: [CUDA] MoE model (Qwen3-30B-A3B) loads to GPU but does not utilize CUDA for inference in build b5466 (#13729) commented on Jun 23, 2025 · 0 new comments
- [Tracker] Docker build fails on CI for arm64 (#11888) commented on Jun 22, 2025 · 0 new comments
- Compile bug: gcc-12: error: unrecognized command-line option ‘-compress-mode=size’ (#14260) commented on Jun 22, 2025 · 0 new comments
- Misc. bug: Inconsistent Vulkan segfault (#10528) commented on Jun 22, 2025 · 0 new comments
- Eval bug: KV cache stopped working in b5554 version (#14071) commented on Jun 22, 2025 · 0 new comments
- Feature Request: add jina embeddings model availible convert to gguf (#12327) commented on Jun 22, 2025 · 0 new comments
- Feature Request: support for image input in llama-server (and web ui) (#12792) commented on Jun 22, 2025 · 0 new comments
- Misc. bug: -sm row results in gibberish output on HIP (ROCm 6.3.3) (#13545) commented on Jun 22, 2025 · 0 new comments
- Eval bug: swa_full = true is slower than false (#13683) commented on Jun 22, 2025 · 0 new comments
- Eval bug: Server Returns Empty Responses Under High Load (#13703) commented on Jun 22, 2025 · 0 new comments
- Misc. bug: ./llama-server API max_completion_tokens Parameter Not Working (#13700) commented on Jun 22, 2025 · 0 new comments
- tts : add support for Orpheus (#12476) commented on Jun 21, 2025 · 0 new comments
- Misc. bug: Gemma3 multimodal (or all VL models?): </think> tag in the image or PDF text breaks prompt processing (or token generation?) (#14143) commented on Jun 21, 2025 · 0 new comments