Insights: ggml-org/llama.cpp
Overview
44 Releases published by 1 person
- b5701 published Jun 19, 2025
- b5702 published Jun 19, 2025
- b5703 published Jun 19, 2025
- b5704 published Jun 19, 2025
- b5706 published Jun 19, 2025
- b5707 published Jun 19, 2025
- b5708 published Jun 19, 2025
- b5709 published Jun 19, 2025
- b5711 published Jun 19, 2025
- b5712 published Jun 20, 2025
- b5713 published Jun 20, 2025
- b5714 published Jun 20, 2025
- b5715 published Jun 20, 2025
- b5716 published Jun 20, 2025
- b5717 published Jun 20, 2025
- b5718 published Jun 20, 2025
- b5720 published Jun 20, 2025
- b5719 published Jun 20, 2025
- b5721 published Jun 20, 2025
- b5723 published Jun 20, 2025
- b5722 published Jun 20, 2025
- b5726 published Jun 20, 2025
- b5728 published Jun 21, 2025
- b5729 published Jun 21, 2025
- b5731 published Jun 21, 2025
- b5733 published Jun 22, 2025
- b5734 published Jun 22, 2025
- b5735 published Jun 22, 2025
- b5736 published Jun 22, 2025
- b5737 published Jun 22, 2025
- b5738 published Jun 22, 2025
- b5740 published Jun 22, 2025
- b5742 published Jun 23, 2025
- b5743 published Jun 23, 2025
- b5744 published Jun 23, 2025
- b5745 published Jun 23, 2025
- b5747 published Jun 24, 2025
- b5749 published Jun 24, 2025
- b5751 published Jun 24, 2025
- b5752 published Jun 24, 2025
- b5753 published Jun 24, 2025
- b5754 published Jun 25, 2025
- b5755 published Jun 25, 2025
- b5756 published Jun 25, 2025
55 Pull requests merged by 31 people
- ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317) merged Jun 25, 2025
- ggml : do not output unprintable characters on GGUF load failure (#14381) merged Jun 25, 2025
- sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973) merged Jun 25, 2025
- opencl: ref count ggml_backend_opencl_context and refactor profiling (#14254) merged Jun 24, 2025
- batch : fix check for empty sequences in memory (#14364) merged Jun 24, 2025
- cmake : use LLAMA_BUILD_NUMBER when defining LLAMA_INSTALL_VERSION (#14362) merged Jun 24, 2025
- docs: Fix server API key doc for /props (move it to /health) (#14352) merged Jun 24, 2025
- main : honor --verbose-prompt on interactive prompts (#14350) merged Jun 24, 2025
- Add Mistral-Small-3.2-24B-Instruct-2506.jinja (#14349) merged Jun 24, 2025
- CUDA/HIP: optimize mmv paths taken for HIP/CDNA (#14324) merged Jun 23, 2025
- ci: add workflow for relocatable cmake package (#14346) merged Jun 23, 2025
- vulkan: update windows SDK in release.yml (#14344) merged Jun 23, 2025
- Fixes for rwkv-world template and the missing inputs.use_jinja in llama-cli (#14336) merged Jun 23, 2025
- CUDA: mul_mat_v support for batch sizes > 1 (#14262) merged Jun 23, 2025
- kv-cells : fix tracking of seq_pos during cache reuse (#14339) merged Jun 23, 2025
- vulkan: update windows SDK in CI (#14334) merged Jun 23, 2025
- quantize: Handle user-defined pruning of whole layers (blocks) (#13037) merged Jun 22, 2025
- gguf-py : fix SpecialVocab parsing when post_processor is null (#14330) merged Jun 22, 2025
- run : avoid double tokenization (#14327) merged Jun 22, 2025
- examples : fix is_first logic for tokenization (#14329) merged Jun 22, 2025
- HIP: enable vec fattn on RDNA4 (#14323) merged Jun 22, 2025
- mtmd: fix Pixtral OOM with large images by capping image_size to 1024 (#14326) merged Jun 22, 2025
- common : use std::string_view now that we target c++17 (#14319) merged Jun 22, 2025
- CUDA: add mean operation (#14313) merged Jun 22, 2025
- gguf-py : fix Qwen3-Embedding eos token (#14314) merged Jun 21, 2025
- Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792) merged Jun 21, 2025
- gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312) merged Jun 21, 2025
- metal : fix thread-safety (#14300) merged Jun 21, 2025
- memory : rename interface to llama_memory_context_i (#14296) merged Jun 21, 2025
- Fix Llama 4 conversion (#14311) merged Jun 21, 2025
- sync : ggml (#14308) merged Jun 20, 2025
- docs : fix the link to llama.h (#14293) merged Jun 20, 2025
- CUDA: add conv_2d_transpose (#14287) merged Jun 20, 2025
- lint : remove trailing whitepace (#14304) merged Jun 20, 2025
- vocab : prevent tokenizer overflow (#14301) merged Jun 20, 2025
- sycl: add usage of enqueue_functions extension (#14244) merged Jun 20, 2025
- Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286) merged Jun 20, 2025
- llama : improve sep token handling (#14272) merged Jun 20, 2025
- cuda : synchronize graph capture and cublas handle destruction (#14288) merged Jun 20, 2025
- ggml : fix repack work size for mul_mat_id (#14292) merged Jun 20, 2025
- ggml: Update KleidiAI to v1.9.0 (#14277) merged Jun 20, 2025
- model : more uniform output id handling (#14275) merged Jun 20, 2025
- ubatch : new splitting logic (#14217) merged Jun 20, 2025
- CUDA: add conv_2d_dw (#14265) merged Jun 20, 2025
- ggml-cpu : remove unnecesary arm feature detection (#14281) merged Jun 19, 2025
- gguf-py: Make sentencepiece optional (#14200) merged Jun 19, 2025
- server: args for draft model cache types (#11200) (#13782) merged Jun 19, 2025
- fix: resolve gcc compile warnings (#14261) merged Jun 19, 2025
- sycl: Cleanup codepaths in Get Rows in sycl backend (#14215) merged Jun 19, 2025
- llama-bench : add --no-warmup flag (#14224) (#14270) merged Jun 19, 2025
- scripts: Fix remote option in Windows (#14102) (#14100) merged Jun 19, 2025
- llamafile: support s390x SIMD instruction set (#14273) merged Jun 19, 2025
- Vulkan: Fix host-pinned memory for large allocations (#14249) merged Jun 19, 2025
- Hybrid recurrent cache (#13979) merged Jun 19, 2025
- metal : add mean kernel (#14267) merged Jun 19, 2025
26 Pull requests opened by 20 people
- ggml : add ggml_set_rows (#14274) opened Jun 19, 2025
- kv-cache : use ggml_set_rows (#14285) opened Jun 19, 2025
- Fix Windows Null Pointer Bug and Enhance Memory Operations in ggml-sycl (#14290) opened Jun 20, 2025
- ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316) opened Jun 21, 2025
- Conv2D: Add CPU version (#14320) opened Jun 21, 2025
- Fix appearance of the chats list context menu for the browser Safari (#14322) opened Jun 22, 2025
- make "serve" library (#14331) opened Jun 22, 2025
- vulkan: lock accesses of pinned_memory vector (#14333) opened Jun 22, 2025
- Make the shell scripts cross-platform (#14341) opened Jun 23, 2025
- vulkan: Increase workgroup size for GLU, for performance (#14345) opened Jun 23, 2025
- [server] webui DB import and export (#14347) opened Jun 23, 2025
- Add script to test op perf and compare (#14354) opened Jun 24, 2025
- build: refine toplevel .gitignore (#14355) opened Jun 24, 2025
- llama : expose C API to get layer device type (#14358) opened Jun 24, 2025
- server : fix assistant prefilling when content is an array (#14360) opened Jun 24, 2025
- CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361) opened Jun 24, 2025
- llama : add high-throughput mode (#14363) opened Jun 24, 2025
- ggml : add pointer to attach user data (#14365) opened Jun 24, 2025
- vulkan: Add fusion support for RMS_NORM+MUL (#14366) opened Jun 24, 2025
- test-backend-ops: add support for specifying output format (#14368) opened Jun 25, 2025
- docs: fix broken url in main readme (#14371) opened Jun 25, 2025
- Q2k interleaving implementation - x86/x64 SIMD (#14373) opened Jun 25, 2025
- webui: preserve partial content when streaming errors occur (#14374) opened Jun 25, 2025
- vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipeline (#14378) opened Jun 25, 2025
- ggml-cpu: Build variant targeting Neoverse-V2 (#14380) opened Jun 25, 2025
47 Issues closed by 10 people
- Feature Request: Qwen 2.5 VL (#11483) closed Jun 26, 2025
- Misc. bug: Model not loaded on Android with NDK (#13399) closed Jun 26, 2025
- Eval bug: I cannot run llama 405b on CPU (#13475) closed Jun 26, 2025
- web UI either doesn't scroll or jumps to the wrong element (#13479) closed Jun 26, 2025
- Partial offload support for training (#13486) closed Jun 26, 2025
- Misc. bug: ggml_cuda_compute_forward: MUL failed ROCm error: invalid device function (#14370) closed Jun 25, 2025
- Misc. bug: llama-server slower on 4bit quantized model with f470bc36bed (#14235) closed Jun 25, 2025
- Misc. bug: Completions hang after CUDA error, but health endpoint reports all OK (#13281) closed Jun 25, 2025
- Misc. bug: The web UI of llama-server is not displaying correctly. (#13428) closed Jun 25, 2025
- Compile bug: ld returned 1 exit status (file bigger than 2gb) (#13446) closed Jun 25, 2025
- Drop support for sentencepiece (#13448) closed Jun 25, 2025
- Misc. bug: Illegal CUDA memory access in ggml_backend_cuda_cpy_tensor_async (#13449) closed Jun 25, 2025
- Feature Request: add draft model in llama-bench and more. (#13456) closed Jun 25, 2025
- Misc. bug: llama-server webui overriding command line parameters (#13277) closed Jun 24, 2025
- Eval bug: Regex (#13347) closed Jun 24, 2025
- Differential mode for llama-bench + plotting code (#13408) closed Jun 24, 2025
- Eval bug: Qwen3-30B-A3B-Q4_K_M: Slows down when using the \no_think mode. (#13427) closed Jun 24, 2025
- Eval bug: llama-speculative core dump with Qwen3, GGML_ASSERT(batch.n_tokens > 0) failed (#13433) closed Jun 24, 2025
- Misc. bug: Completion fails with error 500 (#14298) closed Jun 23, 2025
- Eval bug: exmaple llama-simple-chat run failed in Android (#14253) closed Jun 23, 2025
- Eval bug: llama-cli, Qwen3 jinja template will break CLI multiturn conversation (#13404) closed Jun 23, 2025
- Misc. bug: convert: AttributeError: 'NoneType' object has no attribute 'get' (#14328) closed Jun 22, 2025
- mtmd: Eval bug: Mistral small 2506 needs 1024x1024 image size cap (#14310) closed Jun 22, 2025
- Feature Request: Add Support for ModernBert (#11282) closed Jun 22, 2025
- Feature Request: allow setting jinja chat template from server webui (#11689) closed Jun 22, 2025
- Misc. bug: Qwen 3.0 "enable_thinking" parameter not working (#13160) closed Jun 22, 2025
- Token Generation Speed Decline with GGUF Models on M3 Ultra (#13373) closed Jun 22, 2025
- Compile bug: ninja: build stopped: subcommand failed. (#13375) closed Jun 22, 2025
- Misc. bug: invalid regex grammar causes segment violation (#13390) closed Jun 22, 2025
- Feature Request: fix handling of Qwen3-Embedding-0.6B input to add EOS token (#14252) closed Jun 21, 2025
- Not able to convert hf models to gguf anymore (#14315) closed Jun 21, 2025
- Llama 4 mmproj fails `unable to find tensor mm.model.fc.weight` (#14237) closed Jun 21, 2025
- Eval bug: IQ2_M broken for mradermacher / Llama-4-Maverick-17B-128E-Instruct-GGUF (#12913) closed Jun 21, 2025
- Feature Request: tensor split needs control over where CPU layers go (#13314) closed Jun 21, 2025
- Misc. bug: error in remote conversion for the new ServiceNow Nemotron 15B model (#13354) closed Jun 21, 2025
- Compile bug: clang-18.1.3 compile fail (vsetivli) (#13358) closed Jun 21, 2025
- Misc. bug: Extended swap/unswap times when loading large models on Apple Silicon (#13361) closed Jun 21, 2025
- Misc. bug: `ggml_backend_reg_count()` returns 0 on linux from b5587 and onwards (#14302) closed Jun 20, 2025
- Misc. bug: RPC immediate closing of the connection (#14307) closed Jun 20, 2025
- Compile bug: [blas] choose blas backend to run llama2-7b model, but system info doesn't have the blas flag. (#14259) closed Jun 20, 2025
- Eval bug: Abort is called in a thread from a custom thread pool during a llama_decode call (#13990) closed Jun 20, 2025
- Eval bug: Qwen3 30B A3B Q4_0 failed to run (#13168) closed Jun 20, 2025
- Misc. bug: batch in the mtmd-cli.cpp not freed (#13620) closed Jun 19, 2025
- Feature Request: Add --no-warmup to llama-bench (#14224) closed Jun 19, 2025
- Misc. bug: option --remote of convert_hf_to_gguf.py does not work in Windows (#14102) closed Jun 19, 2025
15 Issues opened by 15 people
- Feature Request: allow running llama with an idle (lowest) priority as well (#14382) opened Jun 25, 2025
- Feature Request: Exclude thinking tokens from server cache for reasoning models (#14379) opened Jun 25, 2025
- Request for Official Support of AMD Ryzen AI Platform NPU (#14377) opened Jun 25, 2025
- Compile error for ggml_gemv_q4_K_8x8_q8_K on Intel x86_64 MacOS (AVX2) (#14372) opened Jun 25, 2025
- Misc. bug: Inconsistent Gemma3 implementation in rope factor (#14367) opened Jun 24, 2025
- Feature Request: Suggest to provide armv7l version to run on Raspberry Pi devices. (#14348) opened Jun 23, 2025
- OpenCL backend with Qualcomm Adreno GPUs load time is too long (#14337) opened Jun 23, 2025
- Eval bug: Program crashes during long input inference when batch size is set to 16384 (#14325) opened Jun 22, 2025
- Feature Request: Add support for moonshotai/Kimi-VL-A3B-Instruct (#14318) opened Jun 21, 2025
- mtmd: Any plan for mtmd to support video input and audio output? (#14295) opened Jun 20, 2025
- Eval bug: [CANN]AutoDL Ascend 910B instance running DeepSeek-r1 32B_Q8 error (#14291) opened Jun 20, 2025
- Eval bug: Inconsistent Embedding Similarity between llama-server and LlamaCppEmbeddings for BGE-M3 Model (#14280) opened Jun 19, 2025
69 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- imatrix: add option to display importance score statistics for a given imatrix file (#12718) commented on Jun 25, 2025 · 11 new comments
- imatrix : use GGUF to store importance matrices (#9400) commented on Jun 25, 2025 · 6 new comments
- llama : initial Mamba-2 support (#9126) commented on Jun 25, 2025 · 0 new comments
- ggml: avoid rebuild of GGML graph for each token (#7456) (#8366) commented on Jun 19, 2025 · 0 new comments
- Error while converting peft finetuned merged model to gguf (#12494) commented on Jun 26, 2025 · 0 new comments
- Feature Request: Add support of convert.py for model Qwen2.5-Omni-7B (#12641) commented on Jun 26, 2025 · 0 new comments
- Compile bug: Vulkan shaders build fails due to missing vulkan-shaders directory during ExternalProject_Add configure step (#13753) commented on Jun 26, 2025 · 0 new comments
- Compile bug: Vulkan Build Fails in Termux/Proot Due to Missing Cooperative Matrix Shader Variables (#13801) commented on Jun 26, 2025 · 0 new comments
- Eval bug: Output garbled on DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf from unsloth using musa backend with VMM off (#13788) commented on Jun 26, 2025 · 0 new comments
- ERROR:hf-to-gguf:Model Qwen2_5_VLModel is not supported (#13802) commented on Jun 26, 2025 · 0 new comments
- ERROR:hf-to-gguf:Model MllamaForConditionalGeneration is not supported (#13805) commented on Jun 26, 2025 · 0 new comments
- Misc. bug: ROCm images cannot be found (#11913) commented on Jun 26, 2025 · 0 new comments
- Feature Request: (webui) do not throw away message if there is error in stream (#13709) commented on Jun 25, 2025 · 0 new comments
- something with llama_server? slow vs llama_cli (#13560) commented on Jun 25, 2025 · 0 new comments
- Misc. bug: RUNPATH properties are not properly set (#13740) commented on Jun 25, 2025 · 0 new comments
- Eval bug: terminate called after throwing an instance of 'std::runtime_error' what(): Unexpected empty grammar stack after accepting piece: [control_36] (#13690) commented on Jun 25, 2025 · 0 new comments
- Eval bug: Command-A generates a single repeating token when using split mode row on P40 (#14228) commented on Jun 24, 2025 · 0 new comments
- open source dataset for low bit quantization? (#13736) commented on Jun 24, 2025 · 0 new comments
- Add SmolLM3 (#14240) commented on Jun 19, 2025 · 0 new comments
- MODEL: Falcon-H1 support (#14238) commented on Jun 25, 2025 · 0 new comments
- ggml: introduce GGML_NUMA_MIGRATE to optimize cross NUMA op computation (#14232) commented on Jun 25, 2025 · 0 new comments
- ggml : implement op fusion, starting with REGLU/GEGLU/SWIGLU (#14158) commented on Jun 24, 2025 · 0 new comments
- ggml: aarch64: Implement SVE Kernels for Int 8 Quantization (#14117) commented on Jun 25, 2025 · 0 new comments
- llama : support qwen3 rerank and embeddings (#14029) commented on Jun 26, 2025 · 0 new comments
- Add plamo2 (#13930) commented on Jun 25, 2025 · 0 new comments
- convert: add eagle2 draft arch (#13908) commented on Jun 24, 2025 · 0 new comments
- finetune.cpp command-line arg (#13873) commented on Jun 25, 2025 · 0 new comments
- musa: enable fp16 mma (all) and cublas on qy2 (#13842) commented on Jun 25, 2025 · 0 new comments
- cmake : set `RPATH` to `$ORIGIN` on Linux (#13740) (#13741) commented on Jun 25, 2025 · 0 new comments
- Granite Four (#13550) commented on Jun 20, 2025 · 0 new comments
- cuda: set cuda compiler path (#13527) (#13528) commented on Jun 20, 2025 · 0 new comments
- Added dynamic context size. This is perfect for servers running llama models as a service. (#13295) commented on Jun 22, 2025 · 0 new comments
- WIP: Add support for CogAgent (#12679) commented on Jun 24, 2025 · 0 new comments
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326) commented on Jun 23, 2025 · 0 new comments
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063) commented on Jun 23, 2025 · 0 new comments
- Eval bug: A100 GPU not working with CUDA 12.8 in llama.cpp (#13609) commented on Jun 21, 2025 · 0 new comments
- Eval bug: Output garbled in dual-GPU environment (#13673) commented on Jun 21, 2025 · 0 new comments
- Misc. bug: AMX is not ready to be used! (#13678) commented on Jun 21, 2025 · 0 new comments
- Eval bug: SYCL branch produces mul_mat bug when trying to run. (#13674) commented on Jun 21, 2025 · 0 new comments
- devops/nix: `flake.lock` is very obsolete (#13679) commented on Jun 21, 2025 · 0 new comments
- Eval bug: std::regex to split the text (#13691) commented on Jun 21, 2025 · 0 new comments
- Feature Request: Support Jina V3 arch (#9585) commented on Jun 20, 2025 · 0 new comments
- llama_eval removed, no deprecation info, still referenced in comments (#14271) commented on Jun 20, 2025 · 0 new comments
- Eval bug: Cannot load Qwen3 ranking models (#13820) commented on Jun 20, 2025 · 0 new comments
- Compile bug: llama.cpp-master/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:80:54: error '_mm256_set_m128i' was not declared in this scope (#11385) commented on Jun 20, 2025 · 0 new comments
- Eval bug: repeated output for llama-server (#12782) commented on Jun 20, 2025 · 0 new comments
- How to start gemma3 multimodal model service using llama_server (#13465) commented on Jun 20, 2025 · 0 new comments
- Eval bug: Not splitting model across rows correctly (#13661) commented on Jun 20, 2025 · 0 new comments
- Feature Request: Procedure for reproducing test models (#13662) commented on Jun 20, 2025 · 0 new comments
- Feature Request: Llama-bench improvement (#13671) commented on Jun 20, 2025 · 0 new comments
- Feature Request: Save Model Name in Conversation Chats (WebUI) (#13570) commented on Jun 19, 2025 · 0 new comments
- Feature Request: (#13989) commented on Jun 19, 2025 · 0 new comments
- Feature Request: Add keep_alive function for llama-server (#13748) commented on Jun 24, 2025 · 0 new comments
- Misc. bug: segfault in test-gbnf-validator (#13762) commented on Jun 24, 2025 · 0 new comments
- changelog : `libllama` API (#9289) commented on Jun 23, 2025 · 0 new comments
- Eval bug: Server and mtmd both crashing when starting Ultravox (#13727) commented on Jun 23, 2025 · 0 new comments
- Eval bug: [CUDA] MoE model (Qwen3-30B-A3B) loads to GPU but does not utilize CUDA for inference in build b5466 (#13729) commented on Jun 23, 2025 · 0 new comments
- [Tracker] Docker build fails on CI for arm64 (#11888) commented on Jun 22, 2025 · 0 new comments
- Compile bug: gcc-12: error: unrecognized command-line option ‘-compress-mode=size’ (#14260) commented on Jun 22, 2025 · 0 new comments
- Misc. bug: Inconsistent Vulkan segfault (#10528) commented on Jun 22, 2025 · 0 new comments
- Eval bug: KV cache stopped working in b5554 version (#14071) commented on Jun 22, 2025 · 0 new comments
- Feature Request: add jina embeddings model availible convert to gguf (#12327) commented on Jun 22, 2025 · 0 new comments
- Feature Request: support for image input in llama-server (and web ui) (#12792) commented on Jun 22, 2025 · 0 new comments
- Misc. bug: -sm row results in gibberish output on HIP (ROCm 6.3.3) (#13545) commented on Jun 22, 2025 · 0 new comments
- Eval bug: swa_full = true is slower than false (#13683) commented on Jun 22, 2025 · 0 new comments
- Eval bug: Server Returns Empty Responses Under High Load (#13703) commented on Jun 22, 2025 · 0 new comments
- Misc. bug: ./llama-server API max_completion_tokens Parameter Not Working (#13700) commented on Jun 22, 2025 · 0 new comments
- tts : add support for Orpheus (#12476) commented on Jun 21, 2025 · 0 new comments
- Misc. bug: Gemma3 multimodal (or all VL models?): </think> tag in the image or PDF text breaks prompt processing (or token generation?) (#14143) commented on Jun 21, 2025 · 0 new comments