ggml-org / llama.cpp (Public)
91.2k stars · 14.1k forks · Issues: 335 · Pull requests: 617 · Projects: 10
Pinned issues
- changelog : libllama API · #9289 · ggerganov, opened on Sep 3, 2024 · 12
- changelog : llama-server REST API · #9291 · ggerganov, opened on Sep 3, 2024 · 18
- tutorials : list for llama.cpp · #13523 · ggerganov, opened on May 14, 2025 · 16
Issues · search: state:open label:"generation quality"
Search results (Open / Closed)
quantize : configurable neutral imatrix prior
Labels: examples · generation quality (Quality of model output) · need feedback (Testing and feedback with results are needed) · research 🔬
Status: Draft (not ready).
#15060 in ggml-org/llama.cpp · compilade, opened on Aug 3, 2025
ggml-quants : weighted rounding algorithms with cumulative search
Labels: generation quality (Quality of model output) · ggml (changes relating to the ggml tensor library for machine learning) · Less than 4 bits (Efforts related to viable quantized models using <4 bits) · research 🔬 · Review Complexity: Medium (Generally requires more time to grok, but manageable at beginner to medium expertise level) · Tensor Encoding Scheme (https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes)
Status: Draft (not ready).
#12557 in ggml-org/llama.cpp · compilade, opened on Mar 25, 2025
Smooth Sampling / Quadratic Sampling support
Labels: generation quality (Quality of model output) · performance (Speed related topics) · Review Complexity: High (Generally requires in-depth knowledge of LLMs or GPUs)
Status: Open (in progress).
#6445 in ggml-org/llama.cpp · kalomaze, opened on Apr 2, 2024
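The idea behind quadratic ("smooth") sampling is to reshape the logits with a quadratic curve centered on the maximum logit, so probability mass concentrates on the strongest candidates without a hard cutoff. The sketch below is a generic illustration of such a transform, not necessarily the exact formula from the proposal; the function names and the `smoothing_factor` parameter are illustrative:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def quadratic_smoothing(logits, smoothing_factor=0.5):
    # Pull each logit down by the squared distance from the maximum.
    # Larger smoothing_factor sharpens the distribution around the
    # top candidate; 0.0 leaves the logits unchanged.
    m = max(logits)
    return [x - smoothing_factor * (m - x) ** 2 for x in logits]

logits = [3.0, 2.0, 1.0, 0.0]
sharpened = quadratic_smoothing(logits, smoothing_factor=0.5)
```

Because the transform is monotone for logits at or below the maximum, the ranking of candidates is preserved while the top token's softmax probability rises.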
P-Step Truncation Sampling
Labels: generation quality (Quality of model output) · need feedback (Testing and feedback with results are needed) · refactoring · Review Complexity: High (Generally requires in-depth knowledge of LLMs or GPUs)
Status: Open (in progress).
#5675 in ggml-org/llama.cpp · p-e-w, opened on Feb 23, 2024
[RFC] common, server : add top-a sampler
Labels: enhancement (New feature or request) · generation quality (Quality of model output) · Review Complexity: High (Generally requires in-depth knowledge of LLMs or GPUs)
Status: Open (in progress).
#5612 in ggml-org/llama.cpp · Artefact2, opened on Feb 20, 2024
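Top-a sampling drops every token whose probability falls below a · (p_max)², so the cutoff adapts to how peaked the distribution is: a confident distribution prunes aggressively, while a flat one barely prunes at all. A minimal sketch (function and parameter names are illustrative, not the RFC's API):

```python
def top_a_filter(probs, a=0.3):
    # Keep tokens with probability >= a * max(probs)**2,
    # zero out the rest, and renormalize the survivors.
    p_max = max(probs)
    threshold = a * p_max ** 2
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

dist = [0.70, 0.15, 0.10, 0.04, 0.01]   # peaked: threshold = 0.3 * 0.49 = 0.147
filtered = top_a_filter(dist, a=0.3)    # only the first two tokens survive
```

Note the quadratic dependence on p_max: halving the top probability quarters the threshold, which is what makes the filter gentle on flat distributions.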
Penalty threshold: A mechanism for improving repetition penalties
Labels: enhancement (New feature or request) · generation quality (Quality of model output) · Review Complexity: Medium (Generally requires more time to grok, but manageable at beginner to medium expertise level)
Status: Open (in progress).
#5561 in ggml-org/llama.cpp · p-e-w, opened on Feb 18, 2024
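For context, the baseline such proposals refine is the classic CTRL-style repetition penalty: a repeated token's logit is divided by the penalty when positive and multiplied by it when negative, so the logit always moves toward less likely. A minimal sketch (names are illustrative):

```python
def apply_repetition_penalty(logits, recent_tokens, penalty=1.3):
    # Penalize every token id seen in the recent context window.
    # Dividing a positive logit and multiplying a negative one both
    # lower the token's probability; penalty == 1.0 is a no-op.
    out = list(logits)
    for t in set(recent_tokens):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, recent_tokens=[0, 1], penalty=2.0)
```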
llama : combined beam search + grammar sampling strategy
Labels: generation quality (Quality of model output) · good first issue (Good for newcomers) · research 🔬 · roadmap (Part of a roadmap project)
Status: Open.
#2923 in ggml-org/llama.cpp · ggerganov, opened on Aug 31, 2023
llama : tool for evaluating quantization results per layer
Labels: enhancement (New feature or request) · generation quality (Quality of model output) · roadmap (Part of a roadmap project)
Status: Open.
#2783 in ggml-org/llama.cpp · ggerganov, opened on Aug 25, 2023
Implementation of a sequence repetition penalty sampler
Labels: enhancement (New feature or request) · generation quality (Quality of model output) · need feedback (Testing and feedback with results are needed)
Status: Draft (not ready).
#2593 in ggml-org/llama.cpp · KerfuffleV2, opened on Aug 12, 2023
Study how LM Evaluation Harness works and try to implement it
Labels: enhancement (New feature or request) · generation quality (Quality of model output) · help wanted (Needs help from the community) · high priority (Very important issue) · research 🔬
Status: Open.
#231 in ggml-org/llama.cpp · ggerganov, opened on Mar 17, 2023
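Harness-style evaluations ultimately reduce to scoring per-token log-likelihoods; the headline metric, perplexity, is the exponential of the mean negative log-likelihood over a sequence. A minimal sketch with stand-in log-probabilities (no model attached):

```python
import math

def perplexity(token_logprobs):
    # exp of the mean negative log-likelihood over the sequence;
    # lower is better, and a uniform 1/N guess scores exactly N.
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model assigning probability 0.25 to every token scores perplexity 4.
stand_in = [math.log(0.25)] * 10
```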