
Feature Request: Exclude thinking tokens from server cache for reasoning models #14379

@firecoperana

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Remove thinking tokens from the server cache when the web UI also excludes them from the prompt it sends.

Motivation

The web UI has an option to strip thinking tokens before sending the conversation to the server, but the server still keeps the thinking tokens in its cache until the user sends a new prompt without them. This works, but it forces the server to reprocess the last response and also wastes context size.

Possible Implementation

First, detect which cached tokens belong to the thinking section. Then remove those tokens from both the server's cached token list and the KV cache in llama-server, as sketched below.
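A minimal sketch of the removal step, assuming the `llama_kv_cache_seq_rm` / `llama_kv_cache_seq_add` C API (these function names have changed across llama.cpp versions) and a hypothetical `remove_cached_range` helper; the `[begin, end)` span would come from matching the model's thinking markers (e.g. `<think>` ... `</think>`) against the cached tokens:

```cpp
// Hypothetical sketch, not actual llama-server code.
#include <vector>
#include "llama.h"

// Remove the token range [begin, end) of a slot's cached prompt from both
// the server-side token list and the KV cache, then shift the remaining
// cells left so the sequence positions stay contiguous.
static void remove_cached_range(llama_context * ctx,
                                llama_seq_id    seq_id,
                                std::vector<llama_token> & cache_tokens,
                                size_t begin, size_t end) {
    if (begin >= end || end > cache_tokens.size()) {
        return;
    }

    const llama_pos p0 = (llama_pos) begin;
    const llama_pos p1 = (llama_pos) end;

    // drop the KV cells holding the thinking tokens
    llama_kv_cache_seq_rm(ctx, seq_id, p0, p1);

    // shift all cells after the removed span left by (p1 - p0);
    // p1 == -1 means "until the end of the sequence"
    llama_kv_cache_seq_add(ctx, seq_id, p1, -1, p0 - p1);

    // keep the server's mirror of the cached tokens in sync with the KV cache
    cache_tokens.erase(cache_tokens.begin() + begin, cache_tokens.begin() + end);
}
```

Shifting the remaining cells left is the same kind of position bookkeeping the server's existing context-shift path performs, so that mechanism could probably be reused rather than reimplemented.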
