I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Remove thinking tokens from the server cache when the webui also excludes them
Motivation
The webui has an option to strip thinking tokens before sending the conversation to the server, but the server still keeps those tokens in its cache until the user sends the next prompt without them. This works, but it forces the server to reprocess the last response and also wastes context space.
Possible Implementation
First, detect which of the cached tokens belong to a thinking section. Then remove those tokens from the server's cached token list and from the KV cache in llama-server. A rough sketch is below.
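A minimal sketch of what this could look like, assuming the reasoning span is delimited by dedicated `<think>` / `</think>` tokens and that the slot keeps its cached tokens in a plain vector. The helper name, the token-lookup step, and the way the slot stores its tokens are illustrative assumptions, not the actual llama-server internals; the KV-cache calls follow the older `llama_kv_cache_seq_rm` / `llama_kv_cache_seq_add` names from `llama.h`, which may be exposed under different names in current versions.

```cpp
#include <algorithm>
#include <vector>

#include "llama.h"

// Hypothetical helper: remove one <think>...</think> span from a slot's cached
// tokens and from the KV cache, then shift the remaining cells left so the
// cached positions stay contiguous with the shortened token list.
static bool strip_thinking_span(
        llama_context            * ctx,
        llama_seq_id               seq_id,
        std::vector<llama_token> & slot_tokens,  // assumed per-slot token cache
        llama_token                think_open,   // id of the <think> token
        llama_token                think_close)  // id of the </think> token
{
    auto open_it = std::find(slot_tokens.begin(), slot_tokens.end(), think_open);
    if (open_it == slot_tokens.end()) {
        return false; // no thinking section in the cache
    }
    auto close_it = std::find(open_it, slot_tokens.end(), think_close);
    if (close_it == slot_tokens.end()) {
        return false; // unterminated span, leave the cache untouched
    }

    const llama_pos p0 = (llama_pos) std::distance(slot_tokens.begin(), open_it);
    const llama_pos p1 = (llama_pos) std::distance(slot_tokens.begin(), close_it) + 1; // exclusive end
    const llama_pos n  = p1 - p0;

    // Drop the span from the KV cache, then shift everything after it back by
    // `n` positions so it lines up with the erased token range.
    llama_kv_cache_seq_rm (ctx, seq_id, p0, p1);
    llama_kv_cache_seq_add(ctx, seq_id, p1, -1, -n);

    slot_tokens.erase(open_it, close_it + 1);
    return true;
}
```

This would only apply when the model's chat template marks reasoning with dedicated tokens; templates that emit the markers as plain text would need a detokenize-and-match step instead, and the server would have to run the removal once the response is complete rather than on the next request.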