What version of Codex CLI is running?
codex-cli 0.122.0
What subscription do you have?
Pro
Which model were you using?
gpt-5.4
What platform is your computer?
Microsoft Windows NT 10.0.19045.0 x64
What terminal emulator and version are you using (if applicable)?
Windows Terminal, PowerShell 7
What issue are you seeing?
This is related to #18203, which reports the app-server outbound websocket queue disconnect trigger. This issue is specifically about the TUI stale-state/reconciliation failure after that kind of disconnect: the server-side thread can be completed/idle, while the TUI remains in Working state and routes the next prompt as turn/steer against the completed turn.
In the captured reproduction, the TUI websocket connection saw a normal turn start:
seq 53 TUI -> app-server turn/start id=6
seq 54 app-server -> TUI response id=6 result.turn.status=inProgress
seq 55 app-server -> TUI thread/status/changed status=active
seq 56 app-server -> TUI turn/started
seq 57-247 app-server -> TUI item/hook/output frames for that turn
Then app-server stderr reported:
WARN codex_app_server::transport
disconnecting slow connection after outbound queue filled: ConnectionId(0)
It was followed by 65 `dropping message for disconnected connection: ConnectionId(0)` warnings.
The TUI relay log never received the terminal lifecycle frames for that turn:
- no `turn/completed`
- no idle `thread/status/changed`
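Checking a relay log for these two missing frames can be done mechanically. The sketch below assumes each relay frame has already been reduced to a `(direction, method, params)` tuple, and that an idle status arrives as `params["status"] == "idle"`. That frame shape and the helper name `missing_terminal_frames` are hypothetical. They are not the wrapper's actual format or any Codex API.

```python
from typing import Iterable, List, Tuple

# Hypothetical reduced relay frame: (direction, method, params).
Frame = Tuple[str, str, dict]


def missing_terminal_frames(frames: Iterable[Frame]) -> List[str]:
    """Return the terminal lifecycle frames that never reached the TUI."""
    saw_completed = False
    saw_idle = False
    for direction, method, params in frames:
        if direction != "server->tui":
            continue  # only frames delivered to the TUI matter here
        if method == "turn/completed":
            saw_completed = True
        elif method == "thread/status/changed" and params.get("status") == "idle":
            saw_idle = True
    missing = []
    if not saw_completed:
        missing.append("turn/completed")
    if not saw_idle:
        missing.append("thread/status/changed (idle)")
    return missing


stale_run = [
    ("tui->server", "turn/start", {}),
    ("server->tui", "thread/status/changed", {"status": "active"}),
    ("server->tui", "turn/started", {}),
]
print(missing_terminal_frames(stale_run))
# ['turn/completed', 'thread/status/changed (idle)']
```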
At the same time, a separate passive observer against the same app-server reported the authoritative thread state as:
{
"threadStatusType": "idle",
"turnCount": 1,
"latestTurnStatus": "completed",
"inProgressTurnCount": 0,
"activeTurnId": null
}
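The mismatch between this snapshot and the TUI's cached turn can be expressed as a simple comparison. The snapshot keys below are the ones in the observer summary above; the function `client_is_stale` and the rule it applies are a hypothetical reconciliation check, not existing Codex code.

```python
from typing import Optional


def client_is_stale(cached_active_turn_id: Optional[str], snapshot: dict) -> bool:
    """Compare a client's cached active turn with an authoritative thread snapshot.

    Hypothetical check: the client is stale if it still tracks a turn while the
    server reports an idle thread or a different (or no) active turn.
    """
    if cached_active_turn_id is None:
        return False  # the client believes nothing is running
    server_idle = (
        snapshot.get("threadStatusType") == "idle"
        and snapshot.get("inProgressTurnCount", 0) == 0
    )
    return server_idle or snapshot.get("activeTurnId") != cached_active_turn_id


observer_snapshot = {
    "threadStatusType": "idle",
    "turnCount": 1,
    "latestTurnStatus": "completed",
    "inProgressTurnCount": 0,
    "activeTurnId": None,
}
print(client_is_stale("019db068-4be4-7063-a7dc-55d20ed439dd", observer_snapshot))  # True
```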
A direct websocket turn/start against the same app-server and same thread then completed successfully:
{
"directTurnStartStatus": "inProgress",
"completed": true,
"notificationCounts": {
"turn/started": 1,
"turn/completed": 1,
"thread/status/changed": 2
}
}
This suggests the app-server/thread was healthy, while the original TUI connection retained stale active-turn state.
When a later visible prompt was entered into the stale TUI, the TUI sent:
TUI -> app-server turn/steer id=7
expectedTurnId=<previous completed turn id>
No response to that turn/steer request was observed in the TUI relay log.
Evidence chain:

| Claim | Evidence |
| --- | --- |
| TUI started a normal turn | Stale relay metadata: `turn/start`, response `inProgress`, active status, `turn/started` |
| app-server hit outbound websocket backpressure | Transport analysis: one `disconnecting slow connection after outbound queue filled` warning |
| app-server dropped later messages for that disconnected connection | Transport analysis: 65 dropped-message warnings for the same connection id |
| stale TUI did not receive terminal lifecycle frames | Stale relay analysis: zero `turn/completed`, no idle status delivered |
| server-side thread was actually complete/idle | Passive observer summary: `threadStatusType=idle`, `latestTurnStatus=completed`, `inProgressTurnCount=0`, `activeTurnId=null` |
| app-server/thread were still capable of work | Direct websocket probe: new `turn/start` completed with `turn/started`, `turn/completed`, and status updates |
| same wrapper/relay path can complete normally | Clean control: two `turn/start`, two `turn/started`, two `turn/completed`, zero `turn/steer`, zero backpressure events |
What steps can reproduce the bug?
I do not have a minimal upstream-only repro script for the stale-state recovery part. The strongest reproduction used a local remote-TUI wrapper/relay and a turn that produced a burst of output frames large enough to fill the app-server outbound websocket queue.
The queue-fill disconnect trigger itself is already reported with an upstream-only reproduction in #18203. The additional observation here is that after such a disconnect, the TUI can remain stale rather than clearly exiting/reconciling.
The useful maintainer-side repro direction is likely:
- Start Codex TUI in remote app-server websocket mode.
- Put a slow/throttled websocket client or proxy between the TUI and app-server.
- Run a turn that emits many output delta frames.
- Observe whether the app-server logs the slow-connection disconnect.
- Check whether the TUI exits/reconciles, or instead remains in `Working` and routes the next prompt as `turn/steer` for the previous turn id.
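To reason about what the throttled-proxy step should provoke, here is a toy Python model of a bounded outbound queue that disconnects a stalled client. It is not the app-server implementation in codex-rs/app-server/src/transport/mod.rs: the class `ToyOutboundQueue`, the helper `run_turn`, the capacity, the drain rate, and discarding still-queued frames on disconnect are all hypothetical choices. What it illustrates is the ordering problem: terminal lifecycle frames are emitted last, so they are exactly the frames lost once a burst of deltas overflows the queue.

```python
from collections import deque
from typing import List


class ToyOutboundQueue:
    """Toy model of a bounded per-connection outbound queue (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.queue: deque = deque()
        self.disconnected = False
        self.dropped = 0
        self.delivered: List[str] = []

    def send(self, frame: str) -> None:
        if self.disconnected:
            self.dropped += 1  # analogous to "dropping message for disconnected connection"
            return
        if len(self.queue) >= self.capacity:
            # Analogous to "disconnecting slow connection after outbound queue filled".
            # In this model the overflowing frame and anything still queued are lost.
            self.disconnected = True
            self.queue.clear()
            return
        self.queue.append(frame)

    def drain(self, n: int) -> None:
        """Deliver up to n queued frames to the client."""
        for _ in range(min(n, len(self.queue))):
            self.delivered.append(self.queue.popleft())


def run_turn(capacity: int, deltas: int, drain_per_send: int) -> ToyOutboundQueue:
    conn = ToyOutboundQueue(capacity)
    frames = ["turn/started"] + [f"item/delta #{i}" for i in range(deltas)]
    frames += ["turn/completed", "thread/status/changed (idle)"]
    for frame in frames:
        conn.send(frame)
        conn.drain(drain_per_send)  # 0 models a stalled/throttled client
    return conn


fast = run_turn(capacity=8, deltas=20, drain_per_send=1)
slow = run_turn(capacity=8, deltas=20, drain_per_send=0)
print("turn/completed" in fast.delivered, fast.disconnected)  # True False
print("turn/completed" in slow.delivered, slow.disconnected, slow.dropped)  # False True 14
```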
In my captured Worker 06 reproduction, the relevant thread id was:
019db067-8e04-71e0-a0a4-e1106ee75148
The initial stale turn id was:
019db068-4be4-7063-a7dc-55d20ed439dd
The later stale turn/steer used that same completed turn id as expectedTurnId.
What is the expected behavior?
After app-server transport disconnects a slow websocket client, the remote TUI should do one of the following:
- receive a close/error and exit clearly;
- reconnect/resume and reconcile with authoritative server thread state;
- clear stale active-turn/running state before accepting the next user prompt.
It should not continue accepting prompts while still believing a completed turn is active.
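A minimal sketch of the second and third options combined: clear the cached turn on disconnect, refuse prompts until reconnected, then adopt the server's snapshot. `RemoteTuiState` and its fields are hypothetical stand-ins, not the TUI's actual types; only the `activeTurnId` snapshot key comes from the observer summary above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RemoteTuiState:
    """Hypothetical client state; not the TUI's actual types or field names."""

    active_turn_id: Optional[str] = None
    turn_running: bool = False
    connected: bool = True

    def on_disconnected(self) -> None:
        # No terminal lifecycle frame can arrive on a dead connection, so the
        # cached "turn in progress" view is no longer trustworthy.
        self.connected = False
        self.active_turn_id = None
        self.turn_running = False

    def reconcile(self, snapshot: dict) -> None:
        # After reconnecting/resuming, adopt the server's authoritative view.
        self.connected = True
        self.active_turn_id = snapshot.get("activeTurnId")
        self.turn_running = self.active_turn_id is not None

    def can_accept_prompt(self) -> bool:
        # Refuse input while disconnected instead of routing it on stale state.
        return self.connected


state = RemoteTuiState(active_turn_id="turn-1", turn_running=True)
state.on_disconnected()
print(state.can_accept_prompt())  # False
state.reconcile({"threadStatusType": "idle", "activeTurnId": None})
print(state.can_accept_prompt(), state.active_turn_id, state.turn_running)  # True None False
```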
Additional information
Mechanical analysis for the stale run:
{
"frameCount": 248,
"malformedLineCount": 0,
"turnStartCount": 1,
"turnStartedCount": 1,
"turnCompletedCount": 0,
"turnSteerCount": 1,
"staleTurnStateSuspected": true
}
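The analyzer that produced these counts is not included in this report. As a hypothetical sketch, similar numbers could be derived from relay metadata JSONL assuming one JSON object per line with a `method` key. The `staleTurnStateSuspected` rule here is an assumption (a turn started but never completed, and input was steered anyway) that happens to agree with both the stale run and the clean control.

```python
import json
from typing import Iterable


def summarize_relay(lines: Iterable[str]) -> dict:
    """Summarize relay metadata JSONL (hypothetical schema: one object per line)."""
    counts = {"turn/start": 0, "turn/started": 0, "turn/completed": 0, "turn/steer": 0}
    frames = malformed = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            malformed += 1
            continue
        if not isinstance(record, dict):
            malformed += 1
            continue
        frames += 1
        method = record.get("method")
        if isinstance(method, str) and method in counts:
            counts[method] += 1
    return {
        "frameCount": frames,
        "malformedLineCount": malformed,
        "turnStartCount": counts["turn/start"],
        "turnStartedCount": counts["turn/started"],
        "turnCompletedCount": counts["turn/completed"],
        "turnSteerCount": counts["turn/steer"],
        # Assumed heuristic, not the author's confirmed rule.
        "staleTurnStateSuspected": counts["turn/started"] > counts["turn/completed"]
        and counts["turn/steer"] > 0,
    }


sample = [
    '{"method": "turn/start"}',
    '{"id": 6, "result": {}}',
    '{"method": "turn/started"}',
    '{"method": "turn/steer"}',
    "not json",
]
print(summarize_relay(sample)["staleTurnStateSuspected"])  # True
```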
App-server transport analysis for the stale run:
{
"slowConnectionDisconnectCount": 1,
"droppedDisconnectedMessageCount": 65,
"connectionIds": ["0"],
"backpressureDisconnectObserved": true
}
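These transport counts can be recomputed from app-server stderr by matching the two warning messages quoted earlier in this report. The sketch only assumes the message text and the `ConnectionId(N)` suffix; timestamps, targets, and other log formatting are deliberately not matched. The helper name `summarize_transport` is hypothetical.

```python
import re
from typing import Iterable

# Message texts quoted in this report; surrounding log formatting is not assumed.
DISCONNECT = re.compile(
    r"disconnecting slow connection after outbound queue filled: ConnectionId\((\d+)\)"
)
DROPPED = re.compile(r"dropping message for disconnected connection: ConnectionId\((\d+)\)")


def summarize_transport(stderr_lines: Iterable[str]) -> dict:
    disconnects = dropped = 0
    connection_ids = set()
    for line in stderr_lines:
        match = DISCONNECT.search(line)
        if match:
            disconnects += 1
        else:
            match = DROPPED.search(line)
            if match:
                dropped += 1
        if match:
            connection_ids.add(match.group(1))
    return {
        "slowConnectionDisconnectCount": disconnects,
        "droppedDisconnectedMessageCount": dropped,
        "connectionIds": sorted(connection_ids),
        "backpressureDisconnectObserved": disconnects > 0,
    }


log = ["disconnecting slow connection after outbound queue filled: ConnectionId(0)"]
log += ["dropping message for disconnected connection: ConnectionId(0)"] * 65
print(summarize_transport(log))  # same counts as the transport analysis above
```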
Clean control under the same wrapper/relay instrumentation:
{
"turnStartCount": 2,
"turnStartedCount": 2,
"turnCompletedCount": 2,
"turnSteerCount": 0,
"staleTurnStateSuspected": false,
"backpressureDisconnectObserved": false
}
Likely source areas:
- App-server bounded outbound queue and slow-client disconnect:
codex-rs/app-server/src/transport/mod.rs
- Websocket close/EOF propagation:
codex-rs/app-server/src/transport/websocket.rs
codex-rs/app-server-client/src/remote.rs
- TUI routing of the next input as `turn/steer` based on the cached active turn id:
codex-rs/tui/src/app/thread_routing.rs
- Clearing the active turn and the visible `Working` state after `turn/completed`:
codex-rs/tui/src/app/thread_events.rs
codex-rs/tui/src/chatwidget.rs
More detailed source links and a claim-to-evidence map are included in the attached redacted evidence package.
Hypothesis:
The trigger is app-server websocket outbound backpressure from a burst of output frames. The app-server intentionally disconnects the slow websocket and drops later messages for that connection. The TUI then misses turn/completed and the idle status frame, leaving both client-side state machines stale:
ThreadEventStore.active_turn_id remains set, so the next prompt is routed as turn/steer.
ChatWidget.agent_turn_running remains true, so the visible UI can remain in Working / queued-input mode.
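The hypothesis can be modeled as a tiny state machine. This is a Python model, not the TUI's Rust code: `ClientTurnState` and its methods are hypothetical, and the only details taken from this report are the two field names and the rule that a cached active turn id routes the next prompt as `turn/steer`. In the model, `turn/completed` is the only transition that clears both flags, so losing it leaves them stuck.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientTurnState:
    # Stand-ins for ThreadEventStore.active_turn_id and ChatWidget.agent_turn_running;
    # this models the hypothesis and is not the TUI's actual implementation.
    active_turn_id: Optional[str] = None
    agent_turn_running: bool = False

    def on_notification(self, method: str, turn_id: Optional[str] = None) -> None:
        if method == "turn/started":
            self.active_turn_id = turn_id
            self.agent_turn_running = True
        elif method == "turn/completed":
            # In this model, the only transition that clears both flags.
            self.active_turn_id = None
            self.agent_turn_running = False

    def route_prompt(self) -> dict:
        # A cached active turn id means the prompt is sent as a steer.
        if self.active_turn_id is not None:
            return {"method": "turn/steer", "expectedTurnId": self.active_turn_id}
        return {"method": "turn/start"}


state = ClientTurnState()
state.on_notification("turn/started", "turn-1")
# turn/completed is dropped with the disconnected connection, so it never arrives.
print(state.agent_turn_running, state.route_prompt())
# True {'method': 'turn/steer', 'expectedTurnId': 'turn-1'}
```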
Possible regression tests:
- TUI/client test: simulate the loss of `turn/completed` after `turn/started`, then verify the next prompt cannot be silently routed as `turn/steer` against a completed/non-active turn.
- Remote-client test: force the websocket read side to receive close/error/EOF after the app-server disconnect and verify `AppServerEvent::Disconnected` reaches the TUI fatal-exit/reconciliation path.
- App-server transport test: fill a per-connection outbound queue and verify either that the disconnect is observable by that client, or that losing the terminal lifecycle notifications cannot leave the client in a stale state.
- Protocol test: `turn/steer` with `expectedTurnId` for a completed turn should return a response/error that allows the TUI to clear stale state and start a fresh turn.
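The protocol test can be prototyped against a fake server before writing it in Rust. Everything below is hypothetical: `FakeThread`, `submit_prompt`, the `id` field, and the error shape are illustrative. Only `result.turn.status` echoes the `turn/start` response seen in the relay excerpt. The point it demonstrates is that an explicit error for a stale `expectedTurnId` gives the client a signal to drop its cached turn and fall back to `turn/start`.

```python
from typing import Optional, Tuple


class FakeThread:
    """Hypothetical server-side thread for modeling the proposed protocol test."""

    def __init__(self) -> None:
        self.active_turn_id: Optional[str] = None
        self.next_id = 1

    def handle(self, request: dict) -> dict:
        method = request["method"]
        if method == "turn/start":
            turn_id = f"turn-{self.next_id}"
            self.next_id += 1
            self.active_turn_id = turn_id
            return {"result": {"turn": {"id": turn_id, "status": "inProgress"}}}
        if method == "turn/steer":
            expected = request.get("expectedTurnId")
            if self.active_turn_id is None or expected != self.active_turn_id:
                # Proposed behavior: an explicit error rather than no response.
                return {"error": {"message": "expectedTurnId is not the active turn"}}
            return {"result": {}}
        return {"error": {"message": f"unsupported method {method}"}}

    def complete_turn(self) -> None:
        self.active_turn_id = None


def submit_prompt(server: FakeThread, cached_turn_id: Optional[str]) -> Tuple[str, str]:
    """Steer when a turn is cached; if the server rejects it, start a fresh turn."""
    if cached_turn_id is not None:
        response = server.handle({"method": "turn/steer", "expectedTurnId": cached_turn_id})
        if "error" not in response:
            return ("steered", cached_turn_id)
        # Rejected: the cached id is stale, so fall through to a fresh turn/start.
    response = server.handle({"method": "turn/start"})
    return ("started", response["result"]["turn"]["id"])


server = FakeThread()
first = server.handle({"method": "turn/start"})["result"]["turn"]["id"]
server.complete_turn()  # the client never sees turn/completed
print(submit_prompt(server, first))  # ('started', 'turn-2')
```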
Attachment package:
I am attaching codex-stale-tui-evidence-redacted.zip.
It contains:
- triage-summary.md - one-page maintainer summary.
- claim-to-evidence-map.md - each claim mapped to exact redacted evidence.
- worker-run-matrix.md - all supervised worker runs and outcomes.
- source-analysis.md - upstream source pointers and inferred failure path.
- redaction-report.md - what was removed from raw evidence.
- stale-run-relay-analysis.redacted.json
- stale-run-relay-metadata.jsonl
- stale-run-transport-events.redacted.json
- stale-run-observer-summary.redacted.json
- direct-turn-probe-summary.redacted.json
- clean-control-relay-analysis.redacted.json
- clean-control-relay-metadata.jsonl
- clean-control-transport-analysis.redacted.json
- screenshots-redacted/*.png
Raw logs are not attached because they contain local paths, prompt text, command transcripts, hook paths, private repository remotes, and unrelated large remote response bodies. The package contains derived redacted metadata, summaries, and aggressively redacted screenshots.
Redacted screenshots can be inlined separately if useful; the protocol evidence in the zip is the primary evidence.
codex-stale-tui-evidence-redacted.zip