Skip to content

feat: add fire + list_claimable + claim + claim_next#23

Merged
govindkavaturi-art merged 1 commit into
mainfrom
feat/fire-and-claim-methods
May 4, 2026
Merged

feat: add fire + list_claimable + claim + claim_next#23
govindkavaturi-art merged 1 commit into
mainfrom
feat/fire-and-claim-methods

Conversation

@mikemolinet
Copy link
Copy Markdown
Collaborator

Same change as the (now-closed) fork PR #22. Re-opened from an upstream branch so CI workflows can access staging secrets and the test job can pass.

Summary

Brings cueapi-sdk toward parity with @cueapi/mcp 0.3.0 + 0.4.0 surface. Four new methods covering SEND (fire) and RECEIVE (claimable list, claim, claim-next). Heartbeat signature fix held for follow-up commit pending technical review.

What's added

CuesResource:

  • fire(cue_id, payload_override=None, merge_strategy=None) POST /v1/cues/{id}/fire

ExecutionsResource:

  • list_claimable(task=None, agent=None) GET /v1/executions/claimable?task=&agent= (server-side filter)
  • claim(execution_id, worker_id=...) POST /v1/executions/{id}/claim (atomic; 409 if already claimed)
  • claim_next(worker_id=..., task=None) POST /v1/executions/claim or list+claim chain when task provided

Notable

  • list_claimable is server-side query params, not client-side filter. Client-side filter hits the LIMIT 50 starvation bug fixed in the 2026-04-25 prod incident.
  • claim_next(task=...) fans out internally because the server's claim endpoint doesn't accept a task filter today.

Tests

42 unit tests pass (was 30; +12 net). Mirrors the existing MagicMock pattern in tests/test_executions_resource.py.

Version

0.1.3 -> 0.2.0. Aligned cueapi/init.py (had drifted to 0.1.2) with pyproject.toml at the same time.

Pending follow-up (NOT in this PR)

ExecutionsResource.heartbeat(execution_id) sends an empty body and does not include worker_id via the X-Worker-Id request header that the server reads from. Holding the signature change pending technical review of the deprecation cadence.

Copy link
Copy Markdown
Collaborator Author

@mikemolinet mikemolinet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CTO review (CC-cue-pm)

Reviewed the full diff. Substantively in good shape — server-side filtering is correctly chosen, the fire/claim/list_claimable surface mirrors @cueapi/mcp 0.3.0/0.4.0, and the heartbeat-fix-held-for-follow-up call is the right discipline given the X-Worker-Id signature change. Test coverage is solid (12 new tests, MagicMock pattern consistent with the existing executions test file).

Leaving formal approval to Govind / maintainer per the agent-self-approval rule (cross-agent / cross-author approval semantics are untested in this repo). Comments below are review notes, not blockers unless flagged.

What I checked + liked

  • Server-side task / agent query params on list_claimable — correct, avoids the LIMIT 50 starvation bug from the 2026-04-25 prod incident. Test test_list_claimable_passes_task_as_query_param pins this.
  • merge_strategy omitted-when-not-passed test (test_fire_omits_merge_strategy_when_not_passed) is exactly the right contract pin. Server's Pydantic default of \"merge\" should win when caller doesn't specify; client never silently overrides.
  • payload_override semantics — docstring correctly notes "persisted on the resulting execution row, never on the cue itself." Matches the server contract (relevant to the open #577 GET filter behavior on the hosted side).
  • claim_next fan-out branch — well-documented, handles empty list cleanly (no_executions_for_task shape).
  • CHANGELOG drift fix (__init__.py was 0.1.2, pyproject.toml was 0.1.3 — both now 0.2.0) — good cleanup.

Notes / suggestions (non-blocking unless called out)

  1. claim_next no-task branch shape divergence — worth a docstring line. When task=None, the SDK forwards whatever the server returns on POST /v1/executions/claim. When task=<x> and the listing is empty, the SDK invents {\"claimed\": False, \"reason\": \"no_executions_for_task\", \"task\": <x>}. Two slightly different shapes for "no claim available." Not wrong, but worth one line in the docstring noting that the SDK augments the empty-task case with reason for caller diagnostics, and that the server-direct path may return a different no-claim shape. Lets future callers write a single conditional cleanly.

  2. claim_next fan-out race + 409. Docstring says "caller retries" on 409. That's the right SDK posture (don't bury retries). But there's no test that simulates a 409 propagating through claim_next. Suggest one extra test where the inner _post raises (or returns a non-claimed shape if your error model is non-exception); pins the contract that the SDK doesn't swallow it. Nice-to-have, not a blocker.

  3. merge_strategy: Optional[str] — consider Literal[\"merge\", \"replace\"]. Cheap type-safety win; matches the server's enum surface. Won't break callers since str is a supertype.

  4. list_claimable returns dict (with \"executions\" key) rather than the list directly. Consistent with raw API shape, and claim_next uses .get(\"executions\") or [] defensively. Defensible. If future SDK passes ever introduce typed Execution models, this is the surface to revisit. Flagging for awareness, not asking for change.

  5. Heartbeat follow-up — agreed call. Holding the worker_id-via-X-Worker-Id signature change pending technical-direction review of deprecation cadence is correct. Worth opening a tracking issue against this repo with the deprecation plan (kw-only worker_id with a one-version DeprecationWarning?) so it doesn't fall off.

  6. CHANGELOG date says 2026-05-01, PR opened 2026-05-02. Trivial; bump on merge if the maintainer cares.

Parity context

This is the cueapi-python side of the 3-layer parity discipline (Layer 1 = hosted PR template at cueapi#615, Layer 2 = parity-manifest seeds at cueapi-python#24 / cueapi-cli#25, Layer 3 = Backlog rows on the project tracker). If this merges before #24, the manifest baseline at #24 will need a quick refresh to reflect that fire/list_claimable/claim/claim_next are now covered — easier to do as a follow-up commit on #24 than to block here.

Bottom line

Code looks good. Test coverage is appropriate. No blockers from CTO-review lens. Formal approval deferred to Govind/maintainer.

— CC-cue-pm

@mikemolinet
Copy link
Copy Markdown
Collaborator Author

Cross-agent review (cueapi-secondary, cueapi engineering)

Looks good to me. Detailed findings below; leaving formal approval to Govind/maintainer per the cross-agent self-approval pattern (validated by cue-mac-app on cueapi #592/#593).

The implementation is clean, tests are tight, and the design choice to fan out claim_next(task=…) client-side rather than push a server-side filter is the right v1 call given the existing POST /claim endpoint. Server endpoint confirmed alive at app/routers/executions.py:229 — won't 404 in production.


CI status — pre-existing flake, not a regression

The two test job failures are all pre-existing staging-secret access issues on tests/test_cues.py, not introduced by this PR:

6 failed, 43 passed, 7 errors
FAILED tests/test_cues.py::TestCueCreate::* — cueapi.exceptions.AuthenticationError: Invalid API key
ERROR  tests/test_cues.py::TestCueList::* — cueapi.exceptions.AuthenticationError: Invalid API key
…

None of those test files are in this diff. The 12 new mock-based unit tests in test_cues_resource.py and test_executions_resource.py (TestFire, TestListClaimable, TestClaim, TestClaimNext) all pass. The PR description already noted the staging-secret rebase intent; that part of the rebase didn't fully land and the staging-cred flake is still present, but it's the same flake mergeable: MERGEABLE and sdk-integration: SUCCESS runs around it.

Suggest tracking the staging-cred fix as a separate issue rather than blocking this PR on it.


Findings

1. claim_next empty-queue handling is asymmetric across branches — HIGH (behavioral)

# With task, empty queue: synthetic return value
return {"claimed": False, "reason": "no_executions_for_task", "task": task}

# Without task, empty queue: server returns 409 → SDK raises CueAPIError
return self._client._post("/v1/executions/claim", json={"worker_id": worker_id})

Server returns 409 {error: {code: "claim_failed", …}} on empty queue (confirmed at app/routers/executions.py:289-303). Caller code can't be polymorphic — they have to know which branch they took to either inspect result["claimed"] or wrap in try/except CueAPIError.

Suggested: in the no-task branch, catch CueAPIError with code == "claim_failed" and return {"claimed": False, "reason": "no_executions"} for symmetry. Keeps the surface area uniform: callers always check result["claimed"], never need to try/except for normal "queue is empty" flow.

2. Race window in claim_next(task=…) documented in PR but not in docstring or code — HIGH (doc)

PR description: "Tiny race window between list and claim is bounded by the atomic claim returning 409, in which case the caller retries."

That contract is not in the SDK:

  • The docstring says nothing about 409 retry semantics for the task branch
  • claim_next doesn't catch + retry internally
  • A caller reading just the SDK will not know they need to wrap in retry

Suggested: add to the docstring, e.g. "When task is set and another worker claims the listed execution between list and claim, raises CueAPIError(code='claim_failed') — callers should retry." Or, since this is the same shape as the server's empty-queue response, consider catching internally and re-listing once before bubbling up. Either is fine; documenting the current behavior is the minimum.

3. claim_next accepts task but not agent, despite list_claimable accepting both — MEDIUM (design)

list_claimable(task=None, agent=None) is symmetric. claim_next(worker_id=…, task=None) only forwards task. Single-purpose workers that filter by both will not have an idiomatic claim_next path.

Suggested: either add agent=None to claim_next and forward both filters in the fan-out branch, or drop agent from list_claimable until there's a use case for it. Asymmetric defaults across the SDK surface read like an oversight.

4. Heartbeat follow-up has correctness implication — track concretely — NOTE (not blocking)

PR description notes executions.heartbeat(execution_id) sends an empty body and doesn't include X-Worker-Id. Looking at server (app/routers/executions.py:332-338): if x_worker_id is provided AND execution.claimed_by_worker differs, the heartbeat returns 403. Without the header, the ownership enforcement is silently bypassed (the request still 200s).

Holding the signature change pending technical review is reasonable; just suggest opening a tracking issue with a concrete cadence (e.g. "ship in 0.3.0") so this doesn't sit indefinitely. Not blocking #23.

5. worker_id not validated client-side — LOW (polish)

claim(execution_id, worker_id="") and claim_next(worker_id="") would send {"worker_id": ""}. Server should reject via Pydantic, but a client-side if not worker_id: raise ValueError(...) would fail faster with a clearer error. Especially helpful since worker_id is keyword-only and a caller could plausibly pass "" thinking it's optional.

6. fire / list_claimable / claim / claim_next return raw dict, not Pydantic models — NIT (follow-up)

Existing methods return Cue (resources/cues.py create/get/update). New methods return dict. Defensible — the response shapes for fire/claim/claim_next are simple and may evolve while the surface is young — but the inconsistency is worth a follow-up issue tracking response-model coverage. Future SDK consumers will want type hints on result.execution_id etc.

7. CHANGELOG date is 2026-05-01 — NIT

PR was opened 2 days ago; today is 2026-05-04. If the convention is "merge date," this needs updating at merge. If "intended release date," fine — but worth picking one.


What works well

  • Server-side filter via query params on list_claimable is the right call — explicit comment in the test file pinning the rationale (LIMIT 50 starvation bug from the 2026-04-25 prod incident) is exactly the kind of context that prevents future regressions.
  • test_fire_omits_merge_strategy_when_not_passed is a tight contract pin — exactly the right kind of test to keep server-side defaults working as the SDK evolves.
  • Keyword-only args (*, worker_id) on claim and claim_next prevent positional-arg drift across versions. Good defensive choice.
  • The PR description's "Pending follow-up" callout on heartbeat is the right pattern — flagging known gaps in the description rather than letting them slip silently.

cueapi-secondary (CueAPI engineering)

Copy link
Copy Markdown
Member

@govindkavaturi-art govindkavaturi-art left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid foundation: fire + list_claimable + claim + claim_next. The CHANGELOG "Pending follow-up" callout for heartbeat() missing worker_id is exactly the kind of debt-acknowledgment that prevents silent drift later. Approve.

Brings the Python SDK toward parity with the @cueapi/mcp 0.3.0 + 0.4.0
surface. Adds four new methods covering the SEND side (fire) and the
RECEIVE side (claimable list, claim, claim-next). Heartbeat signature
fix is held for a follow-up commit pending technical review.

CuesResource:
- fire(cue_id, payload_override=None, merge_strategy=None)
  POST /v1/cues/{id}/fire. For ad-hoc one-shot triggers and for using
  cues as a messaging channel between agents (carry message, instruction,
  task, reply_cue_id in payload_override).

ExecutionsResource:
- list_claimable(task=None, agent=None)
  GET /v1/executions/claimable?task=&agent=
  Filters server-side via query params (NOT client-side). Required for
  single-purpose workers; client-side filter after fetch hits the LIMIT
  50 starvation bug fixed in the 2026-04-25 prod incident.

- claim(execution_id, worker_id=...)
  POST /v1/executions/{id}/claim
  Atomic; returns 409 if already claimed.

- claim_next(worker_id=..., task=None)
  POST /v1/executions/claim (no task) OR list+claim chain (with task).
  Server's claim endpoint does not accept a task filter today; with task
  the SDK fans out (list_claimable filtered, pick oldest, claim by ID).
  Tiny race window between list and claim is bounded by the atomic claim
  returning 409, in which case the caller retries.

Version: 0.1.3 -> 0.2.0. Aligned cueapi/__init__.py (had drifted to
0.1.2) with pyproject.toml at the same time.

Tests: 42 unit tests pass (was 30; +12 net). Mirrors the existing
MagicMock pattern in tests/test_executions_resource.py.

Pending follow-up:
- ExecutionsResource.heartbeat(execution_id) currently sends an empty
  body and does not include worker_id via the X-Worker-Id request
  header that the server reads from. Worker-id is what the server uses
  to enforce ownership on heartbeat (returns 403 on mismatch); without
  it the race-protection check is silently bypassed. A signature change
  to add worker_id is held pending technical review of the deprecation
  cadence (additive kwarg-only with default-warn-on-omit vs hard
  signature change in a major bump).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@govindkavaturi-art govindkavaturi-art force-pushed the feat/fire-and-claim-methods branch from b2c61c3 to caf33a4 Compare May 4, 2026 18:14
@govindkavaturi-art govindkavaturi-art merged commit ffecf24 into main May 4, 2026
mikemolinet added a commit that referenced this pull request May 9, 2026
…t recent ports (#36)

Manifest was 3 days stale; many endpoints listed as missing have
been ported since the last audit.

Moved from endpoints_missing → endpoints_covered (with PR refs):

  - POST /v1/cues/{id}/fire (PR #23; in-flight kwargs in #33)
  - POST /v1/executions/{id}/replay (PR #25)
  - GET /v1/executions/claimable (PR #23)
  - POST /v1/executions/{id}/claim (PR #23)
  - POST /v1/executions/claim (PR #23)
  - GET /v1/workers + DELETE /v1/workers/{id} (PR #26)
  - GET /v1/usage (PR #26)
  - POST /v1/agents + GET/PATCH/DELETE /v1/agents/{ref}
    + GET /v1/agents/{ref}/webhook-secret
    + GET /v1/agents/{ref}/inbox + /sent (PR #27)
  - POST /v1/messages + GET/read/ack (PR #28)

Added in-flight refs (open PRs):

  - GET /v1/agents/roster (in-flight PR #35; cueapi #630 parity)
  - GET /v1/agents/{ref}/presence (in-flight PR #35; cueapi #662 parity)
  - send_at + exit_criteria + idempotency_key kwargs on fire (PR #33)
  - send_at kwarg on messages.send (PR #34)

New endpoints_missing items (post-audit):

  - POST /v1/agents/{ref}/webhook-secret/regenerate (destructive; tracked)
  - DELETE /v1/messages bulk (cueapi #650; bounded by cueapi-cli upstream)
  - POST /v1/executions/{id}/live-claim (cueapi #664; handler-runtime, not SDK)

New "in_flight_ports_2026_05_07" section listing all 4 currently-open
SDK PRs with PR-overlap notes (PR #30/#33 lane-flagged with cueapi-main).

Bumped sdk_version_at_audit 0.1.3 → 0.2.x.

This refresh closes the Backlog row "Refresh cueapi-python parity-manifest.json"
filed earlier today (Self-flag 2026-05-07).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants