
[ci-scan] Improve ci-failure-scan emission rules and add daily feedback loop with KPI tracker#128440

Draft
kotlarmilos wants to merge 11 commits into main from kotlarmilos/ci-scan-noise-reduction


@kotlarmilos (Member) commented May 21, 2026

Description

Two changes to the ci-failure-scan agentic workflow that converts dnceng-public outer-loop pipeline failures into Known Build Error issues and test-disable PRs:

  1. Tightens emission rules in the existing scanner: stricter stable-signature requirement and clearer skip cases so issues are only filed for failures actually reproducible in the scanned window.
  2. Adds a feedback footer to every emitted KBE issue and test-disable PR pointing readers at the workflow file and the new feedback channel. Maintainers can comment on any noisy issue/PR to push back inline; the next feedback tick reads those comments and proposes prompt edits.

Impact

  • Fewer false-positive KBEs from the scanner: stricter emission rules
  • Maintainers have a clear inline feedback channel from inside every emitted artifact
  • The feedback loop is self-improving: daily tick reads recent issues/PRs and pushes prompt edits to a single draft PR
  • Noise reduction is measurable: KPI tracker reports issues filed vs closed, % closed as not_planned, median and p90 time-to-close, and weekly trend charts over a running window since the scanner was established
  • All writes go through gh-aw safe-outputs; no issues: write or contents: write permissions are needed
  • No runs on personal forks
kotlarmilos and others added 2 commits May 21, 2026 12:19
…itted templates

- Tighter stable-signature requirements and clearer skip cases to reduce
  false-positive KBEs.
- Footer added to all three emitted templates (KBE literal, KBE regex,
  test-disable PR) pointing maintainers at the new ci-failure-scan-feedback
  workflow. Placed below the JSON fence so Build Analysis's
  single-fenced-block check still passes.
- Intro pointer in the workflow header.
- Fork guard so the scanner only runs on dotnet/runtime.
- Lock file: re-applies the manual pat_pool patch on the detection job.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sister workflow to ci-failure-scan that closes the loop:

- Runs every 3 days; fork-guarded.
- Reads the last 10 scanner runs (deep-dives the latest two) and gathers
  in-scope feedback: items labelled agentic-workflows OR with [ci-scan] in
  the title, restricted to min-integrity: approved.
- Scores recent emissions against a rubric (title scoping, classification,
  JSON validity, signature specificity, log cross-check, maintainer
  feedback).
- Translates findings into proposed prompt edits scoped to
  ci-failure-scan.md only.
- Single open draft PR at a time: pushes to the existing [ci-scan-feedback]
  PR when one exists, otherwise opens a new one.
- Maintains a pinned [ci-scan-feedback] KPI Tracker issue regenerated each
  tick. Running window starts at the scanner workflow's created_at.
  Metrics: issues filed/open/closed, closed last 7d/30d, median + p90
  time-to-close, % closed as not_planned (false-positive proxy), top 3
  pipelines, PR filed/merged/closed-unmerged. Charts: weekly filed-vs-closed
  and weekly median time-to-close as mermaid xychart-beta over the last 12
  weeks, plus raw weekly buckets in a collapsed details block.
- All writes via gh-aw safe-outputs; no issues: write / contents: write.
- Lock file: pat_pool patched onto the detection job.
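A minimal sketch of the median and p90 time-to-close figures listed above, assuming the per-item durations have already been reduced to whole hours, one per line. The nearest-rank percentile definition is an assumption: the commit does not say which definition the tracker uses, and nearest-rank returns the lower middle value for an even count instead of averaging the two.

```shell
#!/bin/sh
# Nearest-rank percentile over integer durations (hours) read from stdin.
# Assumption: nearest-rank is used; the KPI tracker may define p50/p90 differently.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "n/a"; exit }
      r = p * NR / 100                          # fractional rank
      rank = (r == int(r)) ? r : int(r) + 1     # ceil(p/100 * N)
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Illustrative input only (made-up durations, not real tracker data).
sample='5
48
12
3
72
20
9
30
100
7'
printf '%s\n' "$sample" | percentile 50   # median
printf '%s\n' "$sample" | percentile 90   # p90
```

With the sample above this prints 12 and 72.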

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

@kotlarmilos kotlarmilos changed the title Reduce ci-failure-scan noise and add feedback loop [ci-scan] Reduce ci-failure-scan noise and add feedback loop May 21, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the gh-aw driven ci-failure-scan workflow prompt to reduce false-positive emissions (tighter stability rules, explicit skip cases, same-run dedup, follow-up-build gate) and adds a companion ci-failure-scan-feedback workflow that periodically aggregates maintainer feedback and proposes prompt edits via a single draft PR and a regenerated KPI tracker issue.

Changes:

  • Add repo guard conditions and additional emission/skip gates to reduce ci-failure-scan noise (follow-up build presence, same-run dedup cache, expanded KBE/tracker/fix PR search heuristics).
  • Append a standard “how to provide feedback” footer to emitted KBE / PR templates and add match-verification guidance tied to a persisted failure log.
  • Introduce a new scheduled feedback workflow (and compiled lock file) to harvest in-scope feedback and generate prompt-edit PRs plus a KPI tracker snapshot.
Summary per file:

  • .github/workflows/ci-failure-scan.md: Tightens scan/emit rules, adds follow-up/dedup gates, and adds feedback footers to templates.
  • .github/workflows/ci-failure-scan.lock.yml: Regenerated compiled workflow reflecting the new prompt and repo guard conditions.
  • .github/workflows/ci-failure-scan-feedback.md: New gh-aw workflow prompt for periodic feedback ingestion, KPI tracking, and prompt-edit PR proposals.
  • .github/workflows/ci-failure-scan-feedback.lock.yml: Compiled lock workflow for the new feedback workflow.

Copilot's findings

  • Files reviewed: 3/4 changed files
  • Comments generated: 6
Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan-feedback.md
Comment thread .github/workflows/ci-failure-scan-feedback.md Outdated
Comment thread .github/workflows/ci-failure-scan-feedback.md Outdated

Tighter feedback loop. Schedule changes from every 3d to fuzzy 'daily'
(compiled to 03:39 UTC, stable via --schedule-seed dotnet/runtime).

Lock files: re-applies the manual pat_pool patch on both detection jobs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kotlarmilos kotlarmilos changed the title [ci-scan] Reduce ci-failure-scan noise and add feedback loop Tighten ci-failure-scan emission rules; add daily feedback loop with KPI tracker May 21, 2026

@kotlarmilos kotlarmilos changed the title Tighten ci-failure-scan emission rules; add daily feedback loop with KPI tracker [ci-scan] Improve ci-failure-scan emission rules and add daily feedback loop with KPI tracker May 21, 2026
Two fixes driven by the first scanner run on this branch:

- Step 4.4 (existing test-disable PR dedup): also search the test-name
  stem after stripping verb prefixes and platform suffixes. PR titles
  often abbreviate (e.g. DnsGetHostEntry_X -> X). Without the fallback,
  the scanner filed #128442, a duplicate of #128425.

- KBE check #6 and bad-vs-good table: reject array-form signatures whose
  second element is a generic xunit assertion stem ('Assert.Equal()
  Failure: Values differ', 'Assert.True() Failure', etc). Require the
  unique 'Expected:'/'Actual:' value lines or the actual exception type
  + message. #128444 (now closed) emitted such a weak signature; Build
  Analysis would have mismatched it against unrelated failures.
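The rejection rule in the second bullet can be sketched as a lookup on the array's second element. This is a hypothetical helper, not code from the workflow (which is a prompt, not a script): the first two stems are the ones quoted in the commit message, 'Assert.False() Failure' is an illustrative addition, and exact whole-string matching after trimming is an assumed policy.

```shell
#!/bin/sh
# Hypothetical helper: is_generic_assert_stem ELEMENT
# Returns 0 when ELEMENT (the array form's second element) is only a generic
# xunit assertion stem, i.e. too weak to anchor a KBE signature on its own.
is_generic_assert_stem() {
  # Trim leading/trailing whitespace before comparing.
  trimmed=$(printf '%s\n' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
  case "$trimmed" in
    'Assert.Equal() Failure: Values differ') return 0 ;;  # quoted in the commit
    'Assert.True() Failure')                 return 0 ;;  # quoted in the commit
    'Assert.False() Failure')                return 0 ;;  # illustrative addition
  esac
  return 1
}

for element in 'Assert.True() Failure' 'Expected: 42'; do
  if is_generic_assert_stem "$element"; then
    echo "reject: '$element' is a generic stem"
  else
    echo "keep:   '$element'"
  fi
done
```

A specific 'Expected:'/'Actual:' value line passes the check; the bare stem does not.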

Lock file: re-applies the manual pat_pool patch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 21, 2026 10:52
Contributor

Copilot AI left a comment


Copilot's findings

  • Files reviewed: 3/4 changed files
  • Comments generated: 3
Comment thread .github/workflows/ci-failure-scan.md
Comment thread .github/workflows/ci-failure-scan-feedback.md Outdated
Comment thread .github/workflows/ci-failure-scan-feedback.md Outdated

…elines

72h was blanket-suppressing actionable failures on the JIT-stress
family pipelines (defs 109, 110, 111, 112, 115, 116, 118, 138, 140, 150,
153, 155, 159, 235), which run on a weekly-or-longer cadence by design.

Iteration 2 of the scanner on this branch surfaced this: 12 pipelines
skipped with 'stale build window (>72h)' on their normal cadence.
Extending to 14d accommodates the cadence; the 7-day 'no qualifying
build' rule still catches genuine inactivity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The previous guard 'event_name == workflow_dispatch || repository ==
dotnet/runtime' had a flaw: workflow_dispatch from a fork bypasses the
repo check because the OR short-circuits as soon as the event matches.

David Hartglass observed the scanner running on dhartglassMSFT/runtime
twice a day (schedule trigger). pre_activation runs there because the
fork still carries an older version of the condition. Even with the
corrected guard, forks could still run the workflow via the UI 'Run
workflow' button, which dispatches with event_name == workflow_dispatch.

Replace with the single condition 'repository == dotnet/runtime'. Both
schedule and workflow_dispatch on a fork now skip every job. The
activation job's compound condition (activated && (repository ==
dotnet/runtime)) gets simplified by the compiler to the same effective
guard, plus the pre_activation skip propagates to every downstream job
via needs.
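The short-circuit problem is easiest to see as a truth table. The sketch below models the old and new guards as shell predicates over (event_name, repository); it does not reproduce the gh-aw or GitHub Actions expression syntax, where the new condition would typically be written as github.repository == 'dotnet/runtime'.

```shell
#!/bin/sh
# old_guard EVENT REPO: event_name == workflow_dispatch || repository == dotnet/runtime
old_guard() { [ "$1" = workflow_dispatch ] || [ "$2" = dotnet/runtime ]; }
# new_guard EVENT REPO: repository == dotnet/runtime (event is ignored)
new_guard() { [ "$2" = dotnet/runtime ]; }

for event in schedule workflow_dispatch; do
  for repo in dotnet/runtime dhartglassMSFT/runtime; do
    old=skip; new=skip
    old_guard "$event" "$repo" && old=run
    new_guard "$event" "$repo" && new=run
    printf '%-18s %-24s old=%-4s new=%s\n' "$event" "$repo" "$old" "$new"
  done
done
```

Only the workflow_dispatch row on the fork differs between the two guards (old=run, new=skip), which is the bypass the commit describes.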

Both lock files: re-applies the manual pat_pool patch on the detection
job (gh aw compile strips it on every recompile).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 21, 2026 11:27

Contributor

Copilot AI left a comment


Copilot's findings

  • Files reviewed: 3/4 changed files
  • Comments generated: 2
Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan-feedback.md Outdated
…orce KBE match verification

Two failures observed on #128446 (Interpreter assert in CompressDebugInfo):

1. The agent never found in-flight fix PR #128428. Step 4.5 only
   searched by test-name / test-file-path / assembly. A JIT assert
   has none of those, so the search returned nothing and the agent
   filed a sibling KBE instead of skipping.

   Step 4.5 now adds three queries for failures without a test
   workitem: by C/C++ source-symbol (CompressDebugInfo, iOffsetMapping),
   by 6-12 word literal slice of the assertion text, and by
   'Fixes #<tracker>' once Step 4.3 linked a tracker.

2. The KBE matcher came up 0/0/0 in Build Analysis. Per the prompt
   the agent should grep-verify each signature element against
   failure.log and record a match-count marker, but no emitted KBE
   has that marker; the verification is being skipped uniformly.

   Step 7 is now a hard gate: emit the KBE only if the match-count
   marker is present and reports a count >= 1. Records 'skipped:
   signature did not match failure.log' otherwise.
   guidance for JIT/runtime/build-level asserts: prefer Template C
   array form pairing source-symbol with assertion text (two anchors
   double match probability), and skip emission if neither greps
   positive (native asserts often don't appear in the xunit log
   Build Analysis indexes — rely on tracker + test-disable PR in
   that case).
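The Step 7 gate amounts to counting literal matches of each signature element in failure.log and refusing to emit when any element has zero hits. A minimal sketch, assuming failure.log is a plain-text file on disk; reporting the minimum count across array elements is a hypothetical choice, while the marker and skip-reason strings are the ones quoted elsewhere in this PR.

```shell
#!/bin/sh
# match_count_gate LOG ELEMENT [ELEMENT...]
# Prints the match-count marker and returns 0 only if every element occurs
# (as a literal substring) on at least one line of LOG; otherwise prints the
# skip reason and returns 1.
match_count_gate() {
  log=$1; shift
  min=
  for element in "$@"; do
    # -F literal match, -c count matching lines; a missing file counts as 0.
    n=$(grep -cF -- "$element" "$log" 2>/dev/null)
    n=${n:-0}
    if [ -z "$min" ] || [ "$n" -lt "$min" ]; then min=$n; fi
  done
  if [ "${min:-0}" -ge 1 ]; then
    printf '<!-- ci-scan-match-count: %s hits in failure.log -->\n' "$min"
    return 0
  fi
  printf 'skipped: signature did not match failure.log (N=%s)\n' "${min:-0}"
  return 1
}

# Illustrative log: the lines are made up, only the symbol names come from the PR.
log=$(mktemp)
printf '%s\n' 'assert in CompressDebugInfo' 'iOffsetMapping out of range' > "$log"
match_count_gate "$log" CompressDebugInfo iOffsetMapping
match_count_gate "$log" CompressDebugInfo NotInTheLog
rm -f "$log"
```

The first call emits the marker with 1 hit; the second prints the skip reason with N=0 because one anchor never appears in the log.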

Both fixes preserve the existing model that a plain area-team
tracker is NOT a KBE substitute (sibling KBE still required when no
fix PR is in flight).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Resolves the actionable inline comments on #128440:

ci-failure-scan.md
- Footer at the bottom of all three emitted templates (KBE literal,
  KBE regex, test-disable PR) now uses absolute https://github.com/...
  URLs. Relative .github/workflows/... links 404 when clicked from a
  filed issue or PR.
- Step 2 says 'fetch at least 25 builds' but the Data Sources example
  showed %24top=20. Updated the example to %24top=25 for consistency.
- Step 3's note 'Step 4.0 and KBE check 7 read it back' was inaccurate
  (Step 4.0 does not reference failure.log). Reworded to 'KBE check 7
  greps it for the verbatim signature'.

ci-failure-scan-feedback.md
- Hard-rule clarification: 'min-integrity: approved' applies to reads
  of user-supplied content (issue/PR bodies and comments) which must
  go through the github MCP tool. gh is allowed for run-metadata and
  for enumerating the workflow's own [ci-scan-feedback] artifacts,
  but explicitly forbidden for substituting integrity-gated reads.
  Step 3's commands rewritten to use github MCP search_issues /
  search_pull_requests with updated:>=<today-30d> queries.
- Step 3 now covers PRs as well as issues (four search queries) and
  applies the documented 30-day window via updated:>= in the query.
- Step 4 rubric bullet escaped the literal '```json' fence with
  double backticks so it no longer breaks GFM list rendering.
- Step 7 dropped the 'pinned' adjective. The workflow cannot pin
  issues; reworded to note maintainers may pin manually if desired.
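For the 30-day window, the updated:>= qualifier only needs an ISO date. A sketch, assuming GNU date as on Linux runners (BSD/macOS date uses -v-30d instead). The exact query strings the workflow passes to search_issues / search_pull_requests are not shown in the PR, so the four below, split by the agentic-workflows label and the [ci-scan] title, are illustrative.

```shell
#!/bin/sh
# 30-day cutoff in UTC, formatted for GitHub search (YYYY-MM-DD).
# Assumes GNU coreutils date.
since=$(date -u -d '30 days ago' +%Y-%m-%d)

# Illustrative queries: issues and PRs, each by label and by title.
# GitHub search may ignore the brackets in "[ci-scan]", so results should
# still be filtered by exact title afterwards.
for kind in issue pr; do
  printf 'repo:dotnet/runtime is:%s label:agentic-workflows updated:>=%s\n' "$kind" "$since"
  printf 'repo:dotnet/runtime is:%s in:title "[ci-scan]" updated:>=%s\n' "$kind" "$since"
done
```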

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 21, 2026 13:55

Contributor

Copilot AI left a comment


Copilot's findings

  • Files reviewed: 3/4 changed files
  • Comments generated: 4
Comment thread .github/workflows/ci-failure-scan.md
Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan-feedback.md Outdated
Comment thread .github/workflows/ci-failure-scan-feedback.md Outdated
… comments

ci-failure-scan.md
- Step 6 'Recognized values' list now includes the two new skip
  reasons introduced by KBE check 7 ('signature did not match
  failure.log (N=<count>)', 'native assert not in xunit log'). Added
  a clarifying sentence noting the list is non-exhaustive but new
  reasons should reuse one of these phrasings so the feedback
  workflow's tally aggregation stays stable.
- The 'Emit each template verbatim except for <placeholder> slots' rule
  conflicted with KBE check 7's requirement to append the match-count
  marker. Added an explicit placeholder line on all three KBE
  templates (literal, regex, array) so the agent doesn't have to
  choose between violating the verbatim rule and omitting the gate
  marker.

ci-failure-scan-feedback.md
- KPI window start source: previously used the workflow file's
  '.created_at' (file creation), which can predate any run. Changed
  to derive from the first recorded run via
  'gh api .../workflows/X.lock.yml/runs?per_page=1&order=asc'.
  Persists the resulting timestamp as
  and prefers that cached value on subsequent ticks (read via the
  github MCP tool, per the existing hard rule).
- top_pipelines metric previously said 'top 3 definition_id mentions
  in issue bodies' which contradicts the hard rule that body reads
  go through the github MCP tool. Reworded the metric to require
  per-item body fetches via 'issue_read get'/'pull_request_read get',
  count [Filtered] separately as integrity_filtered_pipelines, and
  exclude filtered items from the top 3.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…add unknown-skip-reason rubric

Addresses three findings from PR #128440's latest code review:

- Step 2 'gh run view --log | grep -A 200' could surface arbitrary
  trailing agent log content that quotes maintainer-supplied
  comments, an indirect integrity-gate bypass. Replaced with an awk
  block that extracts only the tally table (header + body rows,
  terminated by the first non-pipe line).

- Step 4 rubric now scores tally rows whose 'skipped:' reason isn't
  in the Step 6 'Recognized values' list as 'unknown-skip-reason:
  <verbatim string>'. The recognized-values list is the source of
  truth; the feedback PR should propose adding new reasons there
  before they appear in tallies. Closes the loop on the advisory-
  only nature of the skip-reason catalogue.

- KPI tracker 'Raw weekly buckets' details block previously had no
  row cap and would grow unbounded as the scanner ran. Constrained
  to the same 12-week window the mermaid charts use; older buckets
  are dropped on each regeneration (tracker body is a current
  snapshot, not a permanent ledger).
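The unknown-skip-reason rubric reduces to normalizing each tally reason and looking it up in the recognized list. A hypothetical sketch: the two reasons below are the ones added for KBE check 7, but the full Step 6 list is longer and lives in ci-failure-scan.md; stripping a trailing '(N=<count>)' before lookup is an assumed normalization so that varying counts don't produce spurious unknowns.

```shell
#!/bin/sh
# Subset of the Step 6 'Recognized values' list (one per line, counts removed).
RECOGNIZED='signature did not match failure.log
native assert not in xunit log'

# classify_skip_reason "skipped: <reason>"
classify_skip_reason() {
  raw=$(printf '%s\n' "$1" | sed -e 's/^skipped:[[:space:]]*//')
  # Drop a trailing "(N=<count>)" so variable counts match the catalogue entry.
  key=$(printf '%s\n' "$raw" | sed -e 's/[[:space:]]*(N=[0-9]*)$//')
  if printf '%s\n' "$RECOGNIZED" | grep -Fxq -- "$key"; then
    printf 'recognized: %s\n' "$key"
  else
    printf 'unknown-skip-reason: %s\n' "$raw"
  fi
}

classify_skip_reason 'skipped: signature did not match failure.log (N=0)'
classify_skip_reason 'skipped: some new reason'
```

Unknown reasons are reported verbatim, which is what the feedback PR would then propose adding to the recognized list.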

The fourth finding ('schedule: daily syntax') was a false alarm —
gh-aw v0.71.5 compiles it correctly to cron: '39 3 * * *'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 21, 2026 14:23
@github-actions
Contributor

Note

This review was generated by GitHub Copilot.

🤖 Copilot Code Review — PR #128440

Holistic Assessment

What this does: Adds a follow-up build gate (Step 3.5), same-run dedup cache (Step 4.0), mandatory failure.log grep verification (hardened KBE check 7), expanded skip-reason vocabulary, and a feedback footer on all emitted artifacts. A new companion workflow (ci-failure-scan-feedback.md) reads maintainer feedback daily and proposes prompt edits via draft PR, plus maintains a KPI tracker issue.

Motivation: Justified — the scanner has been filing noisy/false-positive KBEs. Structural tightening + a self-improving feedback loop is the right approach.

Approach: Sound — incremental prompt refinements rather than a rewrite, constrained companion workflow with strict safe-outputs caps.

Verdict: ✅ Approve (workflow prompt changes only — no runtime code affected)


Findings

⚠️ Warning: Integrity boundary gap in feedback workflow Step 2

ci-failure-scan-feedback.md Step 2 runs gh run view <run-id> --log and pipes through awk. The hard-rules section explicitly prohibits reading maintainer-supplied content via gh, yet the scanner agent's log may quote issue bodies/comments it read during its own run. The awk filter (/^\| pipeline \|/) is narrow, but if the tally table itself embeds user-supplied content (e.g., quoted signatures from issue titles), it leaks through the integrity gate. Consider adding a hard-rule clarification that the extracted tally lines must not be parsed for content beyond their structured columns.

⚠️ Warning: Step 3.5 follow-up gate may over-suppress on slow-cadence pipelines

Step 2 skip reason says: "No follow_up (source is the absolute latest) → pipeline-skipped: no follow-up build yet — defer to next run". For weekly JIT-stress pipelines (defs 109–160), if a failure appears in the latest build, there may not be a newer build for 7+ days. This means the scanner will defer for an entire week (or more) before filing. The PR description mentions the 14-day window accommodates these, but the follow-up gate (not the staleness window) is the bottleneck. Consider an escape hatch: if source is older than N days with no follow-up, treat as stable.
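The suggested escape hatch could look like the decision below. Everything here is hypothetical: the PR defines no age threshold, STABLE_AFTER_DAYS=7 is only an example value, and the source build's age in days is assumed to be computed elsewhere.

```shell
#!/bin/sh
STABLE_AFTER_DAYS=7   # hypothetical threshold, not defined by the PR

# follow_up_decision SOURCE_AGE_DAYS HAS_FOLLOW_UP (yes|no)
follow_up_decision() {
  age_days=$1
  has_follow_up=$2
  if [ "$has_follow_up" = yes ]; then
    echo "evaluate: follow-up build present"
  elif [ "$age_days" -ge "$STABLE_AFTER_DAYS" ]; then
    echo "evaluate: no follow-up after ${age_days}d, treat as stable"
  else
    echo "pipeline-skipped: no follow-up build yet, defer to next run"
  fi
}

follow_up_decision 2 no    # recent failure: keep deferring
follow_up_decision 9 no    # slow-cadence pipeline: escape hatch applies
follow_up_decision 1 yes   # normal path
```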

💡 Suggestion: ci-scan-match-count marker is outside the JSON fence

The <!-- ci-scan-match-count: N hits in failure.log --> marker is placed after the closing ``` of the JSON block. Build Analysis presumably ignores HTML comments, but this is implicit. If Build Analysis ever changes its body parsing to be stricter, the marker could interfere. Document explicitly that the marker is a sentinel for the feedback workflow's rubric and is invisible to Build Analysis.

💡 Suggestion: Feedback workflow KPI tracker — gh search issues vs MCP tool consistency

Step 7 uses gh search issues to discover the tracker and collect the universe of [ci-scan] items, then switches to the github MCP tool for body reads. The hard-rules section says gh is allowed for "enumerating this workflow's own artifacts" but the broader search for ALL [ci-scan] items in Step 7 goes beyond that narrow scope. This is probably fine since it only reads metadata (numbers, URLs), but the distinction could confuse future editors. Consider a one-line clarification in the hard-rules.

💡 Suggestion: Dedup cache race with parallel signature processing

Step 4.0's dedup cache uses a plain TSV file with grep -Fxq for lookup and tee -a for append. If the agent processes signatures sequentially (as the prompt implies), this works. But if a future edit allows parallel processing (or the agent batches API calls), the file-based cache has a TOCTOU race. Not a current bug, but worth a one-line comment: "Sequential processing assumed; do not parallelize signature walks without replacing the TSV cache."
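The cache described above can be sketched as follows, with the sequential-processing assumption written into the code as the reviewer suggests. The key layout (definition id and signature, tab-separated) and the function name are hypothetical; the PR only states that Step 4.0 uses a TSV file with grep -Fxq lookups and tee -a appends.

```shell
#!/bin/sh
# Sequential processing assumed; do not parallelize signature walks without
# replacing this cache: the check-then-append below is not atomic (TOCTOU).
CACHE=${CACHE:-$(mktemp)}

# seen_or_record DEFINITION_ID SIGNATURE
# Returns 0 if the key was already recorded in this run; otherwise appends it
# and returns 1. Signatures must be single-line: grep -F treats each line of
# the pattern as a separate pattern.
seen_or_record() {
  key=$(printf '%s\t%s' "$1" "$2")
  if grep -Fxq -- "$key" "$CACHE"; then
    return 0
  fi
  printf '%s\n' "$key" | tee -a "$CACHE" >/dev/null
  return 1
}
```

A caller would skip emission whenever seen_or_record returns 0.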


Summary

No blocking issues. The emission tightening (follow-up gate, mandatory failure.log verification, expanded dedup) should meaningfully reduce false positives. The feedback workflow is well-scoped with appropriate safe-outputs caps. The warnings above are edge cases worth addressing but none are merge-blocking for a draft PR targeting workflow prompts.

Generated by Code Review for issue #128440

Contributor

Copilot AI left a comment


Copilot's findings

  • Files reviewed: 3/4 changed files
  • Comments generated: 3
Comment thread .github/workflows/ci-failure-scan-feedback.md Outdated
Comment thread .github/workflows/ci-failure-scan-feedback.md Outdated
Comment thread .github/workflows/ci-failure-scan-feedback.md Outdated
…nk tracker workflow

Addresses three correct review findings:

- awk filter printed the first non-pipe line (the terminator) before
  exiting, defeating the purpose of restricting tally extraction to
  the pipe-table block. Restructured to only print lines starting
  with '|' and exit on the first non-pipe line after flag is set.

- The GitHub REST /runs endpoint does not accept order=asc; the
  previous command silently returned the most recent run, not the
  first. Switched to gh api --paginate (per_page=100) + jq sort_by
  + min(created_at), and added an explicit warning so future edits
  don't reintroduce the bug.

- '[ci-failure-scan-feedback.lock.yml]' in the tracker body template
  was reference-style with no defined target, so it rendered as
  literal text. Made it an explicit URL to the workflow file on
  main.
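The corrected awk filter from the first bullet can be sketched like this. The '| pipeline |' header pattern follows the review above (written here with [|] brackets, equivalent to the escaped form); the column names and rows in the sample are made up. The sketch assumes plain log text: 'gh run view --log' typically prefixes each line with job, step and timestamp fields, which would have to be stripped before a line-start anchor can match.

```shell
#!/bin/sh
# extract_tally: print only the pipe-table block that starts at the
# "| pipeline |" header; stop, without printing, at the first non-pipe line.
extract_tally() {
  awk '
    /^[|] pipeline [|]/ { in_table = 1 }
    in_table {
      if ($0 ~ /^[|]/) print   # header, separator and body rows
      else exit                # terminator line is not printed
    }
  '
}

printf '%s\n' \
  'agent text before the table' \
  '| pipeline | result |' \
  '|---|---|' \
  '| example-pipeline | skipped: native assert not in xunit log |' \
  'first non-pipe line ends the block' \
  '| later pipe line, never reached |' | extract_tally
```

Only the three table lines are printed; neither the terminator nor anything after it reaches the feedback agent.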

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2 participants