Sentinel

You don't know what your AI image pipeline ships.
Sentinel does.

Sentinel scores every input and every output image, so you can see what you are shipping and fix it at scale.

The problem

You're generating at scale.
Nobody's checking the output.

When AI pipelines run at volume, defects multiply. No team reviews 10,000 images a day. Without automated evaluation, broken images reach your users.

Invisible

You can't see what's broken

Face distortions, wrong backgrounds, skin-tone issues, logo placement errors. At API scale, defects stay invisible until a user complains.

Manual

Human review doesn't scale

Manual QA works for dozens of images. Not for thousands. The moment you scale, quality control becomes the bottleneck, or disappears.

Costly

Bad outputs cost real money

Every image that fails QA after delivery is a refund, a complaint, or a churn event. The cost does not show in COGS. It shows in your NRR.

Where it starts

Swap your API key.
See what you ship.

Point your key at any of the 100+ models, every Solution API, or your own custom workflows. Sentinel scores quality on all of it, so for the first time you see what you actually ship.

google / nano-banana-2

Last 30 days

Output quality · Sentinel

Evaluations

2,924

last 30 days

The total number of generations scored by Sentinel in this period.

Pass rate

89.53%

of all evaluations

The percentage of evaluations where all judges passed. Higher is better.

Avg score

0.91

out of 1.00

The average weighted score from 0 to 1 across all evaluations. Each judge contributes based on its configured weight.

Gate failures

0.7% of evaluations

Evaluations terminated early by a pre-check, for example no face detected or not-safe-for-work content. These never reached the full judge panel.

Failed judges

1,133

0.4 per evaluation

The total individual judge failures across all evaluations. One generation can fail multiple judges.

Regenerate rate

9.68%

lower is better

The percentage of evaluations where Sentinel recommended regeneration. Lower is better.

Success rate by category

1,133 failed judges
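To make the dashboard definitions above concrete, here is a minimal sketch that computes the same figures from a handful of made-up evaluation records. The record shape (`Judge`, `Evaluation`, `gate_failed`, `regenerate`) is hypothetical and is not Sentinel's actual schema. Leaving gate failures out of the average score is also an assumption. As a sanity check on the numbers shown, 1,133 failed judges over 2,924 evaluations is about 0.39, which matches the displayed 0.4 per evaluation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Judge:
    weight: float  # configured weight of this judge (hypothetical field)
    score: float   # judge score from 0 to 1
    passed: bool


@dataclass
class Evaluation:
    judges: List[Judge] = field(default_factory=list)
    gate_failed: bool = False  # stopped by a pre-check; no judges ran
    regenerate: bool = False   # regeneration was recommended


def weighted_score(ev: Evaluation) -> float:
    """Weight-averaged judge score for one evaluation."""
    total = sum(j.weight for j in ev.judges)
    return sum(j.weight * j.score for j in ev.judges) / total if total else 0.0


def summarize(evals: List[Evaluation]) -> dict:
    """Aggregate the dashboard-style metrics over a list of evaluations."""
    n = len(evals)
    judged = [e for e in evals if not e.gate_failed]
    all_passed = sum(1 for e in judged if all(j.passed for j in e.judges))
    failed = sum(1 for e in judged for j in e.judges if not j.passed)
    return {
        "evaluations": n,
        "pass_rate": all_passed / n if n else 0.0,
        # Assumption: gate failures are left out of the average score.
        "avg_score": (
            sum(weighted_score(e) for e in judged) / len(judged) if judged else 0.0
        ),
        "gate_failure_rate": (n - len(judged)) / n if n else 0.0,
        "failed_judges": failed,
        "failed_judges_per_evaluation": failed / n if n else 0.0,
        "regenerate_rate": sum(1 for e in evals if e.regenerate) / n if n else 0.0,
    }


# Made-up records: two clean passes, one double failure, one gate failure.
sample = [
    Evaluation([Judge(2.0, 1.0, True), Judge(1.0, 0.7, True)]),
    Evaluation([Judge(2.0, 0.5, False), Judge(1.0, 0.5, False)], regenerate=True),
    Evaluation(gate_failed=True),
    Evaluation([Judge(2.0, 1.0, True), Judge(1.0, 1.0, True)]),
]
print(summarize(sample))
```

One generation can fail several judges, which is why the failed-judge count can exceed the number of failed evaluations.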

Input scoring

Catches a weak input before you spend a GPU second on it.

Output scoring

Every result graded against the checks that matter for its use case.

Trace on every call

One ID ties the input, the score, and the retry together.

How it works

Show your customers the quality of every image you ship.

Embed Sentinel in every image you deliver. Each one carries a quality score that tells your customer what came out right, and what would need one more edit.

FabelStudio

Runflow scores every image in the workflow. With strict gates on, a run halts the moment one image fails its checks, so you do not burn credits generating on top of a bad result.
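The strict-gate behaviour described above can be sketched roughly as follows. This is an illustration only: `run_chain`, `GateFailed`, and the `passes_checks` callback are hypothetical names, not part of Runflow or Sentinel, and images are represented as plain strings to keep the example self-contained.

```python
from typing import Callable, List


class GateFailed(Exception):
    """Raised when a step's output fails its checks under strict gates."""


def run_chain(
    source: str,
    steps: List[Callable[[str], str]],
    passes_checks: Callable[[str], bool],
    strict: bool = True,
) -> List[str]:
    """Run each step on the previous output, scoring every result."""
    outputs: List[str] = []
    current = source
    for index, step in enumerate(steps):
        current = step(current)
        outputs.append(current)
        if strict and not passes_checks(current):
            # Halt here so later steps never spend credits on a bad image.
            raise GateFailed(f"step {index} produced an image that failed its checks")
    return outputs
```

With `strict=False` the chain in this sketch runs to the end regardless of failures; what Runflow does with failed images when strict gates are off is not stated here.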

New recipe

Build another chain from scratch

Marketplace packages

Zalando bundle

Tag removal, mannequin, grey backdrop

Magic

Pinpoint a change

Click a spot, describe the change

Restyle with a reference

Brush an area, drop a reference

Add a logo

Place a brand logo with one instruction

Drop into a new scene

Place the subject anywhere

Compose

Cut out on white

Clean catalog composition on white

Smart resize

Recompose to any ratio, 1K to 4K

Reframe to a new ratio

Extend the canvas, keep the subject

How it works

Create custom workflows that scale with Sentinel.

Score the input and the output to deliver at scale, the way BetterPic does.

Step 02 below is Sentinel on the input; step 05 is Sentinel on the output.

How BetterPic delivers 36M headshots without a single human intervention

Step 01 · You

User uploads.

Eight selfies to start. The customer sends their source photos.

8 source selfies

35M+

Headshots scored

Candidates generated, best ones kept

Manual QA reviewers

100K+

Jobs scored per month

Read the BetterPic case study

Evaluation layers

Built for your use case,
not the generic one.

Sentinel evaluates at three levels, from universal quality checks to custom business rules you define. Every layer is configurable.

Layer 01

Generic quality

Universal checks that apply to any AI image generation. They run automatically on every pipeline, with zero configuration.

→Prompt-to-image alignment

→Artifact detection

→Composition scoring

→Sharpness & exposure

Most used

Layer 02

Use-case specific

Pre-tuned criteria for production pipelines: headshots, fashion, on-model. Each one has its own quality standard.

→Face fidelity & likeness

→Garment & logo accuracy

→Background consistency

→Expression & skin tone

Layer 03

Your custom rules

Define what matters to your business. Sentinel checks your edge cases on every single run, at any volume.

→Custom quality schema

→Business-specific criteria

→Threshold configuration

→Team-shareable templates

Developer-first

Send a generated image. Get a verdict back.

One POST to the Sentinel API. You send the image, the task, and what good looks like. Sentinel returns a verdict you can branch on, with no quality model of your own to build.

One endpoint

Send POST /api/v1/evaluate with the image URL, the task type, and a description of what the image should show. Authenticate with an x-api-key header.

A verdict you can branch on

Every evaluation returns pass, soft_fail, or hard_fail, plus a weighted pass rate you can apply your own threshold to.

Tuned to your task

Pass reference images and evaluation instructions so Sentinel scores against your use case, not a generic one.

POST sentinel.runflow.io/api/v1/evaluate

// request

{
  "generated_image_url": "https://cdn.example.com/headshot_9a8b.jpg",
  "task_type": "headshot",
  "task_description": "Professional headshot, neutral background",
  "evaluation_instructions": "Skin natural, identity matches the reference",
  "reference_images": [{ "url": "...selfie.jpg", "role": "identity" }]
}

// poll the result

{
  "eval_id": "ev_3f2d8a91",
  "status": "completed",
  "verdict": "pass",
  "overall_passed": true,
  "weighted_pass_rate": 0.94
}
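As a rough sketch of how a pipeline might use the request and result above, the Python below prepares the POST and maps a completed result to a next step. The endpoint path, the x-api-key header, the field names, and the three verdict values come from this page. The https scheme, the helper names, the 0.9 threshold, and the policy of regenerating on soft_fail and rejecting on hard_fail are assumptions for illustration. The polling endpoint is not shown on this page, so the sketch prepares the request without sending it.

```python
import json
import urllib.request

# Host and path as shown above; the https scheme is assumed.
EVALUATE_URL = "https://sentinel.runflow.io/api/v1/evaluate"


def build_evaluate_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST described above."""
    return urllib.request.Request(
        EVALUATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def next_action(result: dict, min_pass_rate: float = 0.9) -> str:
    """Map an evaluation result to a pipeline decision (illustrative policy)."""
    if result.get("status") != "completed":
        return "wait"  # keep polling until the evaluation completes
    verdict = result.get("verdict")
    if verdict == "hard_fail":
        return "reject"
    if verdict == "soft_fail" or result.get("weighted_pass_rate", 0.0) < min_pass_rate:
        return "regenerate"
    return "deliver"


payload = {
    "generated_image_url": "https://cdn.example.com/headshot_9a8b.jpg",
    "task_type": "headshot",
    "task_description": "Professional headshot, neutral background",
    "evaluation_instructions": "Skin natural, identity matches the reference",
    "reference_images": [{"url": "...selfie.jpg", "role": "identity"}],
}
request = build_evaluate_request("YOUR_API_KEY", payload)

# The completed result shown above would be delivered under this policy.
result = {
    "eval_id": "ev_3f2d8a91",
    "status": "completed",
    "verdict": "pass",
    "overall_passed": True,
    "weighted_pass_rate": 0.94,
}
print(next_action(result))
```

Passing `request` to `urllib.request.urlopen` would send it. Raising `min_pass_rate` makes the policy stricter than the verdict alone, which is one way to threshold on the weighted pass rate yourself.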

More from Runflow

ComfyUI Deploy

One-click deploy any ComfyUI workflow as a production API with Sentinel built in.

Solution APIs

17 production-ready endpoints for headshots, background removal, try-on, and more.

Image Quality Scorer

Free tool to score your AI-generated images across multiple quality dimensions.

Stop shipping bad images.

Quality control is not optional at scale. It is what separates a reliable AI product from an unreliable one.

Start free

Create a free account

Add your API key and get your first quality insights on everything you ship. No call needed.

→Swap your key, see what you ship
→Input and output scoring on every call
→Free to start

Create a free account

Custom workflow

Build with Sentinel

Want Sentinel wired into a custom workflow, with input gates and automated next steps? We build that with you.

→A custom evaluation schema for your use case
→Input gates and automated next steps
→Built hands-on with our team

Talk to a founder

See quality scoring in action

Try the free fashion product scorer. Upload a garment image, get an instant AI readiness score.

Score my product
