feat: Content Provenance experiment (C2PA 2.3 §A.7 text authentication) #294
erik-sv wants to merge 1 commit into
Codecov Report ❌
Coverage diff, develop vs. #294:
| Metric | develop | #294 | +/- |
|---|---|---|---|
| Coverage | 68.52% | 71.81% | +3.28% |
| Complexity | 958 | 1398 | +440 |
| Files | 60 | 77 | +17 |
| Lines | 4861 | 6744 | +1883 |
| Hits | 3331 | 4843 | +1512 |
| Misses | 1530 | 1901 | +371 |
|
Plugin Check failure appears to be pre-existing on trunk, happy to investigate if needed. |
|
@erik-sv mind updating to branch from Separately, @dkotter and I have been exploring this sort of work for some time (see 10up/classifai#652) and am curious how your work here might overlap/relate to media content (and whether you'd consider also helping on that front in this plugin)? |
Hi @jeffpaul, just moved to the develop branch (thanks for the tips; James LePage just directed me here, so I'm new). I'm the co-chair of C2PA's text task force and wrote their spec here, so I am very familiar with content provenance technology. I'm also the CEO of Encypher. Happy to help with integration efforts. I actually have two other PRs in mind for this repo, but I didn't want to overwhelm you all with code. Happy to put them up for your review:
Let me know if you have any feedback on this PR or would like me to submit the other two PRs. Regarding the 10up repo, we have developed ways to do exactly what you require for image and text content. One caveat: to display the CR logo overlay, you need to go through the C2PA compliance program. |
I'll defer to @dkotter for code review on this PR, once you pull it out of Draft state. Otherwise, additional PRs would be amazing, thanks!
Is that required per site leveraging this WordPress AI plugin or could "we" (either the WordPress AI team, or the WordPress.org project itself) go through that on behalf of every WordPress site leveraging this plugin? |
Interested in this one as well. If the project were to get closer to the protocol, it feels like an audited plugin could work for the universe of sites that the CMS enables. |
|
@jeffpaul @Jameswlepage Great questions, let me break this down into the two separate pieces: conformance and signing identity/trust. To directly answer your question @jeffpaul yes, the WordPress AI team or WordPress.org project can absolutely go through conformance and serve as the signing identity on behalf of every WordPress site. That's the lowest-friction path and follows the same model as Adobe, Microsoft, and the camera manufacturers. The BYOK option remains available for users who need their own organizational identity on the manifest, and you can do both:

Conformance Program
The C2PA conformance program operates at the implementation level, not per-site. So the WordPress AI plugin (or the WordPress.org project itself) would go through conformance once on behalf of every site using the plugin. Happy to help with that process once the implementation is substantially complete.

Signing Identity & Trust
This is the more interesting question. For signatures to show as trusted in C2PA-aware applications (browsers, social platforms, search engines), the signing certificate needs to chain to the C2PA Trust List. There are a few options here, and they're not mutually exclusive:

Option 1: WordPress as the signing identity (recommended starting point)
WordPress operates a centralized signing service and holds a trusted certificate, similar to how Adobe signs content from Photoshop and camera manufacturers (Nikon, Sony, Leica) sign photos under their brand. Every site using the plugin would sign through this service via the Connected tier already in this PR.

Option 2: Publisher BYOK (organizational identity)
Individual users obtain their own certificate from a CA on the trust list and configure it via the BYOK tier in this PR. The manifest would say "published by XYZ Press" or "published by example.com."

Option 3: Hybrid (Option 1 + 2 recommended)
WordPress serves as the default signing identity out of the box, while users who want organizational attribution can override with BYOK. This is probably the right long-term answer, it gives every WordPress site provenance by default while letting orgs that care about brand-level attestation bring their own identity. Let me know which direction feels right and I can adjust the implementation accordingly. |
|
Option 2 is almost certainly a non-starter for the majority (or at least statistically significant) of WordPress installs. Thus going with Option 3 to allow flexibility for sites, especially enterprise installs or publishers, to be able to use BYOK seems most optimal. |
|
@erik-sv any ETA on getting a PR out of draft and ready for review here? I'd love to see us get this stable and into the plugin before the WordPress 7.0 launch on April 9th, which would likely mean getting the PR ready for review/testing by sometime next week. |
Both PRs are now rebased onto latest What changed in this rebase:
PR #302 (Image Provenance + CDN) builds on top of this PR with the same API port applied. Ready for @dkotter's code review — aiming to be stable well before the April 9th WP 7.0 window. |
Update: Native C2PA signing + ingredient chains + sidebar fix
Two commits pushed since the last update:
| Tier | Crypto | Output | Status |
|---|---|---|---|
| Local | EC P-256 + self-signed X.509 | JUMBF manifest store | ✅ |
| Connected | Encypher API (api.encypher.com) | JUMBF via base64 response | ✅ |
| BYOK | Publisher-supplied key + CA cert | JUMBF manifest store | ✅ |
This PR is ready for review. @jeffpaul @Jameswlepage — would appreciate your eyes on this when you have a chance. Happy to walk through any part of the C2PA pipeline.
dkotter left a comment
There's a lot here so I'm sure I missed some things but have given this an initial review. Might have been better to break this down into multiple PRs (maybe one PR for each signing tier) to make it easier to review and test.
Because of the size, I did run things through Codex to give me an overview of how everything works. I don't know much about C2PA so this may not be accurate / matter but it did flag the following:
Verification does NOT check the COSE_Sign1 signature — only the content hash
The verify ability reports verified: true if the SHA-256 hash of the stripped text appears anywhere in the JUMBF bytes. It never validates the ECDSA signature, certificate chain, or even parses the CBOR structure. This means:
- An attacker can construct a fake JUMBF blob containing the hash without a valid signature
- strpos() on binary data can produce false positives
- The green "verified" badge is misleading — it only confirms content wasn't altered, not that it was signed by the claimed publisher
Recommendation: Either implement proper COSE_Sign1 signature verification (parse CBOR, reconstruct Sig_structure1, verify ECDSA against certificate) or clearly label verification as "hash integrity check only — not full C2PA verification." The current verified: true response is a false sense of security.
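To make the flagged weakness concrete, here is a minimal Python sketch (not the plugin's PHP) contrasting a hash-presence check with a check that also authenticates the hash. Everything in it is an assumption for illustration: the 64-byte toy manifest layout, the function names, and above all the use of HMAC as a stand-in for the ECDSA/COSE_Sign1 signature, since the Python standard library has no ECDSA.

```python
import hashlib
import hmac


def naive_verify(text: str, manifest: bytes) -> bool:
    """Hash-presence check: passes if the SHA-256 digest appears anywhere in the blob."""
    return hashlib.sha256(text.encode("utf-8")).digest() in manifest


def build_manifest(text: str, key: bytes) -> bytes:
    """Toy manifest (hypothetical layout): 32-byte content hash + 32-byte tag over it."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    tag = hmac.new(key, digest, hashlib.sha256).digest()  # stand-in for a real signature
    return digest + tag


def strict_verify(text: str, manifest: bytes, key: bytes) -> bool:
    """Parse the fixed layout, then check both the content hash and the tag."""
    if len(manifest) != 64:
        return False
    digest, tag = manifest[:32], manifest[32:]
    expected_digest = hashlib.sha256(text.encode("utf-8")).digest()
    expected_tag = hmac.new(key, digest, hashlib.sha256).digest()
    return hmac.compare_digest(digest, expected_digest) and hmac.compare_digest(
        tag, expected_tag
    )


key = b"publisher-secret"  # hypothetical key material
genuine = build_manifest("hello", key)

# An attacker who knows only the text can build a blob that contains the hash.
forged = hashlib.sha256(b"hello").digest() + b"\x00" * 32

print(naive_verify("hello", forged))        # the presence check accepts the forgery
print(strict_verify("hello", forged, key))  # the authenticated check rejects it
print(strict_verify("hello", genuine, key))
```

The point of the sketch is only that "the hash is somewhere in the bytes" proves nothing about who produced those bytes; a real fix would verify the COSE_Sign1 signature against the certificate as the recommendation describes.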
And also:
This is a custom C2PA approximation, not a full implementation
What's implemented:
- JUMBF container construction
- COSE_Sign1 structure with ES256 signing
- CBOR encoding
- C2PA §A.7 Unicode variation selector embedding
- Content hash assertion
What's missing:
- Signature verification (critical)
- CBOR decoding/parsing
- Manifest path resolution
- Ingredient chain traversal
- Trust list validation
- Hard binding support
The PR description references C2PA 2.3 spec compliance, but the implementation produces spec-formatted manifests without being able to fully verify them. This should be clearly documented.
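For readers unfamiliar with the container format, the following Python sketch shows what parsing a JUMBF-style box structure looks like, as opposed to searching the raw bytes. It assumes the ISO BMFF box header that JUMBF builds on (4-byte big-endian length, 4-byte type, optional 64-bit extended length); `iter_boxes` and `make_box` are hypothetical helper names, and this is not the PR's `JUMBF_Reader`.

```python
import struct
from typing import Iterator, Optional, Tuple


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one box: 4-byte big-endian length (header included), 4-byte type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def iter_boxes(
    data: bytes, offset: int = 0, end: Optional[int] = None
) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each sibling box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit extended length follows the type field
            if offset + 16 > end:
                raise ValueError("truncated extended length")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing data
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box")
        yield box_type.decode("latin-1"), data[offset + header : offset + size]
        offset += size
    if offset != end:
        raise ValueError("trailing bytes after last box")


# A superbox ('jumb') holding a description box ('jumd') and a CBOR content box.
inner = make_box(b"jumd", b"desc") + make_box(b"cbor", b"\xa0")
outer = make_box(b"jumb", inner)

for box_type, payload in iter_boxes(outer):
    print(box_type, [t for t, _ in iter_boxes(payload)])
```

Walking the boxes this way gives a verifier a definite location for the claim and signature, which is what manifest path resolution and CBOR decoding would then build on.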
**Status:** Experiment
**Category:** Editor
**Requires:** WordPress 6.0+, PHP 7.4+
**Version added:** 0.5.0
I'd suggest just removing all of this as we don't document this for any other experiment
This file doesn't exist in our branch, nothing to remove.
I think having documentation for this experiment is correct, should ideally follow our documentation for other experiments though. Curious for the reasoning around removing this entirely?
## Setup
1. Go to **Settings → AI Experiments**
Suggested change:
- 1. Go to **Settings → AI Experiments**
+ 1. Go to **Settings → AI**
Same, file doesn't exist in our branch.
|
Note that with #340 and #376 now merged, this settings page is rendered differently and as such, how these individual settings are set/rendered will need to change (apologies for the extra work there). #376 should provide an example of how this is now handled within the Content Classification experiment
I don't think BYOK will necessarily work there, but the Connected option could potentially live there. I think the ideal, though, is for that to support multiple signing services, not just Encypher (a separate question we probably need to discuss), and that may complicate making this a Connector.
Also, this just needs more spacing; everything runs together. And just my opinion, but I'm not sure we need that "Signed with local key" warning. It seems like an attempt to get people to switch to the Connected approach and probably causes more confusion than anything. I'd recommend either removing it or putting it in a tooltip under an icon, like the warning icon.
Was going to flag this as well. In my testing I never see anything output on the front end. One other bug I noticed: if I open content that was published before this experiment was activated and refresh the editor without saving anything, the content seems to be signed automatically, even though we claim that only happens on publish and update. |
I do agree with most everything noted in these comments. I would love to see multiple, smaller PRs that make this much easier to review and test and feel confident about merging things in. Wondering if there's a way to easily do this? I don't want to cause undue effort at this point but it is more likely that this will get merged in if we can break it down into smaller pieces, maybe a PR for just local signing and other PRs for connected and BYOK signing? |
…idebar UX, provider generalization
- Migrate settings from render_settings_fields() PHP/HTML to declarative get_settings_fields() for the DataForm architecture (WordPress#340/WordPress#376)
- Register all 8 settings with show_in_rest and proper schemas
- Replace static "Signed with local key" sidebar copy with dynamic tier-aware notice: "Will be signed with [tier] key"
- Add auto-sign content hash guard to prevent phantom re-signing on editor refresh without content changes
- Wire up Verification_Badge frontend rendering with CSS enqueue
- Mask API key in REST responses; bypass filter for signing operations
- Generalize Connected tier from Encypher-specific to provider-agnostic with KNOWN_PROVIDERS constant for maintained compatible provider list
- Update default signing endpoint to https://api.encypher.com/v1/sign
|
Pushed changes addressing review feedback. Summary:
Settings page migration (#340/#376 compatibility)
Sidebar UX
Provider generalization (per @jeffpaul's comment)
Auto-sign bug fix (per @dkotter's comment)
Frontend badge
API key security
191/192 tests pass (1 pre-existing failure in
On smaller PRs: We strongly recommend against breaking this into smaller PRs. The experiment is a single feature with tightly coupled components. Splitting by signing tier would require splitting the UI, the settings registration, the REST endpoints, and the test suite, all of which depend on the shared pipeline. To illustrate the composition:
The signing methods that would be candidates for splitting represent 10% of the code. The other 90%, the test suite, protocol layer, backend pipeline, and UI, cannot function without at least one signing method present and would need to be duplicated or stubbed across separate PRs. A split would increase review burden rather than reduce it. |
dkotter left a comment
Noting I've not had a chance to review and test the latest round of changes but in testing other things within the AI plugin after I had tested this PR, ran into a fairly major issue we'll need to solve here.
When a piece of content is signed, we add some invisible unicode characters to the bottom of the content. This content is then stored in the normal post_content database field. The issue here is if another plugin needs to use this content for something, these unicode characters come along for the ride, so to speak.
The issue I ran into: while testing other AI features within this plugin after I had tested this PR, the number of tokens I was using for each AI request skyrocketed, going from ~800 input tokens to ~90,000 input tokens, all because of these invisible unicode characters.
I imagine other plugins (both those doing things with AI and those that don't) that need to use the content of a post will also run into issues with these unicode characters automatically being applied to the content.
It seems like we need some way to separate out these characters from the actual content we store in the database. One suggestion would be to store those as a separate meta field and then add those into the content when needed (for things like checking provenance). This way any other plugin that grabs the content field won't be getting those characters as well.
That is just one suggestion though and there's likely other ways to solve this but we need to figure that out before we can move forward with this.
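To illustrate why the invisible characters inflate anything that consumes the content, here is a small Python sketch of a byte-to-variation-selector mapping. It is an assumption-laden illustration, not the plugin's `Unicode_Embedder`: it maps bytes 0-15 to VS1-VS16 (U+FE00 to U+FE0F) and bytes 16-255 to VS17-VS256 (U+E0100 to U+E01EF), and it omits whatever header or framing the C2PA text wrapper actually adds. The function names are hypothetical.

```python
from typing import Optional


def byte_to_selector(b: int) -> str:
    """Map one byte to a variation selector: VS1-VS16, then VS17-VS256."""
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16)


def selector_to_byte(ch: str) -> Optional[int]:
    """Inverse mapping; returns None for any character that is not a variation selector."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def embed(text: str, payload: bytes) -> str:
    """Append the payload to the text as invisible selectors."""
    return text + "".join(byte_to_selector(b) for b in payload)


def extract(text: str) -> bytes:
    """Recover the embedded bytes, ignoring visible characters."""
    return bytes(b for ch in text if (b := selector_to_byte(ch)) is not None)


def strip(text: str) -> str:
    """Remove every variation selector, leaving only the visible text."""
    return "".join(ch for ch in text if selector_to_byte(ch) is None)


payload = bytes(range(256))  # stand-in for a manifest; real ones are far larger
signed = embed("Hello", payload)

print(len("Hello"), len(signed))                     # code points before and after
print(len(signed.encode("utf-8")))                   # UTF-8 bytes stored in the database
print(strip(signed) == "Hello", extract(signed) == payload)
```

Each manifest byte becomes a full code point (three or four bytes in UTF-8), so a multi-kilobyte manifest dominates a short post. Note that this `strip` would also remove legitimate selectors such as the emoji presentation selector U+FE0F, which is one reason keeping the signed bytes out of `post_content` is cleaner than asking every consumer to filter them.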
|
@Jameswlepage @JasonTheAdams tagging you two for review / testing / validation here as it sounds like there was a call that preceded this PR being open that y'all were on and thus @dkotter + I are likely lacking context. I also know there's some other conversations to be had about how to get WPORG set up as a signing entity. Given we're targeting a 0.8.0 release this week, I've moved this PR to the 1.0.0 milestone in case that timing lines up when y'all are able to coordinate and have things ready here. Also worth noting I'm very excited about this work, and have been for years, so please let me know how I can help support y'all here. |
Add a Content Provenance experiment that embeds cryptographic C2PA 2.3 manifests into post content as invisible Unicode variation selectors. Publishers can prove authorship, detect tampering, and participate in the content authenticity ecosystem used by Google, BBC, Adobe, OpenAI, and Microsoft.

Content Provenance experiment:
- Auto-signs posts on publish/update with c2pa.created / c2pa.edited actions and provenance-chain ingredient references
- Content hash guard prevents phantom re-signing on editor autosave
- Three signing tiers: Local (self-signed), Connected (CA-verified provider), BYOK (publisher's own certificate)
- c2pa/sign and c2pa/verify Abilities via wp_do_ability()
- Gutenberg sidebar panel with 5-state shield badge
- /.well-known/c2pa discovery endpoint per C2PA spec
- Optional frontend verification badge on public posts
- Declarative settings fields for the DataForm architecture
- Signed bytes stored in post meta, not inline in post_content

C2PA protocol layer (pure PHP, no external dependencies):
- CBOR encoder (RFC 8949, CTAP2 canonical)
- COSE_Sign1 builder and verifier (RFC 9052, ES256/P-256)
- JUMBF reader and writer (ISO 19566-5)
- Claim builder conforming to C2PA 2.4 assertion and claim structures
- c2pa.hash.data exclusions field per spec
- c2pa.ingredient.v3 provenance chains

192 tests, 447 assertions passing.
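As a rough picture of what the CBOR and COSE_Sign1 items in the protocol layer involve, here is a minimal Python sketch of canonical CBOR encoding and of the RFC 9052 `Sig_structure` that gets signed. It covers only integers, byte strings, text strings, arrays, and maps; the map-key ordering (major type, then encoded length, then bytewise) follows the CTAP2 canonical rules as I understand them. The function names are hypothetical and this is not the PR's `CBOR_Encoder`.

```python
import struct


def _head(major: int, n: int) -> bytes:
    """Encode a major type (0-7) and its argument in the shortest form."""
    if n < 24:
        return bytes([major << 5 | n])
    if n < 0x100:
        return bytes([major << 5 | 24, n])
    if n < 0x10000:
        return bytes([major << 5 | 25]) + struct.pack(">H", n)
    if n < 0x100000000:
        return bytes([major << 5 | 26]) + struct.pack(">I", n)
    return bytes([major << 5 | 27]) + struct.pack(">Q", n)


def encode(value) -> bytes:
    """Encode a small subset of Python values as canonical CBOR."""
    if isinstance(value, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        items = sorted(
            ((encode(k), encode(v)) for k, v in value.items()),
            key=lambda kv: (kv[0][0] >> 5, len(kv[0]), kv[0]),  # canonical key order
        )
        return _head(5, len(items)) + b"".join(k + v for k, v in items)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def sig_structure(protected: bytes, payload: bytes, external_aad: bytes = b"") -> bytes:
    """Bytes a COSE_Sign1 signer signs: ["Signature1", protected, external_aad, payload]."""
    return encode(["Signature1", protected, external_aad, payload])


protected = encode({1: -7})  # header parameter 1 (alg) set to -7 (ES256)
print(protected.hex())
print(encode({"b": 1, "a": 2}).hex())
print(sig_structure(protected, b"hi").hex())
```

Deterministic encoding matters here because signer and verifier must reproduce exactly the same bytes for the `Sig_structure`; any difference in key order or integer width makes a valid signature fail.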
|
@dkotter Good catch on the post_content token increase. Yes, any C2PA signing of the post content that gets embedded into the post would increase how many tokens are used if the AI system doesn't ignore the embedding.

I went with a variation of your meta-field suggestion. The signed content, Unicode markers included, now lives entirely in _c2pa_embedded_content post meta. post_content stays clean. A priority-1 the_content filter injects the signed bytes at render time and removes wpautop so the byte stream stays intact. Paragraph formatting is preserved visually with white-space: pre-line on the wrapper div.

While I was in there, we also consolidated sign_on_publish and sign_on_update into a single sign_after_save callback on wp_after_insert_post. That hook fires once per save after all other hooks finish, which eliminates the double-signing race when both publish_post and post_updated fire for the same edit. A content hash guard skips re-signing when the text has not changed.

Verification works end-to-end now. The editor verify button sends post_id to the REST endpoint, which reads the canonical signed bytes from meta rather than relying on sanitized input. Would welcome your thoughts on the approach when you get a chance to look at the latest changes. |
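The approach described in this comment can be sketched in a few lines of Python. This models the logic only and is not the plugin's PHP: `Post`, `plain_text`, `render`, and the `_c2pa_content_hash` meta key are hypothetical names, the regex tag stripper is a crude stand-in for WordPress's own, and `sign` is an injected callable so the sketch stays independent of any signing tier. Only `_c2pa_embedded_content` and `sign_after_save` are taken from the comment.

```python
import hashlib
import re
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Post:
    id: int
    content: str                        # stays clean; never receives the markers
    meta: Dict[str, str] = field(default_factory=dict)


def plain_text(html: str) -> str:
    """Crude tag stripper, standing in for the real HTML-to-text step."""
    return re.sub(r"<[^>]+>", "", html).strip()


def content_hash(html: str) -> str:
    return hashlib.sha256(plain_text(html).encode("utf-8")).hexdigest()


def sign_after_save(post: Post, sign: Callable[[str], str]) -> bool:
    """Sign once per real change; return False when the hash guard skips signing."""
    digest = content_hash(post.content)
    if post.meta.get("_c2pa_content_hash") == digest:  # hypothetical meta key
        return False
    post.meta["_c2pa_embedded_content"] = sign(plain_text(post.content))
    post.meta["_c2pa_content_hash"] = digest
    return True


def render(post: Post) -> str:
    """Front-end view: serve the signed text from meta when it exists."""
    return post.meta.get("_c2pa_embedded_content", post.content)


def fake_sign(text: str) -> str:
    return text + "\ufe00"              # one invisible marker instead of a manifest


post = Post(id=1, content="<p>Hello</p>")
print(sign_after_save(post, fake_sign))  # first save signs
print(sign_after_save(post, fake_sign))  # unchanged content is skipped
print(post.content)                      # other consumers still see clean content
```

Because the guard compares a hash of the text rather than relying on which hook fired, an editor refresh or autosave with no content change is a no-op, and anything reading `post.content` never sees the markers.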
|
@jeffpaul Glad to have your energy behind this. The latest push addresses the post_content contamination issue @dkotter identified, which was the last architectural piece to sort out. Signed bytes now live in post meta, so other plugins reading post_content are unaffected. Happy to coordinate with @Jameswlepage and @JasonTheAdams on review and testing. The WPORG signing entity question can run in parallel without blocking the code review, so those conversations can happen on their own timeline. Let us know what you need from our side to keep things moving. |
|
@erik-sv I'm in a holding pattern until James and/or Jason weigh in |
|
Updating here based on latest change in Core AI team rep to assign to @JasonTheAdams for next steps. |
Changes since last review
- `develop` (includes Add modern wp-build DataForm route for AI settings page #340 and Use toggle UI, auto-save, and inline DataForm settings for experiments #376, the DataForm settings architecture)
- `get_settings_fields()` for the new DataForm-based settings page. Removed ~250 lines of `render_settings_fields()` PHP/HTML/JS. All 8 settings registered with `show_in_rest` and proper REST API schemas.
- `KNOWN_PROVIDERS` constant as a maintained registry of compatible signing services.
- `show_badge` option defaulting to `false` on fresh installs, wired up CSS enqueue
- `test_enqueue_assets_runs_on_post_screen`, unrelated)
Summary
Adds a Content Provenance experiment that embeds cryptographic C2PA 2.3 section A.7 manifests into post content as invisible Unicode variation selectors. Publishers can prove authorship, detect tampering, and participate in the content authenticity ecosystem used by Google, BBC, Adobe, OpenAI, and Microsoft.
What this adds
Content Provenance experiment
- `c2pa.created` / `c2pa.edited` actions and provenance-chain ingredient references
- `c2pa/sign` and `c2pa/verify` Abilities via `wp_do_ability()`
- `/.well-known/c2pa` discovery endpoint per C2PA section 6.4
C2PA protocol layer (pure PHP, no external dependencies)
Post signing flow
flowchart TD
    A[Post Published or Updated] --> B{Content Provenance enabled?}
    B -->|No| Z[Skip]
    B -->|Yes| C{Already signed + unchanged?}
    C -->|Yes| Z
    C -->|No| D[Strip HTML to plain text]
    D --> E[Build C2PA Manifest]
    E --> E1[c2pa.actions.v2]
    E --> E2[c2pa.hash.data.v2 SHA-256]
    E --> E3[c2pa.ingredient.v2 edit chain]
    E1 & E2 & E3 --> F{Signing tier}
    F -->|Local| G[EC P-256 self-signed via OpenSSL]
    F -->|Connected| H[POST to CA-verified provider API]
    F -->|BYOK| I[Publisher certificate PEM file]
    G & H & I --> J[Unicode Embedder: VS1-VS256 invisible bytes]
    J --> K[wp_update_post with embedded content]
    K --> L[Store post meta + content hash]
    L --> M[Gutenberg sidebar badge updates]
Signing tiers
The Connected tier is provider-agnostic. `Connected_Signer::KNOWN_PROVIDERS` maintains a registry of compatible services. Any CA-verified signing provider with a compatible API can be used.
WordPress Abilities API
Codebase composition
Files
- `includes/.../Content_Provenance.php`
- `includes/.../Verification_Badge.php`
- `includes/.../Well_Known_Handler.php`: `/.well-known/c2pa` discovery endpoint
- `includes/.../C2PA_Manifest_Builder.php`
- `includes/.../Unicode_Embedder.php`
- `includes/.../C2PA/CBOR_Encoder.php`
- `includes/.../C2PA/COSE_Sign1_Builder.php`
- `includes/.../C2PA/COSE_Sign1_Verifier.php`
- `includes/.../C2PA/Claim_Builder.php`
- `includes/.../C2PA/JUMBF_Reader.php`
- `includes/.../C2PA/JUMBF_Writer.php`
- `includes/.../Signing/Signing_Interface.php`
- `includes/.../Signing/Local_Signer.php`
- `includes/.../Signing/Connected_Signer.php`: `KNOWN_PROVIDERS`
- `includes/.../Signing/BYOK_Signer.php`
- `includes/Abilities/.../C2PA_Sign.php`: `c2pa/sign` Ability
- `includes/Abilities/.../C2PA_Verify.php`: `c2pa/verify` Ability
- `src/experiments/content-provenance/index.js`
- `src/experiments/content-provenance/index.scss`
- `src/experiments/content-provenance/frontend.scss`
- `webpack.config.js`
- `tests/.../Content_ProvenanceTest.php`
- `tests/.../Verification_BadgeTest.php`
- `tests/.../Well_Known_HandlerTest.php`
- `tests/.../Unicode_EmbedderTest.php`
- `tests/.../C2PA/CBOR_EncoderTest.php`
- `tests/.../C2PA/COSE_Sign1_BuilderTest.php`
- `tests/.../C2PA/COSE_Sign1_VerifierTest.php`
- `tests/.../C2PA/Claim_BuilderTest.php`
- `tests/.../C2PA/C2PA_ConformanceTest.php`
- `tests/.../C2PA/JUMBF_ReaderTest.php`
- `tests/.../C2PA/JUMBF_WriterTest.php`
- `tests/.../Signing/Local_SignerTest.php`
- `tests/.../Signing/Connected_SignerTest.php`
- `tests/.../Signing/BYOK_SignerTest.php`
Test plan
- `vendor/bin/phpunit --filter Content_Provenance` (192 tests pass)
- `_c2pa_manifest` meta is set
- `c2pa.edited` action + ingredient reference to previous manifest
- `wp_do_ability('c2pa/sign', ['text' => 'hello'])` in `wp shell`, returns signed text
- `wp_do_ability('c2pa/verify', ['text' => $signed])`, returns `verified: true`
- `/.well-known/c2pa`, returns valid JSON discovery document
- `/wp/v2/settings`)
Related