Merge and diff

Originally, I wanted to blog about adding new properties (specifically taxon data: NCBI, GBIF, and iNaturalist) to my AC2WD tool (originally described here). If you have the user script installed on Wikidata, AC2WD will automatically show up on relevant taxon items.

But then I realized that the underlying tech might be useful to others, if exposed as an API. The tool checks at least one, and likely multiple, external IDs (eg GND, NCBI taxon) for useful information. Instead of trying to patch an existing item, I build new ones in memory, one for each external ID. Then, I merge all the new items into one. Finally, I merge that new item into the existing Wikidata item. Each merge gives me a “diff”, a JSON structure that can be sent to the wbeditentity action in the Wikidata API. For the first merges, which combine all the new items into one, I ignore the diffs (none of these items exist on Wikidata, so there is no point in keeping them) and keep only the merged item. In the last step, I do the opposite: I ignore the resulting item itself but keep the diff, which can then be applied to Wikidata. This is what the user script does; it retrieves the diff from the AC2WD API and applies it on-wiki. So I am now exposing the merge/diff functionality in the API.
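To make the "keep the item, then keep the diff" flow concrete, here is a minimal Python sketch. It is not the AC2WD code: the item model (a property ID mapped to a set of plain string values) and the names `merge` and `diff_against_existing` are assumptions made up for illustration, and the property values are illustrative only.

```python
from functools import reduce

# Hypothetical, heavily simplified item: property ID -> set of string values.
Item = dict[str, set[str]]


def merge(base: Item, other: Item) -> tuple[Item, Item]:
    """Return (merged item, diff); the diff holds only what `other` adds to `base`."""
    merged = {prop: set(values) for prop, values in base.items()}
    diff: Item = {}
    for prop, values in other.items():
        added = values - merged.get(prop, set())
        if added:
            merged.setdefault(prop, set()).update(added)
            diff[prop] = added
    return merged, diff


def diff_against_existing(new_items: list[Item], existing: Item) -> Item:
    # Step 1: fold all in-memory items into one; keep the item, drop each diff.
    combined = reduce(lambda acc, item: merge(acc, item)[0], new_items, {})
    # Step 2: merge into the existing item; keep the diff, drop the merged item.
    _, diff = merge(existing, combined)
    return diff


# One in-memory item per external ID (values are illustrative).
from_ncbi = {"P225": {"Homo sapiens"}, "P685": {"9606"}}
from_gbif = {"P225": {"Homo sapiens"}, "P846": {"2436436"}}
existing = {"P225": {"Homo sapiens"}, "P685": {"9606"}}

print(diff_against_existing([from_ncbi, from_gbif], existing))
# {'P846': {'2436436'}}
```

Only the GBIF ID ends up in the diff, because everything else is already present in the existing item.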

Why does this matter? Because many edits to Wikidata, especially automated ones, are additions: labels, statements, references to statements, and so on. But if you want to add a new statement, you first have to check whether such a statement already exists. If it does, you need to compare your references against it: which ones are already in the statement, and which should be added? This is tedious and error-prone. But now, you can just take your input data, create the item you want in memory, send it together with the Wikidata item in question to the merge API, and apply the resulting diff with wbeditentity. You can even use the same code to create a new item (with “new=item”).
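Applying a diff then comes down to one wbeditentity call. The sketch below only builds the request parameters; the helper name `wbeditentity_params` is hypothetical, the diff shown is a hand-written stand-in for whatever the merge API returns, and actually sending the request (an authenticated POST to the Wikidata API with a valid CSRF token) is left out.

```python
import json
from typing import Optional


def wbeditentity_params(diff: dict, csrf_token: str,
                        item_id: Optional[str] = None) -> dict:
    """Build POST parameters for action=wbeditentity (hypothetical helper)."""
    params = {
        "action": "wbeditentity",
        "format": "json",
        "data": json.dumps(diff),  # the diff is passed as a JSON string
        "token": csrf_token,
    }
    if item_id is None:
        params["new"] = "item"  # no existing item: create a new one
    else:
        params["id"] = item_id  # edit the existing item
    return params


# Hand-written example diff that adds one English alias.
example_diff = {"aliases": {"en": [{"language": "en", "value": "example alias"}]}}

# Q4115189 is the Wikidata sandbox item; "TOKEN" is a placeholder.
params = wbeditentity_params(example_diff, "TOKEN", item_id="Q4115189")
print(sorted(params))
# ['action', 'data', 'format', 'id', 'token']
```

The same function covers both cases mentioned above: pass an item ID to edit an existing item, or omit it to send “new=item”.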

Statements are considered the same if they have the same property, value, and qualifiers. If they are the same, references from the new statement will be added if they do not already exist (ignoring eg “retrieved” (P813) timestamps when comparing). A label in the new item will become an alias in the merged one, unless it is already the label or an alias. All you have to do is to generate an item that has the information you want to be in the Wikidata item, and the AC2WD merge API will do the rest. And if you write in Rust, you can use the merge functionality directly, without going through the API.
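The rules above can be sketched in a few lines of Python. This is an illustration of the rules as described, not the actual AC2WD implementation: the flat statement and reference dictionaries, the function names, and the placeholder IDs "Q111" and "Q222" are all assumptions, and the case of an item without any existing label is not covered.

```python
IGNORED_IN_REFERENCES = {"P813"}  # "retrieved" timestamps are not compared


def statement_key(st: dict) -> tuple:
    # Same property, value, and qualifiers means "the same statement".
    return (st["property"], st["value"],
            frozenset(st.get("qualifiers", {}).items()))


def reference_key(ref: dict) -> frozenset:
    return frozenset((p, v) for p, v in ref.items()
                     if p not in IGNORED_IN_REFERENCES)


def merge_statements(existing: list, new: list) -> list:
    # Copy statements so the inputs are left untouched.
    merged = [dict(st, references=list(st.get("references", [])))
              for st in existing]
    by_key = {statement_key(st): st for st in merged}
    for st in new:
        target = by_key.get(statement_key(st))
        if target is None:  # unknown statement: add it as-is
            copy = dict(st, references=list(st.get("references", [])))
            merged.append(copy)
            by_key[statement_key(copy)] = copy
            continue
        known = {reference_key(r) for r in target["references"]}
        for ref in st.get("references", []):  # only add missing references
            if reference_key(ref) not in known:
                target["references"].append(ref)
                known.add(reference_key(ref))
    return merged


def merge_label(label: str, aliases: list, new_label: str) -> list:
    """Return the aliases for one language after merging in `new_label`."""
    if new_label == label or new_label in aliases:
        return list(aliases)
    return [*aliases, new_label]


old = [{"property": "P225", "value": "Homo sapiens",
        "references": [{"P248": "Q111", "P813": "2023-01-01"}]}]
new = [{"property": "P225", "value": "Homo sapiens",
        "references": [{"P248": "Q111", "P813": "2024-05-05"},
                       {"P248": "Q222"}]},
       {"property": "P685", "value": "9606", "references": []}]

result = merge_statements(old, new)
print(len(result), len(result[0]["references"]))
# 2 2
```

The first new reference differs from the existing one only in its P813 timestamp, so it is treated as already present; the second one is added, and the P685 statement is new.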

I see this as a niche between hand-rolled code to edit Wikidata and using QuickStatements to offload your edits. The merge function is a bit opinionated at the moment (it never deletes statements or changes existing values), but I can add some desired functionality if you let me know.