dcausse (David Causse)
User

User Details

User Since
Jun 9 2015, 9:03 AM (547 w, 1 d)
Availability
Available
IRC Nick
dcausse
LDAP User
DCausse
MediaWiki User
DCausse (WMF) [ Global Accounts ]

Recent Activity

Tue, Dec 2

dcausse added a comment to T408431: Reindex all wikis.

forgot to mention that the reindex was started yesterday on the two other clusters (eqiad, cloudelastic)

Tue, Dec 2, 9:07 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
dcausse closed T408737: Enable Georgian Transliteration Second Try mappings for autocomplete as Resolved.
Tue, Dec 2, 8:20 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse closed T408737: Enable Georgian Transliteration Second Try mappings for autocomplete, a subtask of T127003: Transliterate Latin or Cyrillic script searches to Georgian script on Georgian wikis, as Resolved.
Tue, Dec 2, 8:20 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07)

Mon, Dec 1

dcausse moved T404858: A/B test using defaultsort with the completion suggester from In Progress to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Mon, Dec 1, 3:42 PM · Patch-For-Review, Discovery-Search (2025.10.20 - 2025.12.31), MW-1.45-notes (1.45.0-wmf.24; 2025-10-21), Essential-Work, CirrusSearch
dcausse updated the task description for T404858: A/B test using defaultsort with the completion suggester.
Mon, Dec 1, 3:42 PM · Patch-For-Review, Discovery-Search (2025.10.20 - 2025.12.31), MW-1.45-notes (1.45.0-wmf.24; 2025-10-21), Essential-Work, CirrusSearch
dcausse added a comment to T404858: A/B test using defaultsort with the completion suggester.

A/B test results on other wikis: https://people.wikimedia.org/~dcausse/T404858-completion-default-sort-2.html

Mon, Dec 1, 3:41 PM · Patch-For-Review, Discovery-Search (2025.10.20 - 2025.12.31), MW-1.45-notes (1.45.0-wmf.24; 2025-10-21), Essential-Work, CirrusSearch
dcausse claimed T408431: Reindex all wikis.

Going to start the reindex today

Mon, Dec 1, 10:46 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
dcausse created T411347: New CirrusSearch dumps are not properly formatted.
Mon, Dec 1, 10:20 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch

Thu, Nov 27

dcausse claimed T408737: Enable Georgian Transliteration Second Try mappings for autocomplete.

Went with the approach of enabling this on the five Georgian wikis at once; please let me know if a more conservative approach (one wiki first) is preferable.

Thu, Nov 27, 2:38 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse closed T410602: CirrusSearch metadata stores DEFAULTSORT overrides even after they've been removed as Resolved.

The update process has been fixed.
Existing stale data in the search index will get fixed when:

  • a new revision of the page is created
  • a template change propagates
  • the continuous cleanup mechanism processes a page with stale data
Thu, Nov 27, 2:07 PM · MW-1.46-notes (1.46.0-wmf.3; 2025-11-19), Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
dcausse renamed T411169: Improve & better document cirrus debug & explainability APIs from Improve & better document cirrus debug & exaplainability APIs to Improve & better document cirrus debug & explainability APIs.
Thu, Nov 27, 10:56 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
dcausse created T411169: Improve & better document cirrus debug & explainability APIs.
Thu, Nov 27, 10:55 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
dcausse closed T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete, a subtask of T402864: Integrate RU & HE DWIM-style mappings into autocomplete, as Resolved.
Thu, Nov 27, 8:18 AM · CirrusSearch, Discovery-Search (2025.10.20 - 2025.12.31), MW-1.45-notes (1.45.0-wmf.25; 2025-10-28), Essential-Work
dcausse closed T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete as Resolved.
Thu, Nov 27, 8:18 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse updated the task description for T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete.
Thu, Nov 27, 8:12 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Wed, Nov 26

dcausse added a comment to T410758: Timeouts searching for terms and regular expressions too low.

Unless you noticed that the regex got a lot slower recently and that more queries are timing out, I think it is safer to keep the 15s internal timeout.

But Wikipedia content keeps growing, and at some point 15s will no longer be safe. What then? Shall we keep decreasing the timeout? I guess we need a search engine that can handle all of this content properly if CirrusSearch cannot.

Wed, Nov 26, 2:33 PM · Discovery-Search, CirrusSearch
dcausse merged T410965: Using the search field on mobile does not yield suggestions until after a space has been inserted into T393819: Codex TypeaheadSearch doesn't work with mobile keyboard and predictive text.
Wed, Nov 26, 10:56 AM · Readers Essential Work 2025 (Codex), Reader Experience Team, Codex
dcausse merged task T410965: Using the search field on mobile does not yield suggestions until after a space has been inserted into T393819: Codex TypeaheadSearch doesn't work with mobile keyboard and predictive text.
Wed, Nov 26, 10:56 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
dcausse added a comment to T410758: Timeouts searching for terms and regular expressions too low.

CirrusSearch has to be careful when specifying the timeout of a regex query.
Regex queries are particularly costly and may put a lot of stress on the servers if not properly protected.
The 15s timeout was set up for this: to ensure that the search backend returns before any other timeout is applied. Otherwise a costly query might continue to run outside of the concurrency protection (T152895).
Unless you noticed that the regex got a lot slower recently and that more queries are timing out, I think it is safer to keep the 15s internal timeout.

Wed, Nov 26, 10:22 AM · Discovery-Search, CirrusSearch

Mon, Nov 24

dcausse updated the task description for T410899: Improve CirrusSearch consistency checks.
Mon, Nov 24, 4:25 PM · Discovery-Search, CirrusSearch
dcausse updated the task description for T410899: Improve CirrusSearch consistency checks.
Mon, Nov 24, 3:31 PM · Discovery-Search, CirrusSearch
dcausse updated the task description for T410899: Improve CirrusSearch consistency checks.
Mon, Nov 24, 3:28 PM · Discovery-Search, CirrusSearch
dcausse renamed T410899: Improve CirrusSearch consistency checks from Improve CirrusSearch consistancy checks to Improve CirrusSearch consistency checks.
Mon, Nov 24, 3:22 PM · Discovery-Search, CirrusSearch
dcausse created T410899: Improve CirrusSearch consistency checks.
Mon, Nov 24, 3:21 PM · Discovery-Search, CirrusSearch

Thu, Nov 20

dcausse claimed T410602: CirrusSearch metadata stores DEFAULTSORT overrides even after they've been removed.
Thu, Nov 20, 9:58 AM · MW-1.46-notes (1.46.0-wmf.3; 2025-11-19), Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
dcausse added a comment to T410602: CirrusSearch metadata stores DEFAULTSORT overrides even after they've been removed.

Thanks for reporting this, I think there are two different issues that allowed such suggestions to appear:

  • defaultsort is indeed not properly removed from the search index when it is erased: a null value unfortunately tells the system to ignore the field when updating, and this needs to be fixed for this field
  • defaultsort values are allowed to help completion only if they match a particular pattern; this pattern seems too permissive and should be tightened to limit the impact of such vandalism on search suggestions in the future
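
The first issue can be illustrated with a minimal sketch. It assumes a simplified partial-update model in which a `None` value means "leave the field untouched"; the `apply_update` function and the `DELETE` sentinel are hypothetical names chosen for illustration and are not CirrusSearch's actual update code.

```python
# Hypothetical model of a partial document update; not CirrusSearch code.
DELETE = object()  # illustrative sentinel meaning "remove this field"


def apply_update(indexed_doc, update):
    """Return a copy of indexed_doc with the partial update applied."""
    result = dict(indexed_doc)
    for field, value in update.items():
        if value is None:
            # None is read as "no change", so a stale value survives.
            continue
        if value is DELETE:
            result.pop(field, None)
        else:
            result[field] = value
    return result


indexed = {"title": "Some page", "defaultsort": "Stale sort key"}

# The page no longer has a DEFAULTSORT, but sending None keeps the old value.
after_null = apply_update(indexed, {"defaultsort": None})

# An explicit removal signal is needed to actually clear the field.
after_delete = apply_update(indexed, {"defaultsort": DELETE})
```

Under this model the stale sort key stays in the document until something other than a null is sent for the field, which matches the behaviour described in the first bullet.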
Thu, Nov 20, 8:46 AM · MW-1.46-notes (1.46.0-wmf.3; 2025-11-19), Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch

Wed, Nov 19

dcausse claimed T409218: Elastica\Exception\Connection\HttpException: Unknown error:52.
Wed, Nov 19, 9:01 AM · MW-1.46-notes (1.46.0-wmf.5; 2025-12-02), Discovery-Search (2025.10.20 - 2025.12.31), MediaWiki-extensions-Translate, Wikimedia-production-error
dcausse moved T410269: Wikibase CI broken: Unknown filter type [truncate_norm] for [truncate_keyword] from Incoming to Done on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Wed, Nov 19, 8:41 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch, ci-test-error (WMF-deployed Build Failure), Wikidata
dcausse edited projects for T410269: Wikibase CI broken: Unknown filter type [truncate_norm] for [truncate_keyword], added: Discovery-Search (2025.10.20 - 2025.12.31); removed Discovery-Search.
Wed, Nov 19, 8:41 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch, ci-test-error (WMF-deployed Build Failure), Wikidata
dcausse added a comment to T408431: Reindex all wikis.

Should be ready once 1.46.0-wmf.3 is deployed; the earliest would be Thursday, Nov 20, but it is probably safer to wait until the following week in case we roll back.

Wed, Nov 19, 8:22 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch

Tue, Nov 18

dcausse moved T404597: Eventutilities Flink: port SerDe tests from SUP from In Progress to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Tue, Nov 18, 6:19 PM · Discovery-Search (2025.10.20 - 2025.12.31), Data-Engineering-Radar, Event-Platform, Data-Engineering, Essential-Work, CirrusSearch
dcausse claimed T404597: Eventutilities Flink: port SerDe tests from SUP.
Tue, Nov 18, 1:47 PM · Discovery-Search (2025.10.20 - 2025.12.31), Data-Engineering-Radar, Event-Platform, Data-Engineering, Essential-Work, CirrusSearch
dcausse moved T406566: BadMethodCallException: MediaWiki\Session\SessionProvider::preventSessionsForUser must be implemented when canChangeUser() is false from Incoming to Blocked / Waiting on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Tue, Nov 18, 1:46 PM · MW-1.46-notes (1.46.0-wmf.4; 2025-11-25), Discovery-Search (2025.10.20 - 2025.12.31), MediaWiki-Platform-Team (Radar), NetworkSession, MW-1.45-notes (1.45.0-wmf.22; 2025-10-07), MediaWiki-Core-AuthManager, Wikimedia-production-error
dcausse added a comment to T408533: Initial task generation and ingestion to Cassandra and Search weight tags.

Hi @pfischer @dcausse, ML team wants to follow up on the initial ingestion process. As you mentioned before, the Search platform team has a manual script for this purpose. Can the ML team execute this on our end (e.g., in statbox)? Or can only the Search team execute it?

it is a bit cumbersome to run unfortunately and some adaptations have to be made (we only used it to backfill article countries). The script is in stat1009.eqiad.wmnet:~dcausse/articlecountry:

  • backfill_articlecountry.scala: the Spark job that reads hdfs://analytics-hadoop/user/dcausse/topic_model/wiki-region-groundtruth/regions-cirrus-upload.tsv.gz and converts it to classification.prediction.articlecountry weighted tags; this one would have to be adapted based on your source data
  • wiki.lst: the list of wikis to filter on
  • backfill.sh the shell script that orchestrates all this
Tue, Nov 18, 1:28 PM · Discovery-Search (2025.10.20 - 2025.12.31), Machine-Learning-Team

Mon, Nov 17

dcausse added a comment to T410269: Wikibase CI broken: Unknown filter type [truncate_norm] for [truncate_keyword].

Indeed, the new Debian package wmf-opensearch-search-plugins version 1.3.20+12 has to be installed to run the latest Cirrus version. We generally maintain the cirrussearch-opensearch-image Docker image that is used by MW developers and our Cirrus integration test suite, but here I think that you install OpenSearch on the existing quibble image, and thus refreshing this image with the new version of the plugin is indeed what is needed.

Mon, Nov 17, 3:31 PM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch, ci-test-error (WMF-deployed Build Failure), Wikidata
dcausse moved T40403: Sortable search results from Needs Review to To be Deployed on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Mon, Nov 17, 2:50 PM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
dcausse added a comment to T409898: Set up OpenSearch instance supporting vector search.

Do we need any specific plugins on this instance? At the moment, we're working on a minimal OpenSearch deployment, with no additional plugins, meant for the non-Search use cases.

Mon, Nov 17, 2:02 PM · Essential-Work, Discovery-Search, Research, Data-Platform-SRE (2025.11.07 - 2025.11.28)
dcausse reassigned T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete from dcausse to TJones.
Mon, Nov 17, 1:46 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse closed T410007: upstream request timeout, http-status 504 in the API as Resolved.

This should be fixed, I can see the partial search response instead of the error.

Mon, Nov 17, 1:30 PM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch, MW-Interfaces-Team, MediaWiki-Action-API
dcausse updated the task description for T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete.
Mon, Nov 17, 10:03 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse added a comment to T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete.

@TJones the change should be live on hewiki and ruwiki, could you draft a message for the tech news possibly by adding some text to https://meta.wikimedia.org/wiki/Tech/News/2025/48?

Mon, Nov 17, 10:03 AM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse added a comment to T410007: upstream request timeout, http-status 504 in the API.

... there is now a component failing earlier than the allowed 50s.

How can we find out which component is failing?

Mon, Nov 17, 9:32 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch, MW-Interfaces-Team, MediaWiki-Action-API
dcausse added a comment to T408223: Action API via rest-gateway production rollout.

We received notifications from users that the search API, which is configured to allow 50s timeouts to support costly search requests, is now failing at 15s with an upstream request timeout (T410007). The user reported that the behavior started to change around Nov 11th, which is apparently when we started to roll out this new route on group2 wikis. I'm not 100% sure that this change is the cause of this new behavior, but IIUC on all wikis except enwiki we now route api.php requests to the rest-gateway. If I'm not mistaken, the rest-gateway has a default timeout of 15s, which might explain this new behavior. Are there ways to vary this timeout based on the target action API?
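
The reasoning above comes down to the shortest timeout in the request path winning. A minimal sketch, assuming just the two layers named in the comment (the layer labels are illustrative, and the 15s rest-gateway default is the comment's hypothesis rather than a confirmed value):

```python
# Timeouts in seconds for each layer a request passes through.
# The 50s and 15s values come from the comment above; the layer names are
# illustrative and the rest-gateway default is the comment's hypothesis.
LAYER_TIMEOUTS = {
    "search API (configured)": 50,
    "rest-gateway (assumed default)": 15,
}


def first_layer_to_fail(layer_timeouts):
    """Return (layer, timeout) for the layer that gives up first."""
    layer = min(layer_timeouts, key=layer_timeouts.get)
    return layer, layer_timeouts[layer]


layer, timeout = first_layer_to_fail(LAYER_TIMEOUTS)
print(f"A request still running after {timeout}s is cut off by: {layer}")
```

If the hypothesis holds, a costly search that needs more than 15s would fail at the gateway even though the search API itself would allow it to run for up to 50s.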

Mon, Nov 17, 9:29 AM · OKR-Work, [MWI] FY2025-26 Q2, MW-Interfaces-Team (MWI-Roadmap)
dcausse added a comment to T410007: upstream request timeout, http-status 504 in the API.

Indeed, the internal timeout should be 50s to allow the regex to run. It is possible that something changed in the request flow that there is now a component failing earlier than the allowed 50s.

Mon, Nov 17, 8:11 AM · Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch, MW-Interfaces-Team, MediaWiki-Action-API

Fri, Nov 14

dcausse added a comment to T409469: Enable ChangeProp to consume mediawiki.page_content_change.v1.

Would we also need to explicitly create the topics in main? Is auto topic creation enabled there?

Fri, Nov 14, 5:35 PM · Data-Engineering, serviceops, Machine-Learning-Team
dcausse added a comment to T409469: Enable ChangeProp to consume mediawiki.page_content_change.v1.

If pushing to kafka-main you might need to increase broker's message.max.bytes see T344688.

Fri, Nov 14, 3:19 PM · Data-Engineering, serviceops, Machine-Learning-Team
dcausse moved T40403: Sortable search results from In Progress to Needs Review on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Fri, Nov 14, 2:49 PM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch

Thu, Nov 13

dcausse claimed T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete.
Thu, Nov 13, 2:04 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Fri, Nov 7

dcausse added a comment to T409070: Latest CirrusSearch is incompatible with ES7.10 and the corresponding WMF extra plugin.

It might be that the only reasonable way is to remove anchored trigram support from REL1_45.

Fri, Nov 7, 8:33 AM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch

Thu, Nov 6

dcausse created P84987 completion suggester events with second try hits.
Thu, Nov 6, 10:30 AM
dcausse closed T405475: Search for L7 shows incomplete drop-down box, a subtask of T379740: When searching by LID only the LID is shown, as Resolved.
Thu, Nov 6, 8:13 AM · Abstract Wikipedia team, Essential-Work, Design, WikiLambda Front-end
dcausse closed T405475: Search for L7 shows incomplete drop-down box as Resolved.

I think this is now fixed, the behavior of items and lexemes should be the same.
The API response looks like this now (on L7 when searching for L7):

{
Thu, Nov 6, 8:13 AM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Wikidata-Omega (Radar/Epics/Stalled), Wikidata-Query-Service, CirrusSearch, Wikidata
dcausse updated the task description for T409397: Adapt EntityIdSearchHelper for Lexemes.
Thu, Nov 6, 8:04 AM · Wikidata, Wikidata Lexicographical data
dcausse created T409397: Adapt EntityIdSearchHelper for Lexemes.
Thu, Nov 6, 8:01 AM · Wikidata, Wikidata Lexicographical data

Wed, Nov 5

dcausse moved T408431: Reindex all wikis from Incoming to Ready for Dev on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Wed, Nov 5, 2:36 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
dcausse moved T408154: AB Test doubling near match field weights on commonswiki from Incoming to In Progress on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Wed, Nov 5, 2:36 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
dcausse moved T408734: Enable RU & HE DWIM-style Second Try mappings for autocomplete from Incoming to Ready for Dev on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Wed, Nov 5, 2:36 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work
dcausse moved T408737: Enable Georgian Transliteration Second Try mappings for autocomplete from Incoming to Ready for Dev on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Wed, Nov 5, 2:35 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work

Tue, Nov 4

dcausse added a comment to T404858: A/B test using defaultsort with the completion suggester.

Draft uploaded at: https://people.wikimedia.org/~dcausse/T404858-completion-default-sort-en-fr-he.html

Tue, Nov 4, 6:05 PM · Patch-For-Review, Discovery-Search (2025.10.20 - 2025.12.31), MW-1.45-notes (1.45.0-wmf.24; 2025-10-21), Essential-Work, CirrusSearch
dcausse updated subscribers of T393966: Update WDQS SLO lag queries to reflect graph split changes.
Tue, Nov 4, 5:14 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, User-Elukey, Essential-Work, SRE-SLO, observability
dcausse added a comment to T393966: Update WDQS SLO lag queries to reflect graph split changes.

@dcausse In this updated version of the SLI we don't want to count throttled requests as either a success or failure, but rather exclude them entirely. However I'm having a bit of trouble understanding how all the pieces fit together.

Briefly, when a request comes in from a user and hits the throttling filter, does that avoid the request ever ultimately hitting blazegraph itself? In other words, we have the metrics of blazegraph_queries_done and blazegraph_queries_error which are scraped by the prometheus blazegraph exporter, and I want to know if blazegraph_queries_error implicitly contains the throttled requests in its count or not.

if it looks like request -> throttling filter -> (if not throttled) blazegraph then I think those are already excluded and therefore I don't have to do anything. But if the throttled request still makes it to blazegraph and blazegraph issues the 4xx at that point then they would be included. I think it's the former but figured you might be able to shed some light here.

Tue, Nov 4, 5:13 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, User-Elukey, Essential-Work, SRE-SLO, observability

Nov 3 2025

dcausse moved T405475: Search for L7 shows incomplete drop-down box from Needs Review to To be Deployed on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 3 2025, 4:10 PM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Wikidata-Omega (Radar/Epics/Stalled), Wikidata-Query-Service, CirrusSearch, Wikidata
dcausse moved T402864: Integrate RU & HE DWIM-style mappings into autocomplete from To be Deployed to Done on the Discovery-Search (2025.10.20 - 2025.12.31) board.
Nov 3 2025, 4:09 PM · CirrusSearch, Discovery-Search (2025.10.20 - 2025.12.31), MW-1.45-notes (1.45.0-wmf.25; 2025-10-28), Essential-Work
dcausse added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

@dcausse, regarding the weighted search tag, IIUC, for a few mock records, the simplest way to produce events to CirrusSearch would be using kafkacat like P84613, right?

Yes, the event you crafted is perfect; it's just missing the meta section. Also, please use EventGate to push them (otherwise you would bypass the various event platform validations):

curl -H"User-Agent: achou-T401021/wmf" -H"Content-Type: application/json" -XPOST https://eventgate-main.discovery.wmnet:4492/v1/events -d '[events]'

Event:

{
  "meta": {
    "stream": "mediawiki.cirrussearch.page_weighted_tags_change.v1",
    "domain": "test.wikipedia.org"
  },
  "dt": "2025-11-03T10:56:00Z",
  "wiki_id": "testwiki",
  "page": {
    "page_id": 1,
    "page_title": "SomePage",
    "namespace_id": 0,
    "is_redirect": false
  },
  "weighted_tags": {
    "set": {
      "recommendation.tone": [
        {
          "tag": "exists",
          "score": 1.0
        }
      ]
    }
  }
}
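
For more than a handful of mock records, the same event can be generated programmatically. The following is a minimal sketch that only builds the request body; the `build_weighted_tags_event` helper is a hypothetical name, the field layout simply mirrors the example event above, and nothing here is validated against the actual stream schema.

```python
import json
from datetime import datetime, timezone

STREAM = "mediawiki.cirrussearch.page_weighted_tags_change.v1"


def build_weighted_tags_event(wiki_id, domain, page_id, page_title,
                              tag_group, tag, score, dt=None):
    """Build one event shaped like the example above (hypothetical helper)."""
    if dt is None:
        dt = datetime.now(timezone.utc)
    return {
        # The meta section was the part missing from the original draft.
        "meta": {"stream": STREAM, "domain": domain},
        "dt": dt.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "wiki_id": wiki_id,
        "page": {
            "page_id": page_id,
            "page_title": page_title,
            "namespace_id": 0,
            "is_redirect": False,
        },
        "weighted_tags": {"set": {tag_group: [{"tag": tag, "score": score}]}},
    }


event = build_weighted_tags_event(
    "testwiki", "test.wikipedia.org", 1, "SomePage",
    "recommendation.tone", "exists", 1.0,
    dt=datetime(2025, 11, 3, 10, 56, tzinfo=timezone.utc),
)

# Events are wrapped in a JSON array, mirroring the '[events]' placeholder
# in the curl command.
payload = json.dumps([event], indent=2)
print(payload)
```

The printed payload can be saved to a file and passed to the curl command in place of `'[events]'`.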
Nov 3 2025, 1:38 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
dcausse added a comment to T403775: New search option: Sort results by page name.

An extra Advanced-Search ticket for something that's currently technically impossible is not useful.

Nov 3 2025, 9:11 AM · Discovery-Search, Essential-Work, CirrusSearch, MediaWiki-Search, RoadToWiki

Oct 31 2025

dcausse renamed T408909: The cirrus config dump API may produce unexpected json output from The cirrus config dump API may produce unexpected json ouput to The cirrus config dump API may produce unexpected json output.
Oct 31 2025, 10:13 AM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
dcausse created T408909: The cirrus config dump API may produce unexpected json output.
Oct 31 2025, 10:12 AM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch

Oct 29 2025

dcausse added a comment to T408701: Enable event logging for the mediawiki.product_metrics.suggested_investigations_interaction stream on loginwiki.

If only touching wgEnableEventBus to add TYPE_EVENT, is there a risk of starting to produce data to unrelated streams, where downstream consumers might be a bit puzzled to see events from loginwiki? If so, perhaps constraining loginwiki to only produce to mediawiki.product_metrics.suggested_investigations_interaction should be preferred?

Oct 29 2025, 5:54 PM · Product Safety and Integrity (Sprint Mint Choc Chip Ice Cream (Oct 20 - Nov 7)), CheckUser-SuggestedInvestigations, Metrics Platform
dcausse added a comment to T407514: Ignore MacOS .DS_Store in parent pom.
  • exclude .DS_Store from this duplicate-finder plugin in the parent pom, but we'd have to re-add it for projects that override the default settings

Couldn't the child pom use combine.children as described in the doc to not need to re-declare? This might not be obvious to child projects, so it might need some documentation. (I haven't checked the actual use case, so I might be missing something).

In general, the approach we've tried to take was to make projects as easy to build as possible. You should be able to checkout a project and run ./mvnw clean verify and have it work. Needing to add specific -D arguments breaks that rule.

Oct 29 2025, 10:11 AM · Discovery-Search (2025.10.20 - 2025.12.31), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Java-Scala-Standardization, Essential-Work
dcausse updated subscribers of T407514: Ignore MacOS .DS_Store in parent pom.

I often open the target folder to find the jar so I can deploy it locally. I also use the Finder with Java projects to recursively open folders to get all the way to the bottom quickly. (Anything that relies on people not opening folders is doomed to fail eventually.)

Oct 29 2025, 8:28 AM · Discovery-Search (2025.10.20 - 2025.12.31), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Java-Scala-Standardization, Essential-Work
dcausse added a project to T408533: Initial task generation and ingestion to Cassandra and Search weight tags: Discovery-Search.
Oct 29 2025, 7:58 AM · Discovery-Search (2025.10.20 - 2025.12.31), Machine-Learning-Team

Oct 28 2025

dcausse added a comment to T407514: Ignore MacOS .DS_Store in parent pom.

I believe we should find what is causing them to appear in the first place

That would be me opening folders. The MacOS Finder creates .DS_Store files to record information about the folder: how it is sorted, the position of icons, where the window is located, etc. See https://en.wikipedia.org/wiki/.DS_Store#Purpose_and_location

Oct 28 2025, 3:29 PM · Discovery-Search (2025.10.20 - 2025.12.31), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Java-Scala-Standardization, Essential-Work
dcausse added a comment to T407514: Ignore MacOS .DS_Store in parent pom.

@TJones could you try to replicate this step by step from a fresh repository (cleanly fetched from gerrit and no target folders).
By default I think maven is supposed to ignore these folders when moving them to the target folders. I believe we should find what is causing them to appear in the first place because I'm afraid that fixing duplicate-finder-maven-plugin might be too late because it'd mean that these .DS_Store folders already have leaked into the build directories which could cause them to be published silently.
If on the other hand you found third-party jars (dependencies we use) with such folder in them this is a different story, we might possibly try to raise the issue to the lib owner and use the workaround you suggested to make our build pass (but haven't hit this problem myself so hopefully it's not the case).

Oct 28 2025, 10:29 AM · Discovery-Search (2025.10.20 - 2025.12.31), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Java-Scala-Standardization, Essential-Work
dcausse added a comment to T405475: Search for L7 shows incomplete drop-down box.

but this is perhaps good enough for now

Yeah I think so. And maybe we can have a ticket for adapting the EntityIdSearchHelper for Lexemes? (I'd do it but I don't understand enough of the details to create a meaningful ticket.)

Oct 28 2025, 10:07 AM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Wikidata-Omega (Radar/Epics/Stalled), Wikidata-Query-Service, CirrusSearch, Wikidata
dcausse added a comment to T405475: Search for L7 shows incomplete drop-down box.

I believe the issue is that we generally prefer EntityIdSearchHelper when matching lexeme IDs, this search helper does not have any customization for lexemes. A simple approach is to prefer the Cirrus version of the hits which contains the expected set of metadata, this might not fully solve the issue in cases where the Lexeme ID is searched rapidly after being created (searched before the search engine is updated) but this is perhaps good enough for now?

Oct 28 2025, 9:48 AM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Wikidata-Omega (Radar/Epics/Stalled), Wikidata-Query-Service, CirrusSearch, Wikidata
dcausse claimed T405475: Search for L7 shows incomplete drop-down box.
Oct 28 2025, 9:17 AM · MW-1.46-notes (1.46.0-wmf.1; 2025-11-05), Discovery-Search (2025.10.20 - 2025.12.31), Wikidata-Omega (Radar/Epics/Stalled), Wikidata-Query-Service, CirrusSearch, Wikidata

Oct 27 2025

dcausse moved T406920: deepcategory search fails to show all expected results from In Progress to Done on the Data-Platform-SRE (2025.10.17 - 2025.11.07) board.
Oct 27 2025, 5:22 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), CirrusSearch, Commons
dcausse closed T406920: deepcategory search fails to show all expected results as Resolved.

The only thing I'm not sure about is one of your tests was this: for endpoint in $(cat wdqs_hosts.lst | grep ^w); do echo -n "$endpoint: "; curl -s -XPOST --data-urlencode [email protected] http://$endpoint/bigdata/namespace/categories/sparql?format=json | jq '.results.bindings[] | .C.value'; done but none of these have a .C. Not sure if that is intended or not.

Sorry I did not paste this query which is:

SELECT (COUNT(*) AS ?C) WHERE {
	<https://commons.wikimedia.org/wiki/Category:Current_De_Havilland_Canada_aircraft_of_Universal_Air> ?p <https://commons.wikimedia.org/wiki/Category:Universal_Air_current_fleet> .
}

I tested it and all nodes seem to have the missing link.
I would consider this ticket done.
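
The `jq` filter in the quoted test reads the `?C` variable from each result binding, so it only yields a value when the query being run selects `?C`, as the COUNT query above does. A minimal sketch of the same extraction, assuming the endpoint returns the standard SPARQL JSON results format (the sample response below is made up for illustration):

```python
import json


def extract_values(response_text, variable="C"):
    """Return the values bound to `variable` in a SPARQL JSON result."""
    data = json.loads(response_text)
    return [
        binding[variable]["value"]
        for binding in data["results"]["bindings"]
        if variable in binding
    ]


# Illustrative response body; the count of "1" is made up for the example.
sample = json.dumps({
    "head": {"vars": ["C"]},
    "results": {"bindings": [{"C": {
        "type": "literal",
        "datatype": "http://www.w3.org/2001/XMLSchema#integer",
        "value": "1",
    }}]},
})

counts = extract_values(sample)
print(counts)
```

A response from a query that does not bind `?C` gives an empty list here, which corresponds to the missing `.C` noted in the quoted question.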

Oct 27 2025, 5:21 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), CirrusSearch, Commons
dcausse updated the task description for T408399: Truncate labels.*.near_match fields.
Oct 27 2025, 3:52 PM · Essential-Work, Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
dcausse created T408399: Truncate labels.*.near_match fields.
Oct 27 2025, 3:05 PM · Essential-Work, Discovery-Search (2025.10.20 - 2025.12.31), CirrusSearch
dcausse added a comment to T407520: Deploy various plugins to fix various things.

We should wait for https://gerrit.wikimedia.org/r/c/search/extra/+/1198569 I think

Oct 27 2025, 10:46 AM · Patch-For-Review, Data-Platform-SRE (2025.11.07 - 2025.11.28), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
dcausse moved T402864: Integrate RU & HE DWIM-style mappings into autocomplete from Needs Review to To be Deployed on the Discovery-Search (2025.09.26 - 2025.10.17) board.
Oct 27 2025, 10:45 AM · CirrusSearch, Discovery-Search (2025.10.20 - 2025.12.31), MW-1.45-notes (1.45.0-wmf.25; 2025-10-28), Essential-Work

Oct 24 2025

dcausse added a comment to T406205: Investigate and cleanup broken weighted_tags in cirrus indices.

I think the problem is in the extra plugin. I could reproduce it when weighted_tags is currently null in OpenSearch, with the following bulk sequence:

{"index": {"_index": "my_database_content", "_id": "10000"}}
{}
{"update": {"_index": "my_database_content", "_id": "10000"}}
{"script":{"source":"super_detect_noop","lang":"super_detect_noop","params":{"handlers":{"weighted_tags":"multilist","version":"documentVersion"},"source":{"version":1,"weighted_tags":["mytag/__DELETE_GROUPING__","myothertag/somedata|2"]}}},"upsert":{"version":1}}
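
To replay this sequence from a script, the four lines can be assembled as a newline-delimited JSON body. This is a minimal sketch that only serializes the body; the `bulk_body` helper is a hypothetical name, the index name and document ID are copied from the reproduction above, and sending the body to a cluster is left out.

```python
import json


def bulk_body(actions):
    """Serialize (action, payload) pairs as newline-delimited JSON."""
    lines = []
    for action, payload in actions:
        lines.append(json.dumps(action, separators=(",", ":")))
        lines.append(json.dumps(payload, separators=(",", ":")))
    return "\n".join(lines) + "\n"  # bulk bodies end with a newline


INDEX = "my_database_content"
DOC_ID = "10000"

actions = [
    # Step 1: index an empty document, so weighted_tags is absent/null.
    ({"index": {"_index": INDEX, "_id": DOC_ID}}, {}),
    # Step 2: scripted update that deletes one tag group and sets another.
    ({"update": {"_index": INDEX, "_id": DOC_ID}}, {
        "script": {
            "source": "super_detect_noop",
            "lang": "super_detect_noop",
            "params": {
                "handlers": {
                    "weighted_tags": "multilist",
                    "version": "documentVersion",
                },
                "source": {
                    "version": 1,
                    "weighted_tags": [
                        "mytag/__DELETE_GROUPING__",
                        "myothertag/somedata|2",
                    ],
                },
            },
        },
        "upsert": {"version": 1},
    }),
]

body = bulk_body(actions)
print(body, end="")
```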
Oct 24 2025, 1:56 PM · Discovery-Search (2025.10.20 - 2025.12.31), MW-1.45-notes (1.45.0-wmf.25; 2025-10-28), Patch-For-Review, Essential-Work, CirrusSearch
dcausse moved T407520: Deploy various plugins to fix various things from Blocked / Waiting to Needs Review on the Discovery-Search (2025.09.26 - 2025.10.17) board.
Oct 24 2025, 1:02 PM · Patch-For-Review, Data-Platform-SRE (2025.11.07 - 2025.11.28), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
dcausse moved T403775: New search option: Sort results by page name from needs triage to UI tickets on the Discovery-Search board.
Oct 24 2025, 12:53 PM · Discovery-Search, Essential-Work, CirrusSearch, MediaWiki-Search, RoadToWiki
dcausse edited projects for T403775: New search option: Sort results by page name, added: Discovery-Search; removed Discovery-Search (2025.09.26 - 2025.10.17).
Oct 24 2025, 12:52 PM · Discovery-Search, Essential-Work, CirrusSearch, MediaWiki-Search, RoadToWiki
dcausse added a comment to T403775: New search option: Sort results by page name.

CirrusSearch may provide such sorting options via T40403. I believe that this ticket refers to the UI work required once this capability is available from CirrusSearch. Given that there is no such selector in Special:Search, I have the impression that the intent of this ticket was to adapt Advanced-Search. @thiemowmde, am I missing something?

Oct 24 2025, 12:51 PM · Discovery-Search, Essential-Work, CirrusSearch, MediaWiki-Search, RoadToWiki
dcausse closed T405869: Tune the perfield_builder_relaxed query builder profile as Resolved.
Oct 24 2025, 12:33 PM · Discovery-Search (2025.10.20 - 2025.12.31), MW-1.45-notes (1.45.0-wmf.24; 2025-10-21), CirrusSearch
dcausse closed T405869: Tune the perfield_builder_relaxed query builder profile, a subtask of T343148: Relax 'AND' operator in search queries, as Resolved.
Oct 24 2025, 12:33 PM · Discovery-Search, CirrusSearch
dcausse closed T402629: Monitor CirrusSearch index failures as Resolved.
Oct 24 2025, 12:32 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
dcausse added a comment to T407520: Deploy various plugins to fix various things.

Opened https://gitlab.wikimedia.org/repos/search-platform/opensearch-plugins-deb/-/merge_requests/12 that should have all this.

Oct 24 2025, 11:02 AM · Patch-For-Review, Data-Platform-SRE (2025.11.07 - 2025.11.28), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch

Oct 23 2025

dcausse renamed T405842: Port/fork newer versions of ruflin/Elastica to opensearch from Port/fork newer version of ruflin/elastica to opensearch to Port/fork newer versions of ruflin/Elastica to opensearch.
Oct 23 2025, 5:03 PM · PHP 8.5 support, Discovery-Search, CirrusSearch
dcausse renamed T405842: Port/fork newer versions of ruflin/Elastica to opensearch from Upgrade ruflin/elastica to >= 8.2 in Elastica to Port/fork newer version of ruflin/elastica to opensearch.
Oct 23 2025, 5:02 PM · PHP 8.5 support, Discovery-Search, CirrusSearch
dcausse merged T406506: Migrate to opensearch-php into T405842: Port/fork newer versions of ruflin/Elastica to opensearch.
Oct 23 2025, 4:59 PM · PHP 8.5 support, Discovery-Search, CirrusSearch
dcausse merged task T406506: Migrate to opensearch-php into T405842: Port/fork newer versions of ruflin/Elastica to opensearch.
Oct 23 2025, 4:59 PM · Discovery-Search (2025.09.26 - 2025.10.17), CirrusSearch
dcausse updated the task description for T402629: Monitor CirrusSearch index failures.
Oct 23 2025, 4:52 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch

Oct 21 2025

dcausse claimed T40403: Sortable search results.
Oct 21 2025, 2:27 PM · MW-1.46-notes (1.46.0-wmf.2; 2025-11-12), Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, CirrusSearch
dcausse added a comment to T406920: deepcategory search fails to show all expected results.

Looking at the lag data of the category graph, it seems to me that around June 12 2015 we started to have more lag issues:

image.png (640×3 px, 268 KB)

It's perhaps not enough data to draw any conclusions, but it may be worth looking into whether something changed around this period.

Oct 21 2025, 1:57 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), CirrusSearch, Commons
dcausse placed T406920: deepcategory search fails to show all expected results up for grabs.

Actually the graph seems outdated on all 4 endpoints I tested.
The query:

SELECT ?out WHERE {
      SERVICE mediawiki:categoryTree {
          bd:serviceParam mediawiki:start <https://commons.wikimedia.org/wiki/Category:Aircraft_of_Universal_Air> .
          bd:serviceParam mediawiki:direction "Reverse" .
          bd:serviceParam mediawiki:depth 5 .
      }
} ORDER BY ASC(?depth)
LIMIT 200

Should return more than two items:

for endpoint in wdqs-internal-main.svc.eqiad.wmnet wdqs-main.svc.eqiad.wmnet wdqs-internal-main.svc.codfw.wmnet wdqs-main.svc.codfw.wmnet; do
  echo -n "$endpoint: ";
  curl -s -XPOST --data-urlencode [email protected] https://$endpoint/bigdata/namespace/categories/sparql?format=json | jq '.results.bindings | length';
done
wdqs-internal-main.svc.eqiad.wmnet: 2
wdqs-main.svc.eqiad.wmnet: 2
wdqs-internal-main.svc.codfw.wmnet: 2
wdqs-main.svc.codfw.wmnet: 2
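
The same check can be sketched in Python for anyone who prefers it over `jq`. This is only an illustration under stated assumptions: `count_bindings` and `stale_endpoints` are hypothetical helper names, the threshold of 3 encodes "more than two items", and the sample response is a made-up placeholder in the standard SPARQL JSON results shape, not real output. The per-endpoint counts are the ones printed above.

```python
import json


def count_bindings(response_text):
    """Equivalent of jq '.results.bindings | length' on a SPARQL JSON response."""
    data = json.loads(response_text)
    return len(data["results"]["bindings"])


def stale_endpoints(counts, minimum=3):
    """Return endpoints whose result count is below the expected minimum."""
    return sorted(name for name, n in counts.items() if n < minimum)


# Made-up placeholder response with two bindings (illustration only).
sample = json.dumps({
    "head": {"vars": ["out"]},
    "results": {"bindings": [
        {"out": {"type": "uri", "value": "https://example.org/a"}},
        {"out": {"type": "uri", "value": "https://example.org/b"}},
    ]},
})
print(count_bindings(sample))

# Counts as reported by the loop above.
observed = {
    "wdqs-internal-main.svc.eqiad.wmnet": 2,
    "wdqs-main.svc.eqiad.wmnet": 2,
    "wdqs-internal-main.svc.codfw.wmnet": 2,
    "wdqs-main.svc.codfw.wmnet": 2,
}
print(stale_endpoints(observed))
```

With the counts above, all four endpoints fall below the threshold, which matches the observation that the graph looks outdated everywhere.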
Oct 21 2025, 1:50 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), CirrusSearch, Commons
dcausse claimed T406920: deepcategory search fails to show all expected results.
Oct 21 2025, 12:59 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), CirrusSearch, Commons