
Is there a way to check whether a document ID is part of the results of a large (million+ hits) Elasticsearch query/filter?

Essentially, I'll have a group of related document IDs and only want to return them if they are part of a larger query. I'm hoping to do this on the database side. It seemed theoretically possible, since ES has to cache data related to large scrolls.


1 Answer


It's an interesting use case, but you need to understand that Elasticsearch (ES) doesn't return the IDs of all matching documents in the search response. By default it returns only the top 10 documents, which you can change with the size parameter.

If you increase the size parameter while your query matches millions of documents, query performance will be very poor, and firing such queries frequently could even bring down the entire cluster (in the absence of a circuit breaker), so be cautious. Note also that, with default settings, from + size cannot exceed index.max_result_window (10,000).

You are right that ES caches things, but if you try to cache a huge amount of data that is invalidated very frequently, you won't get the expected performance benefit, so it's better to benchmark it.

You are already on the right path by using the scroll API to iterate over millions of search results; see the points below to improve further.
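A minimal sketch of that scroll-based check follows, assuming the official elasticsearch-py 8.x client (whose search, scroll and clear_scroll methods accept the keyword arguments used here). The function name find_ids_in_query, the page size and the 2m keep-alive are hypothetical, illustrative choices. It pages through the hits of the large query on the client side and stops early once every candidate ID has been seen:

```python
from typing import Any, Dict, Iterable


def find_ids_in_query(client: Any, index: str, query: Dict[str, Any],
                      wanted_ids: Iterable[str], page_size: int = 1000,
                      keep_alive: str = "2m") -> Dict[str, float]:
    """Hypothetical helper: return {doc_id: score} for every ID in
    wanted_ids that is a hit of `query`, found by scrolling.

    Assumes an elasticsearch-py 8.x style client (search/scroll/clear_scroll).
    """
    remaining = set(wanted_ids)
    found: Dict[str, float] = {}
    # The first request opens the scroll context and returns the first page.
    resp = client.search(index=index, query=query, size=page_size,
                         scroll=keep_alive)
    scroll_id = resp["_scroll_id"]
    try:
        hits = resp["hits"]["hits"]
        while hits and remaining:
            for hit in hits:
                doc_id = hit["_id"]
                if doc_id in remaining:
                    found[doc_id] = hit["_score"]
                    remaining.discard(doc_id)
            if not remaining:
                break  # every candidate found; no need to read further pages
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
            hits = resp["hits"]["hits"]
    finally:
        # Free the server-side scroll context instead of waiting for it to expire.
        client.clear_scroll(scroll_id=scroll_id)
    return found
```

Usage would look something like find_ids_in_query(Elasticsearch("http://localhost:9200"), "my-index", {"match": {"body": "foo"}}, ["id1", "id2"]), where the URL, index name and query are placeholders. In the worst case this still transfers every matching hit to the client, which is exactly the cost warned about above.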

  1. First get the count of search results. It is included in the default search response as hits.total, with a relation of eq (exact) or gte (lower bound), which tells you how many results you have; based on that, you can choose the size parameter for subsequent calls that check whether your ID is present.
  2. Check that you make effective use of the filter context in your query; frequently used filters are cached automatically by ES.
  3. Benchmark some heavy scroll API calls against your own data.
  4. Refer to this thread to fine-tune your cluster and index configuration and further optimize ES response times.
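Points 1 and 2 can be sketched as plain request bodies (Python dicts in the Query DSL JSON shape), assuming ES 7.0 or later, where hits.total is an object with value and relation and the track_total_hits option exists. The helper names are hypothetical. Applying the filter context to this question by restricting to the candidate IDs is a suggestion of this sketch rather than something stated in the answer: the large query stays in a bool must clause so matching documents keep their relevance score (filter clauses add nothing to the score), while the candidate IDs go into an ids clause in filter context, so only those documents can come back. It also assumes the group of candidate IDs is small, well below the default index.max_result_window of 10,000.

```python
from typing import Any, Dict, List


def count_body(large_query: Dict[str, Any]) -> Dict[str, Any]:
    """Point 1: ask only for the number of hits, not the hits themselves."""
    # size=0 returns no documents; track_total_hits=True requests an exact
    # count instead of the default lower bound that stops at 10,000 ("gte").
    return {"query": large_query, "size": 0, "track_total_hits": True}


def restricted_body(large_query: Dict[str, Any], ids: List[str]) -> Dict[str, Any]:
    """Point 2: keep the large query for scoring, filter down to candidate IDs."""
    return {
        "query": {
            "bool": {
                "must": [large_query],  # scored: matching docs keep their _score
                "filter": [{"ids": {"values": list(ids)}}],  # filter context, unscored
            }
        },
        "size": len(ids),  # at most one hit per candidate ID
    }


def describe_total(response: Dict[str, Any]) -> str:
    """Render hits.total, e.g. '10000 (gte)' or '42 (eq)'."""
    total = response["hits"]["total"]
    return f"{total['value']} ({total['relation']})"


def matched_scores(response: Dict[str, Any]) -> Dict[str, float]:
    """Map each returned document ID to its score in the large query."""
    return {hit["_id"]: hit["_score"] for hit in response["hits"]["hits"]}
```

Either body would be sent to your index's _search endpoint (the index is a placeholder here). Any candidate ID missing from matched_scores(response) is not part of the large query's result set, and the ones present come back with their score, without scrolling through the other million-plus hits.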

7 Comments

Yeah, that's what I thought and was hoping to avoid. What's going to happen is that a bunch of scrolls will be created and the processing will happen on the user side. I was hoping for something that says "yes, this document is part of this query, its score was 3.87", etc.
I just found something called percolate; would that not be useful?
Oh yeah, percolate can be used for your use case. Please give it a try.
@MarkII please let us know whether percolate was useful for you or not
I haven't been able to test it yet; I'm trying to get the design down first. @Opster, do you know if it's possible to send a document ID from another index to a percolate query for checking?