Pushdown for LIKE (LIST) #129557

Merged 12 commits on Jun 23, 2025

@julian-elastic julian-elastic commented Jun 17, 2025

Improved the performance of LIKE (LIST) by roughly a factor of 5 by pushing the evaluation down to Lucene as an Automaton.
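
As a rough analogy for the idea of evaluating a whole pattern list with one combined matcher, the sketch below folds several LIKE patterns into a single anchored regular expression and runs it once with `grep -E`. This is only an illustration in shell, not the code in this PR: the helper names `like_to_ere` and `like_list_to_ere` are hypothetical, and the translation assumes patterns contain only letters, digits, spaces, `*` and `?`.

```shell
# Illustration only: neither ES|QL nor Lucene is involved here.
# Assumption: patterns contain only letters, digits, spaces, '*' and '?'.

# Translate one LIKE wildcard pattern into an extended-regex fragment:
# '*' becomes '.*' (any run of characters), '?' becomes '.' (one character).
like_to_ere() {
    printf '%s' "$1" | sed -e 's/\*/.*/g' -e 's/?/./g'
}

# Join all translated patterns into one anchored alternation.
like_list_to_ere() {
    ere=""
    for p in "$@"; do
        ere="${ere:+$ere|}$(like_to_ere "$p")"
    done
    printf '^(%s)$' "$ere"
}

combined=$(like_list_to_ere 'text 2*' 'text 3*' 'text 224*' 'text 225*' 'text 226*')
echo "$combined"

# A single pass with the combined matcher instead of one pass per pattern;
# prints the number of matching sample lines.
printf 'text %d\n' 1 2 25 31 224 400 | grep -cE "$combined"
```

The sample values mirror the `text N` documents used in the benchmark script further down.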

Before:
{"took":629,"documents_found":10000000,"values":[[26419752]]}
After:
{"took":122,"documents_found":2222222,"values":[[26419752]]}
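
The "factor of 5" can be checked directly from the two responses above. A minimal sketch, assuming `awk` is available; `ratio` is a hypothetical helper, and the inputs are the single `took` and `documents_found` samples quoted above, so this is a rough indication rather than a rigorous benchmark.

```shell
# Ratio of two numbers, rounded to two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# took: 629 ms before, 122 ms after
echo "took ratio:            $(ratio 629 122)"
# documents_found: 10000000 before, 2222222 after
echo "documents_found ratio: $(ratio 10000000 2222222)"
```

The drop in `documents_found` suggests that, with the pushdown, only matching documents are loaded instead of every document in the index.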

Measured with the script below; compare the "test like list" results with and without this change.

passwd="redacted"
curl -sk -uelastic:$passwd -HContent-Type:application/json -XDELETE http://localhost:9200/test
curl -sk -uelastic:$passwd -HContent-Type:application/json -XPUT http://localhost:9200/test -d'{
    "settings": {
        "index.refresh_interval": -1
    },
    "mappings": {
        "properties": {
            "f": {
                "type": "keyword"
            }
        }
    }
}'
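# Index 10,000,000 documents ("text 1" .. "text 10000000") in 10,000 bulk requests of 1,000 documents each.
counter=1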
for a in {1..10000}; do
    rm -f /tmp/bulk*
    for b in {1..1000}; do
        echo '{"index":{}}' >> /tmp/bulk
        printf '{"f":"text %d"}\n' "$counter" >> /tmp/bulk
        counter=$((counter + 1))
    done
    #ls -l /tmp/bulk*
    printf '%05d:' "$a"
    curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST 'http://localhost:9200/test/_bulk?pretty' --data-binary @/tmp/bulk | grep errors
done
curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST 'http://localhost:9200/test/_forcemerge?max_num_segments=1'
curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST http://localhost:9200/test/_refresh
echo
curl -sk -uelastic:$passwd 'http://localhost:9200/_cat/indices?v'

test_like() {
    echo -n "test like"
    curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST 'http://localhost:9200/_query?pretty' -d'{
        "query": "FROM test | WHERE f like \"text 2*\" or f like \"text 3*\" or f like \"text 224*\" or f like \"text 225*\" or f like \"text 226*\"| STATS SUM(LENGTH(f))",
        "pragma": {
            "data_partitioning": "shard"
        }
    }' | jq -c '{took, documents_found, values}'
}

test_like_list() {
    echo -n "test like list"
    curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST 'http://localhost:9200/_query?pretty' -d'{
        "query": "FROM test | WHERE f like (\"text 2*\", \"text 3*\", \"text 224*\", \"text 225*\", \"text 226*\") | STATS SUM(LENGTH(f))",
        "pragma": {
            "data_partitioning": "shard"
        }
    }' | jq -c '{took, documents_found, values}'
}


for a in {1..100}; do
    test_like
    test_like_list
done
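
The reported results can be sanity-checked without a cluster. The script indexes `text 1` through `text 10000000`, and `text 224*`, `text 225*` and `text 226*` are already covered by `text 2*`, so the matching documents are exactly those whose number starts with 2 or 3. The sketch below derives the expected counts in closed form; `expected_counts` is a hypothetical helper name, and it assumes each value is `text ` (5 characters) followed by a k-digit number.

```shell
# Expected number of matching documents and expected SUM(LENGTH(f))
# for numbers 1..10000000 whose first digit is 2 or 3.
expected_counts() {
    docs=0
    sum=0
    block=1   # 10^(k-1): how many k-digit numbers share one leading digit
    k=1
    while [ "$k" -le 7 ]; do          # 10000000 (8 digits) starts with 1
        docs=$((docs + 2 * block))                # two leading digits: 2 and 3
        sum=$((sum + 2 * block * (5 + k)))        # LENGTH("text " + k digits) = 5 + k
        block=$((block * 10))
        k=$((k + 1))
    done
    echo "$docs $sum"
}

expected_counts
```

The two numbers can be compared against `documents_found` in the "After" response and against `values` in both responses.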

@julian-elastic julian-elastic self-assigned this Jun 17, 2025
@julian-elastic julian-elastic added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >enhancement labels Jun 17, 2025
@elasticsearchmachine
Collaborator

Hi @julian-elastic, I've created a changelog YAML for you.

@julian-elastic julian-elastic added v8.19.0 auto-backport Automatically create backport pull requests when merged labels Jun 17, 2025
@julian-elastic julian-elastic marked this pull request as ready for review June 18, 2025 00:01
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@julian-elastic julian-elastic merged commit caae426 into elastic:main Jun 23, 2025
32 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 129557

@julian-elastic
Contributor Author

💚 All backports created successfully

Status Branch Result
8.19

Questions ?

Please refer to the Backport tool documentation

julian-elastic added a commit to julian-elastic/elasticsearch that referenced this pull request Jun 23, 2025
Improved performance of LIKE (LIST) by pushing an Automaton to do the evaluation down to Lucene.

(cherry picked from commit caae426)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/regex/WildcardLikeList.java
julian-elastic added a commit that referenced this pull request Jun 23, 2025
Improved performance of LIKE (LIST) by pushing an Automaton to do the evaluation down to Lucene.

(cherry picked from commit caae426)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/regex/WildcardLikeList.java
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 25, 2025
Improved performance of LIKE (LIST) by pushing an Automaton to do the evaluation down to Lucene.
Labels
:Analytics/ES|QL (AKA ESQL), auto-backport (Automatically create backport pull requests when merged), backport pending, >enhancement, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), v8.19.0, v9.1.0