It looks like I'm missing something obvious when trying to fuzzy match a multi-term query.

What I'd like to achieve is to get only the "Goleniow Helenow" result when providing the query "Goleniow Heleniow" (city + district name with a typo). Instead, I get all the docs.

I think I've tried every combination of the minimum_should_match, operator, and even fuzziness parameters, with no satisfying result.

Could anyone point out what I am missing?

Index setup

curl -X PUT "localhost:9200/test-index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": { 
      "name": { 
        "type": "text",
        "index": true
      } 
    } 
  }   
}'

Docs to index

curl -X POST "localhost:9200/test-index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Goleniow Helenow"
}'
curl -X POST "localhost:9200/test-index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Goleniow"
}'
curl -X POST "localhost:9200/test-index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Goleniow Jaworow"
}'

Query and result

curl -X POST "localhost:9200/test-index/_search?pretty" -H 'Content-Type: application/json' -d'{
  "query": {
    "match": {
      "name": {
        "minimum_should_match": "100%",
        "operator": "and",
        "fuzziness": "2",
        "query": "Goleniow Heleniow"
      }
    }
  }
}'
{
  "took" : 104,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.32180583,
    "hits" : [
      {
        "_index" : "test-index",
        "_id" : "Waqnj5UBnvH7uZURvQTX",
        "_score" : 0.32180583,
        "_source" : {
          "name" : "Goleniow Helenow"
        }
      },
      {
        "_index" : "test-index",
        "_id" : "Wqqoj5UBnvH7uZUR2QSO",
        "_score" : 0.2793999,
        "_source" : {
          "name" : "Goleniow"
        }
      },
      {
        "_index" : "test-index",
        "_id" : "W6qoj5UBnvH7uZUR8AT4",
        "_score" : 0.21600665,
        "_source" : {
          "name" : "Goleniow Jaworow"
        }
      }
    ]
  }
}

1 Answer

You can use a span_near query. Here is a similar discussion.

Example query:

GET test-index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "span_near": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "name": {
                        "value": "Goleniow",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              },
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "name": {
                        "value": "Heleniow",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              }
            ],
            "slop": 0,
            "in_order": true
          }
        }
      ]
    }
  }
}
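
As for why the original match query returned every document: with fuzziness 2, the query term "Heleniow" is itself within two edits of the indexed term "goleniow" (H→G, e→o), and that term appears in all three documents. So both query terms can be satisfied by the single token "Goleniow", and "operator": "and" no longer filters anything. The span_near query avoids this because the two fuzzy clauses have to match at two adjacent positions, in order. Below is a minimal Python sketch that checks the distances. It assumes the default standard analyzer (lowercased tokens) and uses plain Levenshtein distance, whereas Elasticsearch also counts a transposition as one edit by default; for these particular words that makes no difference. The function name edit_distance is only illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (or keep a match)
            ))
        prev = curr
    return prev[-1]


# Tokens assumed to be in the index after lowercasing.
indexed_terms = ["goleniow", "helenow", "jaworow"]

for query_term in ["goleniow", "heleniow"]:
    for term in indexed_terms:
        print(f"{query_term} -> {term}: {edit_distance(query_term, term)}")
```

Since "heleniow" is one edit away from "helenow" but two away from "goleniow", this also suggests that "fuzziness": 1 in the original match query (keeping "operator": "and") would exclude the other two documents. I have not run that against this index, so treat it as something to try rather than a confirmed fix.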
