0

I have been reading up on ElasticSearch and couldn't find an answer for how to do the following:

Say you have some records with "study" in the title, and a user searches for "studying" instead of "study". How would you set up ElasticSearch to match this?

Thanks, Alex

PS: Sorry if this is a duplicate. I wasn't sure what to search for!

1
  • A good answer will depend on what the documents in ES look like. Can you give an example of the document and the query you tried that didn't match? Commented May 23, 2013 at 17:27

2 Answers

3

You might be interested in this: http://www.elasticsearch.org/guide/reference/query-dsl/flt-query/

For example, I have indexed book titles, and for this fuzzy query:

{
  "query": {
    "bool": {
      "must": [
        {
          "fuzzy": {
            "book": {
              "value": "ringing",
              "min_similarity": "0.3"
            }
          }
        }
      ]
    }
  }
}

I got

{
  "took" : "1",
  "timed_out" : "false",
  "_shards" : {
    "total" : "5",
    "successful" : "5",
    "failed" : "0"
  },
  "hits" : {
    "total" : "1",
    "max_score" : "0.19178301",
    "hits" : [
      {
        "_index" : "library",
        "_type" : "book",
        "_id" : "3",
        "_score" : "0.19178301",
        "_source" : {
          "book" : "The Lord of the Rings",
          "author" : "J R R Tolkein"
        }
      }
    ]
  }
}

which is the only correct result.


1 Comment

This way you would get similar matches, even fixing potential mistakes in the query. On the other hand you could get false positives. Another way to go would be stemming.
2

You could apply stemming to your documents, so that when you index studying, you are actually indexing study. You do the same at query time, so that when you search for studying, you are really searching for study, and you'll find a match whether the document contains study or studying.
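
As a rough sketch of what that could look like, assuming documents of a type called book with a title field (both names are only illustrative) and the mapping syntax of Elasticsearch releases from around 2013, the built-in snowball analyzer can be set on the field when the index is created. The same analyzer is then applied at index time and at query time:

```json
{
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "snowball"
        }
      }
    }
  }
}
```

With a mapping along these lines, a match query runs the search text through the field's analyzer, so studying and study should be reduced to the same stem before they are compared:

```json
{
  "query": {
    "match": {
      "title": "studying"
    }
  }
}
```

Newer releases changed the mapping syntax (for example, string became text, and the type level was dropped), so check the documentation for the version you are running.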

Stemming of course depends on the language, and there are different techniques; for English, snowball is fine. The trade-off is that you lose some information when you index data: as you can see, you can no longer distinguish between studying and study. If you want to keep that distinction, you could index the same text in different ways using a multi_field and apply different text analysis to each version. That way you could search on multiple fields, both the non-stemmed and the stemmed version, maybe giving different weights to them.
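
A minimal sketch of the multi_field idea, again assuming an illustrative book type with a title field and the mapping syntax from around 2013. The default sub-field keeps the unstemmed text, and a second sub-field, here called stemmed (a made-up name), uses the snowball analyzer:

```json
{
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "type": "multi_field",
          "fields": {
            "title": { "type": "string", "analyzer": "standard" },
            "stemmed": { "type": "string", "analyzer": "snowball" }
          }
        }
      }
    }
  }
}
```

One way to weight the two versions is a bool query with two should clauses, boosting the exact (unstemmed) match so that it ranks above matches found only through stemming. The boost value of 2.0 is an arbitrary starting point, not a recommendation:

```json
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": { "query": "studying", "boost": 2.0 } } },
        { "match": { "title.stemmed": { "query": "studying" } } }
      ]
    }
  }
}
```

In later Elasticsearch versions, multi_field was replaced by a fields section on an ordinary field, so the mapping would need adapting there.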
