I'm trying to write a query that will give me all the documents where the field "id" is of the form: "SOMETHING-SOMETHING-4SOMETHING-SOMETHING-SOMETHING"

For instance, ab-ba-4a-b-a is a valid id.

I wrote this query

{
  "query": {
    "regexp": {
      "id": {
        "value": ".*-.*-4.*-.*-.*"
      }
    }
  }
}

It gets no hits. What's wrong with this? I can see many ids of this form.

  • What is the datatype of id? Commented Jul 6, 2020 at 14:32
  • The type is a string. Commented Jul 6, 2020 at 14:34
  • Could you also let me know which version of ES are you using? Commented Jul 6, 2020 at 14:45
  • I'm using 7.8, but the answer you wrote was great! You shouldn't have deleted it. It works. Commented Jul 6, 2020 at 14:52
  • I've posted it once again, thought if you are using version 2.x, I may have to modify it a bit. But happy your query has been resolved!! Commented Jul 6, 2020 at 14:53

1 Answer

If the id field is of type keyword, the regexp query should work as written.

However, if it is of type text, look at how Elasticsearch tokenizes the value before indexing it:

POST /_analyze
{
  "text": "abc-abc-4bc-abc-abc",
  "analyzer": "standard"
}

Response:

{
  "tokens" : [
    {
      "token" : "abc",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "abc",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "4bc",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "abc",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "abc",
      "start_offset" : 16,
      "end_offset" : 19,
      "type" : "<ALPHANUM>",
      "position" : 4
    }
  ]
}

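Notice that the analyzer breaks the string abc-abc-4bc-abc-abc into five separate tokens and drops the hyphens. The regexp query is a term-level query: the pattern has to match an entire indexed term, so on a text field no single token can match a pattern that contains hyphens. Take a look at what Analysis and Analyzers are and how they are applied only to text fields.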

The keyword datatype, by contrast, exists for the cases where you do not want your text to be analyzed (i.e. broken into tokens); it indexes the whole string as a single term, exactly as it is.
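
To make the difference concrete, here is a rough Python sketch of what happens to the id in each case. It is only an approximation: `standard_like_tokens` is a hypothetical stand-in for the standard analyzer (lowercase, then split on anything that is not a letter or digit), and Python's `re.fullmatch` stands in for the whole-term matching that the regexp query performs. Lucene's regular expression syntax is not the same as Python's, but this particular pattern reads the same in both.

```python
import re

PATTERN = r".*-.*-4.*-.*-.*"


def standard_like_tokens(value):
    """Hypothetical approximation of the standard analyzer:
    lowercase the input and split on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


def keyword_terms(value):
    """A keyword field indexes the whole string as one term."""
    return [value]


def regexp_hits(terms, pattern=PATTERN):
    """Keep only the terms that the pattern matches in full,
    mimicking a pattern that must match the entire term."""
    return [term for term in terms if re.fullmatch(pattern, term)]


doc_id = "abc-abc-4bc-abc-abc"

text_terms = standard_like_tokens(doc_id)
print(text_terms)                          # ['abc', 'abc', '4bc', 'abc', 'abc']
print(regexp_hits(text_terms))             # []
print(regexp_hits(keyword_terms(doc_id)))  # ['abc-abc-4bc-abc-abc']
```

None of the five tokens contains a hyphen, so the pattern has nothing to match on the text field, while the single keyword term matches as a whole.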

If your mapping is dynamic, ES by default creates two fields for each string value: a text field and a keyword sub-field, something like below:

{
  "mappings" : {
    "properties" : {
      "id" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      }
    }
  }
}

In that case, just run the query you have against the id.keyword field:

POST <your_index_name>/_search
{
  "query": {
    "regexp": {
      "id.keyword": ".*-.*-4.*-.*-.*"
    }
  }
}
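
If you are not sure which of the two cases applies to your index, the choice of field can be derived from the mapping. The sketch below is an illustration only: `regexp_field` and `build_regexp_query` are hypothetical helper names, and the mapping is assumed to be a plain dict shaped like the mapping shown above (the response from the mapping API additionally wraps it in the index name, which you would need to unwrap first).

```python
def regexp_field(mapping, field="id"):
    """Return the field name a regexp query should target:
    the field itself if it is a keyword, otherwise its first
    keyword sub-field, otherwise None."""
    props = mapping.get("mappings", {}).get("properties", {})
    spec = props.get(field, {})
    if spec.get("type") == "keyword":
        return field
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"
    return None


def build_regexp_query(mapping, pattern, field="id"):
    """Build the search body for a regexp query on the un-analyzed field."""
    target = regexp_field(mapping, field)
    if target is None:
        raise ValueError(f"no keyword field or sub-field found for {field!r}")
    return {"query": {"regexp": {target: {"value": pattern}}}}


# Dynamic mapping as shown above: text field with a keyword sub-field.
dynamic_mapping = {
    "mappings": {
        "properties": {
            "id": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        }
    }
}

print(build_regexp_query(dynamic_mapping, ".*-.*-4.*-.*-.*"))
```

With the dynamic mapping this targets id.keyword; with an explicit keyword mapping it targets id directly.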

Hope that helps!
