I would like to run a query that counts how many documents have an empty object stored as the value of a field. For example, it should return a document like:

"hits": [
  {
    "_source": {
      "otherfield": "abc",
      "somefield": {}
    }
  }
]

But it should not return documents where the field is missing or has an undefined value, nor documents where the field is an object containing attributes:

"hits": [
  {
    "_source": {
      "otherfield": "abc"
      // <-- note no "somefield"
    }
  },
  {
    "_source": {
      "otherfield": "abc",
      "somefield": { "field1": "value1" }
    }
  }
]

However, the query I have also returns documents where the field is an object with attributes, such as "somefield": { "field1": "value1" }:

GET /documents/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "somefield.field1"
          }
        }
      ],
      "should": [
        {
          "exists": {
            "field": "somefield"
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}

I am using Elasticsearch 5.4.

1 Answer

The following query should be enough to find all documents with an empty somefield field:

{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "somefield"
        }
      }
    }
  }
}

Your query, however, is a bit confusing to me. First, you try to find documents where somefield.field1 does not exist. Then you combine the must_not clause with a mutually exclusive should clause that selects documents with a non-empty somefield. In effect, the should clause is translated into:

"should": [{
    "exists": {"field": "somefield.field1"}
}]

So your query should match neither documents with somefield: {} nor documents with somefield: {field1: value1}.

3 Comments

Thanks. I have clarified the question to make it clear that I want to find all documents where the object is empty, as opposed to the field not being present at all or the field value being an object with attributes.
@AJP Thank you for the clarification. Now I understand your question. I think it is not possible to distinguish a document with an empty somefield from a document without somefield at all. From Elasticsearch's point of view they are the same. You can verify this with the Term Vectors API (GET /my_index/my_type/<id>/_termvectors?fields=*). Such differentiation has to be done on the client side.
Interesting, thanks. Hadn't come across Term Vectors before. It's surprising that there's no query like match: { somefield: "{}" } as opposed to match: { somefield: undefined } (you'd actually use the must_not: { exists: { field: "somefield" }}) as the values of the field in the documents are clearly different from each other.
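
The client-side differentiation suggested in the comments above could look roughly like the sketch below. It assumes the hits come from the must_not / exists query in the answer (which matches both documents without somefield and documents with an empty one), that _source is returned, and that the response has already been parsed into Python dicts. The function name hits_with_empty_object is hypothetical and not part of any Elasticsearch client.

```python
from typing import Any, Dict, Iterable, List


def hits_with_empty_object(
    hits: Iterable[Dict[str, Any]], field: str
) -> List[Dict[str, Any]]:
    """Keep only hits whose _source stores an empty object ({}) under `field`.

    Hypothetical helper: it inspects the original JSON in _source, because
    the index itself does not record the difference between a missing field
    and an empty object.
    """
    matched = []
    for hit in hits:
        source = hit.get("_source", {})
        value = source.get(field)
        # A missing field or a null value yields None, which is not a dict;
        # an object with attributes is a non-empty dict. Only {} passes.
        if isinstance(value, dict) and not value:
            matched.append(hit)
    return matched


if __name__ == "__main__":
    # Sample hits mirroring the documents from the question.
    sample_hits = [
        {"_source": {"otherfield": "abc", "somefield": {}}},
        {"_source": {"otherfield": "abc"}},
        {"_source": {"otherfield": "abc", "somefield": {"field1": "value1"}}},
    ]
    empty = hits_with_empty_object(sample_hits, "somefield")
    print(len(empty), "document(s) with an empty somefield")
```

For a large index the hits would have to be paged through before filtering, so this is only practical when the number of candidate documents is manageable.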
