
I want to calculate the median of a field inside a nested field. The nested field contains a list of objects, each with several attributes, and I want to filter some of those objects out before calculating the median. For example, if the nested field holds 10 objects, only 7 of the 10 should be used for the median.

query_median = {
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "date": "2020-05-18"
                    }
                },
                {
                    "term": {
                        "group_name": "some_name"
                    }
                }
            ]
        }
    },
    "aggs": {
        "median_value": {
            "nested": {
                "path": "people"
            },
            "aggs": {
                "median": {
                    "percentiles": {
                        "field": "people.for_median_attr",
                        "percents": [50]
                    }
                }
            }
        }
    }
}

The query above works, but it has no filters. When I add filters inside aggs, it returns the same value as the query without any filter. Here is what I tried:

query_median = {
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "date": "2020-05-18"
                    }
                },
                {
                    "term": {
                        "group_name": "some_name"
                    }
                }
            ]
        }
    },
    "aggs": {
        "median_value": {
            "nested": {
                "path": "people"
            },
            "aggs": {
                "filter_out": {
                    "filter": {
                        "bool": {
                            "must": [
                                {
                                    "term": {
                                        "people.attr_not_wanted1": False
                                    },
                                    "term": {
                                        "people.attr_not_wanted2": False
                                    }
                                }
                            ]
                        }
                    },
                    "aggs": {
                        "median": {
                            "percentiles": {
                                "field": "people.for_median_attr",
                                "percents": [50]
                            }
                        }
                    }
                }
            }
        }
    }
}

Example document:

{
        "_index" : "some_index",
        "_type" : "_doc",
        "_id" : "some_id",
        "_score" : 1.0,
        "_source" : {
          "date" : "2020-05-10",
          "group_name" : "some_name",
          "org_code" : "some_code",
          "people" : [
            {
              "nickname" : "xxx",
              "review_count" : 20.0,
              "not_wanted_1" : false,
              "not_wanted_2" : false
            },
            {
              "nickname" : "yyy",
              "review_count" : 18.0,
              "not_wanted_1" : false,
              "not_wanted_2" : false
            },
            {
              "nickname" : "zzz",
              "value_for_median" : 11.0,
              "not_wanted_1" : true,
              "not_wanted_2" : true
            },
            ...
          ]
        }
      }
    ]
  }

In this case, the median should be calculated from only two numbers: 20 and 18.
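As a sanity check on the expected result, here is a minimal local sketch that applies the same filtering in plain Python. The `people` list is a simplified copy of the example document, and the key `value` is an assumption standing in for whichever field the aggregation reads (in the example document, 20 and 18 appear under `review_count`). The helper name `expected_median` is hypothetical.

```python
from statistics import median

# Simplified copy of the nested objects from the example document.
# Assumption: "value" stands in for the field the aggregation reads.
people = [
    {"nickname": "xxx", "value": 20.0, "not_wanted_1": False, "not_wanted_2": False},
    {"nickname": "yyy", "value": 18.0, "not_wanted_1": False, "not_wanted_2": False},
    {"nickname": "zzz", "value": 11.0, "not_wanted_1": True, "not_wanted_2": True},
]


def expected_median(objs):
    """Median over the objects whose two exclusion flags are both False."""
    kept = [
        o["value"]
        for o in objs
        if not o["not_wanted_1"] and not o["not_wanted_2"]
    ]
    return median(kept)


print(expected_median(people))  # 19.0
```

Elasticsearch's percentiles aggregation is approximate, so on larger data sets its result may differ slightly from an exact local median like this one.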

  • Can you show a sample document with a nested array containing some element to keep and some to leave out? Commented May 19, 2020 at 13:25
  • @Val I added it to the original question, thanks! Commented May 19, 2020 at 13:34

2 Answers

3

You're almost there. You're just missing a few curly braces in the nested filter: each term clause has to be its own object inside the must array. You should also use true instead of false, since you want to keep those nested documents to calculate the median on them.
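To illustrate why the braces matter, here is a small sketch, assuming the query is built as a Python dict (as the `False` literals in the question suggest). In a dict literal a repeated key does not raise an error; the last value silently replaces the earlier one, so one of the two conditions is lost before the request is ever sent. The field names are the placeholders from the question.

```python
# The inner object from the original attempt, written as a Python dict literal.
# Both clauses use the key "term", so the second silently replaces the first.
clause = {
    "term": {"people.attr_not_wanted1": False},
    "term": {"people.attr_not_wanted2": False},
}

print(len(clause))     # 1
print(clause["term"])  # {'people.attr_not_wanted2': False}

# One object per clause keeps both conditions.
must = [
    {"term": {"people.attr_not_wanted1": False}},
    {"term": {"people.attr_not_wanted2": False}},
]
print(len(must))  # 2
```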

Your query should look like this:

{
  "query": {
     ...
  },
  "aggs": {
    "median_value": {
      "nested": {
        "path": "people"
      },
      "aggs": {
        "filter_out": {
          "filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "people.not_wanted_1": true
                  }
                },
                {
                  "term": {
                    "people.not_wanted_2": true
                  }
                }
              ]
            }
          },
          "aggs": {
            "median": {
              "percentiles": {
                "field": "people.value_for_median",
                "percents": [
                  50
                ]
              }
            }
          }
        }
      }
    }
  }
}

Results:

  "aggregations" : {
    "median_value" : {
      "doc_count" : 3,
      "filter_out" : {
        "doc_count" : 1,
        "median" : {
          "values" : {
            "50.0" : 11.0
          }
        }
      }
    }
  }

3 Comments

Thank you for the help. Actually, I want to exclude those with both values true, so in my case I take all objects that have not_wanted_1/not_wanted_2 = False. This query seems to return 'values': {'50.0': None}
OK, then you can just change true to false. However, note that those elements don't have any value_for_median field.
Actually I re-arranged your answer to my needs and it's working! Great, thank you!
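The final query was not posted in the thread, so the following is only a sketch of what the adjusted aggregation might look like as a Python dict: the same nested, filter and percentiles structure as the answer, with the flags set to false as discussed in the comments. The helper name `build_median_aggs` is hypothetical, and the field names are taken from the example document; the value field has to be one that actually exists on the kept objects.

```python
import json


def build_median_aggs(flags, value_field, path="people"):
    """Nested -> filter -> percentiles aggregation.

    Keeps only nested objects whose flags are all false, then asks for
    the 50th percentile of value_field on the remaining objects.
    """
    # One term clause per flag, each in its own object.
    must = [{"term": {f"{path}.{flag}": False}} for flag in flags]
    return {
        "median_value": {
            "nested": {"path": path},
            "aggs": {
                "filter_out": {
                    "filter": {"bool": {"must": must}},
                    "aggs": {
                        "median": {
                            "percentiles": {
                                "field": f"{path}.{value_field}",
                                "percents": [50],
                            }
                        }
                    },
                }
            },
        }
    }


# Field names below come from the example document and may need adjusting.
aggs = build_median_aggs(["not_wanted_1", "not_wanted_2"], "value_for_median")
print(json.dumps(aggs, indent=2))
```

The resulting dict would go under the "aggs" key of the request body, next to the existing "query" section.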
1

Based on the documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html, you could try changing the 'filter_out' part of your query to this:

    "filter_out" : {
      "filters" : {
        "filters" : [
          { "term" : { "people.attr_not_wanted1" : false   }},
          { "term" : { "people.attr_not_wanted2" : false }}
        ]
      }
    }
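One caveat with this variant: a `filters` aggregation returns one bucket per filter rather than a single bucket matching both conditions, so it would give a separate median for each flag instead of the median over objects where both flags are false. The snippet above also omits the percentiles sub-aggregation. A rough sketch of the complete fragment as a Python dict, reusing the placeholder field names from the question:

```python
# "filters" aggregation with anonymous filters: one bucket per entry,
# returned in the same order as the list below.
filter_out = {
    "filters": {
        "filters": [
            {"term": {"people.attr_not_wanted1": False}},
            {"term": {"people.attr_not_wanted2": False}},
        ]
    },
    # The sub-aggregation is computed separately inside each bucket.
    "aggs": {
        "median": {
            "percentiles": {
                "field": "people.for_median_attr",
                "percents": [50],
            }
        }
    },
}

print(len(filter_out["filters"]["filters"]))  # 2
```

If a single median over objects satisfying both conditions is wanted, a single `filter` aggregation with a `bool` query is the closer fit.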
