I want to calculate the median of a value stored in a nested field. The nested field contains a list of objects, each with several attributes, and I want to filter some of those objects out before calculating the median. For example, if the nested field holds 10 objects, only 7 of the 10 should be used for the median.
query_median = {
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "date": "2020-05-18"
                    }
                },
                {
                    "term": {
                        "group_name": "some_name"
                    }
                }
            ]
        }
    },
    "aggs": {
        "median_value": {
            "nested": {
                "path": "people"
            },
            "aggs": {
                "median": {
                    "percentiles": {
                        "field": "people.for_median_attr",
                        "percents": [50]
                    }
                }
            }
        }
    }
}
The query above works, but it does not filter the nested objects. When I add a filter inside the aggs, I get the same value as without any filter. Here is what I tried:
query_median = {
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "date": "2020-05-18"
                    }
                },
                {
                    "term": {
                        "group_name": "some_name"
                    }
                }
            ]
        }
    },
    "aggs": {
        "median_value": {
            "nested": {
                "path": "people"
            },
            "aggs": {
                "filter_out": {
                    "filter": {
                        "bool": {
                            "must": [
                                {
                                    "term": {
                                        "people.attr_not_wanted1": False
                                    },
                                    "term": {
                                        "people.attr_not_wanted2": False
                                    }
                                }
                            ]
                        }
                    },
                    "aggs": {
                        "median": {
                            "percentiles": {
                                "field": "people.for_median_attr",
                                "percents": [50]
                            }
                        }
                    }
                }
            }
        }
    }
}
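A side note on the attempt above: both `term` entries sit inside a single dict in the `must` list. In a Python dict literal a repeated key keeps only the last value, so only one of the two conditions would actually be sent. Below is a minimal sketch of that behaviour in plain Python, with no Elasticsearch involved. Whether this explains the unchanged median is an assumption I have not verified, and `clause` and `must_clauses` are just illustrative names.

```python
# Two "term" entries written inside one dict literal, as in the must list above.
clause = {
    "term": {"people.attr_not_wanted1": False},
    "term": {"people.attr_not_wanted2": False},
}

# Python keeps only the last value for a repeated key, so one condition is lost.
surviving_fields = list(clause["term"].keys())
print(len(clause), surviving_fields)

# Writing one dict per condition keeps both clauses in the list.
must_clauses = [
    {"term": {"people.attr_not_wanted1": False}},
    {"term": {"people.attr_not_wanted2": False}},
]
print(len(must_clauses))
```

Printing `query_median` before sending it is a quick way to check which clauses actually survive.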
Example document:
{
    "_index" : "some_index",
    "_type" : "_doc",
    "_id" : "some_id",
    "_score" : 1.0,
    "_source" : {
        "date" : "2020-05-10",
        "group_name" : "some_name",
        "org_code" : "some_code",
        "people" : [
            {
                "nickname" : "xxx",
                "review_count" : 20.0,
                "not_wanted_1" : false,
                "not_wanted_2" : false
            },
            {
                "nickname" : "yyy",
                "review_count" : 18.0,
                "not_wanted_1" : false,
                "not_wanted_2" : false
            },
            {
                "nickname" : "zzz",
                "value_for_median" : 11.0,
                "not_wanted_1" : true,
                "not_wanted_2" : true
            },
            ...
        ]
    }
}
In this case, the median should be calculated from only two values, 20 and 18, because the third object is filtered out.
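To make the expected result concrete, here is a small sketch in plain Python that applies the same filtering to the three objects shown in the example document and takes the median of what remains. It only illustrates the result I am after, not how Elasticsearch computes it. The `people` list is copied by hand from the document, and `kept` and `expected_median` are illustrative names.

```python
from statistics import median

# The three nested objects shown in the example document, copied by hand.
people = [
    {"nickname": "xxx", "review_count": 20.0,
     "not_wanted_1": False, "not_wanted_2": False},
    {"nickname": "yyy", "review_count": 18.0,
     "not_wanted_1": False, "not_wanted_2": False},
    {"nickname": "zzz", "value_for_median": 11.0,
     "not_wanted_1": True, "not_wanted_2": True},
]

# Keep only objects where both flags are false, then collect their values.
kept = [
    p["review_count"]
    for p in people
    if not p["not_wanted_1"] and not p["not_wanted_2"]
]

expected_median = median(kept)
print(kept, expected_median)  # [20.0, 18.0] 19.0
```

Since the `percentiles` aggregation is approximate, the value Elasticsearch returns may differ slightly from an exact median on larger data sets.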