Filtered nested aggregation in ElasticSearch?

Question

I have the following documents in ElasticSearch (where scores is a nested field):

{
    'type': 'typeA',
    'scores': [
       {'type': 'A', 'val': 45},
       {'type': 'A', 'val': 55},
       {'type': 'B', 'val': 65},
    ]
},
{
    'type': 'typeA',
    'scores': [
       {'type': 'A', 'val': 55},
       {'type': 'A', 'val': 50},
       {'type': 'A', 'val': 57},
    ]
},
{
    'type': 'typeB',
    'scores': [
       {'type': 'B', 'val': 40},
       {'type': 'A', 'val': 50},
       {'type': 'A', 'val': 60},
    ]
}
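For reference, "nested" means that scores is mapped with the nested type, so each entry in the array is indexed as its own hidden sub-document and keeps its type/val pairing. Below is a sketch of such a mapping as a Python dict. It assumes Elasticsearch 7 or later (typeless mappings), and the keyword and integer field types are assumptions, not taken from the question:

```python
import json

# Sketch of an index mapping for the documents above (assumes Elasticsearch 7+).
# Field names come from the question; the field types are assumptions.
mapping = {
    "mappings": {
        "properties": {
            "type": {"type": "keyword"},  # exact values, usable in a terms aggregation
            "scores": {
                "type": "nested",  # each score becomes its own hidden sub-document
                "properties": {
                    "type": {"type": "keyword"},
                    "val": {"type": "integer"},
                },
            },
        }
    }
}

# Body for PUT /<index> (index name is a placeholder)
print(json.dumps(mapping, indent=2))
```

With keyword fields like these, terms bucket keys come back exactly as indexed ("typeA", "typeB").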

Is it possible to write a query that returns the average score per top-level type, counting only the scores whose scores.type is "A"?

Here is the computation I want, done by hand:

1) Keep only the "A" scores (simplified):

{'type': 'typeA', 'scores': [45, 55]},
{'type': 'typeA', 'scores': [55, 50, 57]},
{'type': 'typeB', 'scores': [50, 60]},

2) Compute the average per document:

{'type': 'typeA', 'avg': 50}, // (45+55) / 2
{'type': 'typeA', 'avg': 54}, // (55+50+57) / 3
{'type': 'typeB', 'avg': 55}, // (50 + 60) / 2

3) Average the per-document averages for each type:

'typeA' : 52, // (50+54) / 2
'typeB': 55, // (55) / 1
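The three manual steps above, done client-side, can be sketched in plain Python as follows. The helper name avg_of_doc_avgs is hypothetical, and skipping documents that have no "A" scores (rather than counting them as zero) is an assumption:

```python
from collections import defaultdict
from statistics import mean

docs = [
    {"type": "typeA", "scores": [{"type": "A", "val": 45}, {"type": "A", "val": 55}, {"type": "B", "val": 65}]},
    {"type": "typeA", "scores": [{"type": "A", "val": 55}, {"type": "A", "val": 50}, {"type": "A", "val": 57}]},
    {"type": "typeB", "scores": [{"type": "B", "val": 40}, {"type": "A", "val": 50}, {"type": "A", "val": 60}]},
]

def avg_of_doc_avgs(docs, score_type="A"):
    """Hypothetical helper: filter scores, average per document, then per top-level type."""
    per_type = defaultdict(list)
    for doc in docs:
        # Step 1: keep only scores of the requested type
        vals = [s["val"] for s in doc["scores"] if s["type"] == score_type]
        if vals:  # assumption: documents without matching scores are skipped
            # Step 2: one average per document
            per_type[doc["type"]].append(mean(vals))
    # Step 3: average the per-document averages for each top-level type
    return {t: mean(avgs) for t, avgs in per_type.items()}

print(avg_of_doc_avgs(docs))
```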

Is this possible, or should I stick to doing it on the client side?

Val · Accepted Answer · 2015-08-21 04:19:23Z

Yes, it is definitely possible to do it with a combination of terms, nested, filter and avg aggregations, like this:

{
  "size": 0,
  "aggs": {
    "top_level_type": {                    <---- group by top-level type
      "terms": {
        "field": "type"
      },
      "aggs": {
        "nest": {
          "nested": {                      <---- "dive" your nested scores
            "path": "scores"
          },
          "aggs": {
            "type_filter": {
              "filter": {                  <---- filter only score type A
                "term": {
                  "scores.type": "A"
                }
              },
              "aggs": {
                "average": {
                  "avg": {                 <---- compute the average of the score values
                    "field": "scores.val"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
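The "<----" markers above are annotations, not valid JSON. Here is a sketch of the same request body as a Python dict, serialized with the standard json module so it can be sent from any HTTP client. The index name is a placeholder, and nothing here actually calls Elasticsearch:

```python
import json

# The aggregation above without the "<----" annotations (JSON has no comments).
query = {
    "size": 0,  # return no hits, only aggregation results
    "aggs": {
        "top_level_type": {
            "terms": {"field": "type"},  # group by top-level type
            "aggs": {
                "nest": {
                    "nested": {"path": "scores"},  # step into the nested scores
                    "aggs": {
                        "type_filter": {
                            "filter": {"term": {"scores.type": "A"}},  # only score type A
                            "aggs": {
                                "average": {"avg": {"field": "scores.val"}},
                            },
                        },
                    },
                },
            },
        },
    },
}

body = json.dumps(query)  # request body for POST /<index>/_search
```

On Elasticsearch 5 and later with default dynamic mapping, string fields become text with a .keyword sub-field, so the terms field would likely need to be type.keyword and the term filter scores.type.keyword.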

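The response would look like this (the lowercased bucket keys typea and typeb suggest that the top-level type field was analyzed in the mapping used here):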

{
  ...
  "aggregations" : {
    "top_level_type" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "typea",
        "doc_count" : 2,
        "nest" : {
          "doc_count" : 6,
          "type_filter" : {
            "doc_count" : 5,
            "average" : {
              "value" : 52.4
            }
          }
        }
      }, {
        "key" : "typeb",
        "doc_count" : 1,
        "nest" : {
          "doc_count" : 3,
          "type_filter" : {
            "doc_count" : 2,
            "average" : {
              "value" : 55.0
            }
          }
        }
      } ]
    }
  }
}
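The 52.4 comes from how this aggregation tree works. Below the nested aggregation, every matching score entry in a bucket is pooled, so avg runs over score entries rather than over documents. Here is a pure-Python emulation of that behaviour (illustrative only, not Elasticsearch code; the helper name pooled_avg is hypothetical) that reproduces both bucket values:

```python
from collections import defaultdict
from statistics import mean

docs = [
    {"type": "typeA", "scores": [{"type": "A", "val": 45}, {"type": "A", "val": 55}, {"type": "B", "val": 65}]},
    {"type": "typeA", "scores": [{"type": "A", "val": 55}, {"type": "A", "val": 50}, {"type": "A", "val": 57}]},
    {"type": "typeB", "scores": [{"type": "B", "val": 40}, {"type": "A", "val": 50}, {"type": "A", "val": 60}]},
]

def pooled_avg(docs, score_type="A"):
    """Emulate terms > nested > filter > avg: matching nested entries of all
    documents in a bucket are pooled before averaging."""
    pooled = defaultdict(list)
    for doc in docs:                      # terms: bucket by top-level type
        for s in doc["scores"]:           # nested: each score is its own sub-document
            if s["type"] == score_type:   # filter: keep only the requested score type
                pooled[doc["type"]].append(s["val"])
    return {t: mean(vals) for t, vals in pooled.items()}  # avg over pooled entries

print(pooled_avg(docs))
```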

Comments:

There's an error in the computation: it averages across all matching scores of all documents in the bucket, (45+55+55+50+57) / 5 = 52.4, but it should first average the scores within each document and then average those per-document values: ((45+55) / 2 + (55+50+57) / 3) / 2 = 52.0
Ok, I see what you mean now. The thing is that aggregations have no concept of "document" here: once you go through the nested aggregation, the nested scores of all documents in a bucket are pooled together. However, what you could do is compute the average per score type at indexing time and store it as top-level fields in your document (in the first document, you'd have typea_avg: 50 and typeb_avg: 65, etc.). Then you'd only have to run an avg aggregation on those fields.
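Here is a minimal sketch of that index-time approach in plain Python. It assumes per-type fields named like typea_avg/typeb_avg as in the comment; the helper with_score_avgs is hypothetical. The last part emulates what a terms aggregation on type with an avg sub-aggregation on typea_avg would return. avg ignores documents that lack the field, so documents without "A" scores do not drag the result down:

```python
from collections import defaultdict
from statistics import mean

docs = [
    {"type": "typeA", "scores": [{"type": "A", "val": 45}, {"type": "A", "val": 55}, {"type": "B", "val": 65}]},
    {"type": "typeA", "scores": [{"type": "A", "val": 55}, {"type": "A", "val": 50}, {"type": "A", "val": 57}]},
    {"type": "typeB", "scores": [{"type": "B", "val": 40}, {"type": "A", "val": 50}, {"type": "A", "val": 60}]},
]

def with_score_avgs(doc):
    """Hypothetical pre-indexing step: add a top-level <type>_avg field per score type."""
    by_type = defaultdict(list)
    for s in doc["scores"]:
        by_type[s["type"]].append(s["val"])
    enriched = dict(doc)  # shallow copy; the original document is left unchanged
    for score_type, vals in by_type.items():
        enriched[f"type{score_type.lower()}_avg"] = mean(vals)  # e.g. typea_avg
    return enriched

indexed = [with_score_avgs(d) for d in docs]

# Emulates terms on "type" + avg on "typea_avg": documents missing the field are skipped.
per_type = defaultdict(list)
for d in indexed:
    if "typea_avg" in d:
        per_type[d["type"]].append(d["typea_avg"])
result = {t: mean(v) for t, v in per_type.items()}
print(result)
```

On the Elasticsearch side this needs no nested aggregation at all, just a terms aggregation on type with an avg sub-aggregation on the stored field.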
