
I have the following documents in Elasticsearch (where scores is a nested field):

{
    'type': 'typeA',
    'scores': [
       {'type': 'A', 'val': 45},
       {'type': 'A', 'val': 55},
       {'type': 'B', 'val': 65},
    ]
},
{
    'type': 'typeA',
    'scores': [
       {'type': 'A', 'val': 55},
       {'type': 'A', 'val': 50},
       {'type': 'A', 'val': 57},
    ]
},
{
    'type': 'typeB',
    'scores': [
       {'type': 'B', 'val': 40},
       {'type': 'A', 'val': 50},
       {'type': 'A', 'val': 60},
    ]
}

Is it possible to write a query that returns the average score per type, counting only the scores where scores.type is "A"?

Here is how I would compute it manually:

1) Keep only the "A" scores (simplified):

{'type': 'typeA', 'scores': [45, 55]},
{'type': 'typeA', 'scores': [55, 50, 57]},
{'type': 'typeB', 'scores': [50, 60]},

2) Compute the average per document:

{'type': 'typeA', 'avg': 50}, // (45+55) / 2
{'type': 'typeA', 'avg': 54}, // (55+50+57) / 3
{'type': 'typeB', 'avg': 55}, // (50 + 60) / 2

3) Average the per-document averages for each type:

'typeA': 52, // (50+54) / 2
'typeB': 55, // 55 / 1
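The three manual steps above can be sketched client side in Python. This is a minimal sketch assuming the documents have already been fetched (for example from a search response) into a list of dicts shaped like the ones above; the function name avg_of_doc_avgs is hypothetical, and skipping documents that have no matching scores is an assumption.

```python
from collections import defaultdict

docs = [
    {"type": "typeA", "scores": [{"type": "A", "val": 45}, {"type": "A", "val": 55}, {"type": "B", "val": 65}]},
    {"type": "typeA", "scores": [{"type": "A", "val": 55}, {"type": "A", "val": 50}, {"type": "A", "val": 57}]},
    {"type": "typeB", "scores": [{"type": "B", "val": 40}, {"type": "A", "val": 50}, {"type": "A", "val": 60}]},
]


def avg_of_doc_avgs(docs, score_type="A"):
    """Hypothetical helper: per top-level type, average each document's average score_type value."""
    per_type = defaultdict(list)  # top-level type -> list of per-document averages
    for doc in docs:
        # Step 1: keep only the scores of the requested type
        vals = [s["val"] for s in doc["scores"] if s["type"] == score_type]
        if not vals:
            continue  # assumption: documents without matching scores are skipped
        # Step 2: average within the document
        per_type[doc["type"]].append(sum(vals) / len(vals))
    # Step 3: average the per-document averages for each top-level type
    return {t: sum(avgs) / len(avgs) for t, avgs in per_type.items()}


print(avg_of_doc_avgs(docs))  # {'typeA': 52.0, 'typeB': 55.0}
```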

Is this possible, or should I stick to computing it client side?



Yes, it is definitely possible to do it with a combination of terms, nested, filter and avg aggregations: the terms aggregation groups by the top-level type, the nested aggregation "dives" into your nested scores, the filter aggregation keeps only scores of type A, and the avg aggregation computes the average of the score values:

{
  "size": 0,
  "aggs": {
    "top_level_type": {
      "terms": {
        "field": "type"
      },
      "aggs": {
        "nest": {
          "nested": {
            "path": "scores"
          },
          "aggs": {
            "type_filter": {
              "filter": {
                "term": {
                  "scores.type": "A"
                }
              },
              "aggs": {
                "average": {
                  "avg": {
                    "field": "scores.val"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

The resulting values would look like this:

{
  ...
  "aggregations" : {
    "top_level_type" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "typea",
        "doc_count" : 2,
        "nest" : {
          "doc_count" : 6,
          "type_filter" : {
            "doc_count" : 5,
            "average" : {
              "value" : 52.4
            }
          }
        }
      }, {
        "key" : "typeb",
        "doc_count" : 1,
        "nest" : {
          "doc_count" : 3,
          "type_filter" : {
            "doc_count" : 2,
            "average" : {
              "value" : 55.0
            }
          }
        }
      } ]
    }
  }
}
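To read the per-type averages out of a response shaped like the one above, a small Python sketch can walk the bucket path. It assumes the response has already been decoded into a dict; extract_averages is a hypothetical name, and the sample below is a trimmed copy of the response shown above.

```python
def extract_averages(response):
    """Hypothetical helper: map each top-level bucket key to its filtered nested average."""
    buckets = response["aggregations"]["top_level_type"]["buckets"]
    return {
        # avg returns null (None after decoding) when a bucket has no matching scores
        b["key"]: b["nest"]["type_filter"]["average"]["value"]
        for b in buckets
    }


sample = {
    "aggregations": {
        "top_level_type": {
            "buckets": [
                {"key": "typea", "doc_count": 2,
                 "nest": {"doc_count": 6, "type_filter": {"doc_count": 5, "average": {"value": 52.4}}}},
                {"key": "typeb", "doc_count": 1,
                 "nest": {"doc_count": 3, "type_filter": {"doc_count": 2, "average": {"value": 55.0}}}},
            ]
        }
    }
}

print(extract_averages(sample))  # {'typea': 52.4, 'typeb': 55.0}
```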

Comments:

There's an error in the computation: it averages all type-A scores across documents, (45+55+55+50+57) / 5 = 52.4, but it should first average the scores within each document and then average those per-document values: ((45+55) / 2 + (55+50+57) / 3) / 2 = 52.0
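The difference described in this comment can be checked in a few lines of Python, using only the type-A values of the two typeA documents: the nested avg pools every matching score object, while the desired result averages the per-document means. The two only agree when every document contributes the same number of scores.

```python
# Type-A score values of the two "typeA" documents
doc_scores = [[45, 55], [55, 50, 57]]

# What the nested avg aggregation computes: one mean over all pooled values
pooled = sum(v for vals in doc_scores for v in vals) / sum(len(vals) for vals in doc_scores)

# What the question asks for: the mean of the per-document means
doc_means = [sum(vals) / len(vals) for vals in doc_scores]
mean_of_means = sum(doc_means) / len(doc_means)

print(round(pooled, 2), mean_of_means)  # 52.4 52.0
```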
OK, I see what you mean now. The thing is that with this nested aggregation there's no concept of "document": the avg runs over all matching score objects at once. However, what you could do is compute the average of each score type at indexing time and store it as top-level fields in your document (in the first document, you'd have typea_avg: 50 and typeb_avg: 65, etc.). Then you'd only have to run an avg sub-aggregation on the relevant field (e.g. typea_avg) under the terms aggregation on type.
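A minimal Python sketch of this indexing-time approach, assuming each document is enriched before it is sent to Elasticsearch. The field names typea_avg and typeb_avg follow the comment; the add_score_averages helper is hypothetical. The final loop only mimics in Python what an avg sub-aggregation on typea_avg under the terms aggregation on type would return (documents missing the field are ignored, as avg does by default).

```python
from collections import defaultdict

docs = [
    {"type": "typeA", "scores": [{"type": "A", "val": 45}, {"type": "A", "val": 55}, {"type": "B", "val": 65}]},
    {"type": "typeA", "scores": [{"type": "A", "val": 55}, {"type": "A", "val": 50}, {"type": "A", "val": 57}]},
    {"type": "typeB", "scores": [{"type": "B", "val": 40}, {"type": "A", "val": 50}, {"type": "A", "val": 60}]},
]


def add_score_averages(doc, score_types=("A", "B")):
    """Hypothetical helper: copy doc and add a top-level type<x>_avg field per score type."""
    enriched = dict(doc)  # shallow copy; the original document is left unchanged
    for st in score_types:
        vals = [s["val"] for s in doc["scores"] if s["type"] == st]
        if vals:  # omit the field when the document has no scores of this type
            enriched[f"type{st.lower()}_avg"] = sum(vals) / len(vals)
    return enriched


enriched = [add_score_averages(d) for d in docs]
print(enriched[0]["typea_avg"], enriched[0]["typeb_avg"])  # 50.0 65.0

# Client-side stand-in for: terms on "type" -> avg on "typea_avg"
per_type = defaultdict(list)
for d in enriched:
    if "typea_avg" in d:
        per_type[d["type"]].append(d["typea_avg"])
result = {t: sum(v) / len(v) for t, v in per_type.items()}
print(result)  # {'typeA': 52.0, 'typeB': 55.0}
```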
