Here are a couple of sample documents from my Elasticsearch index:

{
  "message": "M1",
  "date": "date object",
  "comments": [
    {
      "msg": "good",
      "date": "date_obj1"
    },
    {
      "msg": "bad",
      "date": "date_obj2"
    },
    {
      "msg": "ugly",
      "date": "date_obj3"
    }
  ]
}

and

{
  "message": "M2",
  "date": "date_object5",
  "comments": [
    {
      "msg": "ugly",
      "date": "date_obj7"
    },
    {
      "msg": "pagli",
      "date": "date_obj8"
    }
  ]
}

Now I need to find the number of documents per day and the number of comments per day. I can get the number of documents per day with a date histogram, and it gives me the correct results. I use the following aggregation query:

"aggs" : {
    "posts_over_days" : {
        "date_histogram" : { "field" : "date", "interval" : "day" }
    }
}

But when I try something similar to get comments per day, it returns incorrect data (for 1500+ comments it returns only 160-odd). I am using the following query:

"aggs" : {
    "comments_over_days" : {
        "date_histogram" : { "field" : "comments.date", "interval" : "day" }
    }
}

How can I get the desired result? Is there a way to do this in Elasticsearch? Please let me know if I need to provide any other info.

Expected Output:

buckets: [
  {
    time_interval: date_objectA,
    doc_count: x
  },
  {
    time_interval: date_objectB,
    doc_count: y
  }
]

1 Answer

Use a Value Count aggregation, which counts the number of values of the field across your documents. For example, based on your data (5 comments in 2 documents):

curl -XGET 'http://localhost:9200/myindex/mydata/_search?search_type=count&pretty' -d '{
  "aggs" : {
    "grades_count" : { "value_count" : { "field" : "comments.date" } }
  }
}'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "grades_count" : {
      "value" : 5
    }
  }
}
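
As far as I understand it, this also explains the low numbers from the plain date_histogram on comments.date: the doc_count of a bucket is the number of documents that fall into it, so a post with several comments on the same day is counted once for that day, whereas value_count counts every value. Below is a minimal Python sketch of that difference. It is not Elasticsearch code: the dates are made-up stand-ins for the date_obj placeholders in the question, and doc_count_per_day and value_count are hypothetical helpers that only mimic the counting.

```python
from collections import Counter

# Hypothetical dates standing in for the "date_obj" placeholders in the question.
docs = [
    {"message": "M1", "date": "2014-11-28",
     "comments": [{"msg": "good", "date": "2014-11-28"},
                  {"msg": "bad", "date": "2014-11-28"},
                  {"msg": "ugly", "date": "2014-11-29"}]},
    {"message": "M2", "date": "2014-11-27",
     "comments": [{"msg": "ugly", "date": "2014-11-29"},
                  {"msg": "pagli", "date": "2014-11-29"}]},
]


def doc_count_per_day(docs):
    """Mimic a histogram doc_count: a document counts once per day bucket."""
    counts = Counter()
    for doc in docs:
        # The set removes repeated days within a single document.
        days = {comment["date"] for comment in doc["comments"]}
        counts.update(days)
    return dict(counts)


def value_count(docs):
    """Mimic value_count: every comments.date value is counted."""
    return sum(len(doc["comments"]) for doc in docs)


histogram = doc_count_per_day(docs)
print(histogram)          # per-day document counts
print(value_count(docs))  # total number of comment dates
```

With these made-up dates the histogram buckets add up to 3 while there are 5 comments, which is the same kind of gap as the 160-odd versus 1500+ in the question.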

Adding the Date Buckets

The Value Count aggregation can be nested inside the date buckets, which gives the number of comments on the posts in each day's bucket:

curl -XGET 'http://localhost:9200/myindex/mydata/_search?search_type=count&pretty' -d '{
  "aggs" : {
    "posts_over_days" : {
      "date_histogram" : { "field" : "date", "interval" : "day" },
      "aggs" : {
        "grades_count" : { "value_count" : { "field" : "comments.date" } }
      }
    }
  }
}'

with results:

  "aggregations" : {
    "posts_over_days" : {
     "buckets" : [ {
        "key_as_string" : "2014-11-27T00:00:00.000Z",
        "key" : 1417046400000,
        "doc_count" : 1,
        "grades_count" : {
          "value" : 2
        }
      }, {
        "key_as_string" : "2014-11-28T00:00:00.000Z",
        "key" : 1417132800000,
        "doc_count" : 1,
        "grades_count" : {
          "value" : 3
        }
      } ]
    }
  }
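
If you consume the response from code, the per-day numbers can be read straight out of the buckets. Here is a small Python sketch, assuming the response has already been parsed into a dict shaped like the result above. comments_per_day is a hypothetical helper name, and the sample dict just repeats the two buckets shown.

```python
# Sample response fragment, copied from the two buckets shown in the result.
response = {
    "aggregations": {
        "posts_over_days": {
            "buckets": [
                {"key_as_string": "2014-11-27T00:00:00.000Z",
                 "key": 1417046400000,
                 "doc_count": 1,
                 "grades_count": {"value": 2}},
                {"key_as_string": "2014-11-28T00:00:00.000Z",
                 "key": 1417132800000,
                 "doc_count": 1,
                 "grades_count": {"value": 3}},
            ]
        }
    }
}


def comments_per_day(response):
    """Map each day (YYYY-MM-DD) to its post count and comment count."""
    buckets = response["aggregations"]["posts_over_days"]["buckets"]
    return {
        # The first 10 characters of key_as_string are the calendar date.
        bucket["key_as_string"][:10]: {
            "posts": bucket["doc_count"],
            "comments": bucket["grades_count"]["value"],
        }
        for bucket in buckets
    }


per_day = comments_per_day(response)
for day, counts in sorted(per_day.items()):
    print(day, counts["posts"], counts["comments"])
```

Note that the buckets are keyed by the post's date field, so each number is the count of comments attached to posts from that day.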