I am new to Elasticsearch and need help with an ES query. I have an Elasticsearch index with records as follows (sorted by timestamp):

| TimeStamp | StoreID | BookStoreName | RackID | RackName | BookID | BookName |
|---|---|---|---|---|---|---|
| 2023-07-31T09:33:41Z | 122r3 | ABC | 122r34 | XYZ | 122e33r | abc |
| 2023-07-31T09:32:41Z | 122r3 | ABCD | 122r35 | XYZA | 1298e33r | hb78 |
| 2023-07-31T09:31:41Z | 122r3 | ABCE | 122r34 | XYZ1 | 9086795s | 8hb7 |
| 2023-07-31T09:30:41Z | 122r3 | ABCF | 122r34 | XYZ2 | 132lkgh | kho97 |

In this index a single StoreID can have multiple RackIDs, and each StoreID & RackID combination can have many records (BookID & BookName). I am looking for an ES query that finds the latest record for a StoreID & RackID combination. E.g. here there are 3 records for StoreID (122r3) & RackID (122r34), and I need the latest one, which is the record below.

| 2023-07-31T09:33:41Z | 122r3 | ABC | 122r34 | XYZ | 122e33r | abc |

I am using Elasticsearch 7 and tried a bool must query, which returns all the records. I need only the latest record based on the timestamp.

Also, is it possible to get the result for multiple combinations, so that a single query fetches the latest record for every StoreID & RackID combination?

1 Answer

There are two ways to get the latest record:

  1. size + sort by timestamp
  2. top_hits aggregation

Here is an example:

create the index
POST _bulk
{"index": {"_index": "test_sort_by_timestamp"}}
{"TimeStamp": "2023-07-31T09:33:41Z", "StoreID": "122r3", "BookStoreName": "ABC", "RackID": "122r34", "RackName": "XYZ", "BookID": "122e33r", "BookName":     "abc"}
{"index": {"_index": "test_sort_by_timestamp"}}
{"TimeStamp": "2023-07-31T09:32:41Z", "StoreID": "122r3", "BookStoreName": "ABCD", "RackID": "122r35", "RackName": "XYZA", "BookID": "1298e33r", "BookName":    "hb78"}
{"index": {"_index": "test_sort_by_timestamp"}}
{"TimeStamp": "2023-07-31T09:31:41Z", "StoreID": "122r3", "BookStoreName": "ABCE", "RackID": "122r34", "RackName": "XYZ1", "BookID": "9086795s", "BookName":    "8hb7"}
{"index": {"_index": "test_sort_by_timestamp"}}
{"TimeStamp": "2023-07-31T09:30:41Z", "StoreID": "122r3", "BookStoreName": "ABCF", "RackID": "122r34", "RackName": "XYZ2", "BookID": "132lkgh", "BookName":     "kho97"}
top_hits aggregation
GET test_sort_by_timestamp/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "StoreID.keyword": "122r3"
          }
        },
        {
          "term": {
            "RackID.keyword": "122r34"
          }
        }
      ]
    }
  },
  "aggs": {
    "latest_record": {
      "top_hits": {
        "size": 1,
        "sort": [
          {
            "TimeStamp": {
              "order": "desc"
            }
          }
        ]
      }
    }
  }
}
size + sort by timestamp
GET test_sort_by_timestamp/_search
{
  "size": 1,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "StoreID.keyword": "122r3"
          }
        },
        {
          "term": {
            "RackID.keyword": "122r34"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "TimeStamp": {
        "order": "desc"
      }
    }
  ]
}
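
To make explicit what both queries above compute, here is a minimal Python sketch that applies the same logic (filter on StoreID and RackID, order by TimeStamp descending, keep one) to the four sample documents in memory. It does not talk to Elasticsearch, the documents are trimmed to the fields that matter, and the names `DOCS`, `parse_ts` and `latest_record` are purely illustrative.

```python
from datetime import datetime

# The four sample documents from the bulk request, trimmed to the relevant fields.
DOCS = [
    {"TimeStamp": "2023-07-31T09:33:41Z", "StoreID": "122r3", "RackID": "122r34", "BookID": "122e33r"},
    {"TimeStamp": "2023-07-31T09:32:41Z", "StoreID": "122r3", "RackID": "122r35", "BookID": "1298e33r"},
    {"TimeStamp": "2023-07-31T09:31:41Z", "StoreID": "122r3", "RackID": "122r34", "BookID": "9086795s"},
    {"TimeStamp": "2023-07-31T09:30:41Z", "StoreID": "122r3", "RackID": "122r34", "BookID": "132lkgh"},
]


def parse_ts(value):
    # The sample timestamps are ISO 8601 strings in UTC with a trailing "Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def latest_record(docs, store_id, rack_id):
    # Equivalent of the two "term" filters in the bool query.
    matches = [d for d in docs if d["StoreID"] == store_id and d["RackID"] == rack_id]
    if not matches:
        return None
    # Equivalent of sorting by TimeStamp desc and keeping size 1.
    return max(matches, key=lambda d: parse_ts(d["TimeStamp"]))


print(latest_record(DOCS, "122r3", "122r34")["BookID"])  # 122e33r
```

Either query should return this same document for StoreID 122r3 and RackID 122r34; they differ only in where it appears in the response (under `hits.hits` for size + sort, under `aggregations.latest_record.hits.hits` for top_hits).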

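EDIT (filtering on several RackIDs; because of "size": 1 this still returns only the single newest record overall):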

GET test_sort_by_timestamp/_search
{
  "size": 1,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "RackID.keyword": ["122r34", "122r35"]
          }
        },
        {
          "term": {
            "StoreID.keyword": "122r3"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "TimeStamp": {
        "order": "desc"
      }
    }
  ]
}

EDIT 2:

GET test_sort_by_timestamp/_search
{
  "size": 0,
  "aggs": {
    "store_rack_combinations": {
      "terms": {
        "script": {
          "source": "doc['StoreID.keyword'].value + '|' + doc['RackID.keyword'].value"
        },
        "size": 10
      },
      "aggs": {
        "latest_record": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "TimeStamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
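
In the response to this aggregation, each latest document sits under `aggregations.store_rack_combinations.buckets[*].latest_record.hits.hits[0]._source`. As a rough sketch of unpacking that on the client side, assuming the response has already been decoded from JSON into a Python dict: the helper name `latest_by_combination` is hypothetical, and `sample_response` is a hand-written, heavily abbreviated fragment in the shape of such a response, not real cluster output.

```python
def latest_by_combination(response):
    """Map (StoreID, RackID) -> _source of the newest document in that bucket."""
    result = {}
    buckets = response["aggregations"]["store_rack_combinations"]["buckets"]
    for bucket in buckets:
        # The terms script builds each key as "<StoreID>|<RackID>".
        # Assumption: StoreID itself never contains the "|" character.
        store_id, rack_id = bucket["key"].split("|", 1)
        hits = bucket["latest_record"]["hits"]["hits"]
        if hits:
            # top_hits with size 1 and TimeStamp desc: the first hit is the newest.
            result[(store_id, rack_id)] = hits[0]["_source"]
    return result


# Abbreviated, hand-written fragment for illustration only.
sample_response = {
    "aggregations": {
        "store_rack_combinations": {
            "buckets": [
                {
                    "key": "122r3|122r34",
                    "doc_count": 3,
                    "latest_record": {"hits": {"hits": [
                        {"_source": {"TimeStamp": "2023-07-31T09:33:41Z", "BookID": "122e33r"}}
                    ]}},
                },
                {
                    "key": "122r3|122r35",
                    "doc_count": 1,
                    "latest_record": {"hits": {"hits": [
                        {"_source": {"TimeStamp": "2023-07-31T09:32:41Z", "BookID": "1298e33r"}}
                    ]}},
                },
            ]
        }
    }
}

for combo, source in latest_by_combination(sample_response).items():
    print(combo, source["TimeStamp"], source["BookID"])
```

Note that `"size": 10` in the terms aggregation caps the number of combinations returned at 10, so it needs to be raised if the index holds more StoreID & RackID pairs than that.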
3 Comments

Thanks for the quick answer. Is it possible to get the latest records for multiple combinations, e.g. StoreID (122r3) & RackID (122r34) and also StoreID (122r3) & RackID (122r35)? The result would look like this:
| 2023-07-31T09:32:41Z | 122r3 | ABCD | 122r35 | XYZA | 1298e33r | hb78 |
| 2023-07-31T09:33:41Z | 122r3 | ABC | 122r34 | XYZ | 122e33r | abc |
The idea is to get the latest record for every StoreID & RackID combination.
This gives only a single record. I am trying to get the latest record for each unique combination of StoreID & RackID. After a lot of searching I found that this can be achieved using collapse & inner_hits, but the downside of that approach is that it returns duplicate data in inner_hits when the inner_hits size is 1. Any ideas? When I tried inner_hits size=0 I got the error below:
numHits must be > 0; please use TotalHitCountCollector if you just need the total hit count
Now the request is clear :) Answer updated.
