There is a document with the timestamp 2015-05-18T19:08:35Z. The first query below returns it correctly. But if I widen the time interval (from 5 minutes to 8 minutes, as in the second query), the same document no longer shows up in the results.

curl -XGET "http://localhost:9200/enwiki_content/page/_search?&pretty=true" -d' {
  "query": {
    "range": {
      "timestamp": {
        "gte": "2015-05-18T19:04:00Z",
        "lte": "2015-05-18T19:09:00Z"
      }
    }
}  } ' > out2.txt



curl -XGET "http://localhost:9200/enwiki_content/page/_search?&pretty=true" -d' {
  "query": {
    "range": {
      "timestamp": {
        "gte": "2015-05-18T19:01:00Z",
        "lte": "2015-05-18T19:09:00Z"
      }
    }
}  } ' > out2.txt

The second query should return every document from the first query plus any other matching documents, right? (This is the Wikipedia data set, 5 million documents and more than 100 GB, if that matters.)

  • Can you make sure you use the correct timezone designator, i.e. Z and not z? It probably won't change much, but you never know. Commented Dec 3, 2015 at 7:06
  • Tried; that did not make any difference. It seems "Z" and "z" are treated the same. Commented Dec 3, 2015 at 7:12
  • The mapping is available here: en.wikipedia.org/w/… Commented Dec 3, 2015 at 7:14
  • Interesting... Can you switch to POST instead of GET (good practice when sending a payload)? Commented Dec 3, 2015 at 7:15
  • I think the reason might be that the second query returns more results and you're only getting the first 10 (the default). Try increasing the size of the query and you should get the expected record. Commented Dec 3, 2015 at 7:19

1 Answer

The likely reason is that the second query matches more documents, and by default only the first 10 hits are returned. The document you expect probably still matches; it just isn't among those 10.

Try increasing the size of the second query and you should get the expected record.

Either add "size" to the request body:

curl -XGET "http://localhost:9200/enwiki_content/page/_search?&pretty=true" -d' {
  "size": 100,
  "query": {
    "range": {
      "timestamp": {
        "gte": "2015-05-18T19:01:00Z",
        "lte": "2015-05-18T19:09:00Z"
      }
    }
}  } ' > out2.txt

or

                                                             add size
                                                                |
                                                                V
curl -XGET "http://localhost:9200/enwiki_content/page/_search?size=100&pretty=true" -d' {
  "query": {
    "range": {
      "timestamp": {
        "gte": "2015-05-18T19:01:00Z",
        "lte": "2015-05-18T19:09:00Z"
      }
    }
}  } ' > out2.txt
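
One way to check this explanation is to ask only for the hit count of each range. The following is a minimal sketch, assuming the same index and field as in the question; the helper names count_body and count_hits are hypothetical, and the Content-Type header is only there because newer Elasticsearch versions require it. With "size": 0 the response contains no documents, only the hits.total figure. If that figure is larger than 10 for the wider range, the default page size would explain the missing document.

```shell
#!/bin/sh
# Hypothetical helper: print a search body that returns no documents,
# only the total hit count for the given timestamp range.
# $1 = lower bound (gte), $2 = upper bound (lte)
count_body() {
  cat <<EOF
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": { "gte": "$1", "lte": "$2" }
    }
  }
}
EOF
}

# Hypothetical helper: send that body to the index from the question.
count_hits() {
  count_body "$1" "$2" |
    curl -s -XPOST "http://localhost:9200/enwiki_content/page/_search?pretty=true" \
      -H 'Content-Type: application/json' -d @-
}

# Usage (needs a running cluster, so it is left commented out):
# count_hits "2015-05-18T19:04:00Z" "2015-05-18T19:09:00Z"
# count_hits "2015-05-18T19:01:00Z" "2015-05-18T19:09:00Z"
```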

2 Comments

What is the maximum size allowed? In other words, how much data can I return from Elasticsearch?
As much as you want, but the practical limit depends on many factors, such as your RAM (to hold the results), CPU (to process them), and network (to send them). When returning results, it makes more sense to page with from/size and only request more data when the user actually needs it.
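
The from/size paging mentioned above could be sketched as follows. This is an illustration under assumptions, not part of the original answer: the helper names page_body and fetch_page are hypothetical, and the sort on timestamp is added only so that consecutive pages come back in a consistent order. Note also that newer Elasticsearch versions cap from + size through the index.max_result_window setting (10,000 by default), so very deep paging may be rejected.

```shell
#!/bin/sh
# Hypothetical helper: print the search body for one page of results.
# $1 = zero-based page number, $2 = hits per page
page_body() {
  from=$(( $1 * $2 ))   # offset of the first hit on this page
  cat <<EOF
{
  "from": $from,
  "size": $2,
  "sort": [ { "timestamp": "asc" } ],
  "query": {
    "range": {
      "timestamp": {
        "gte": "2015-05-18T19:01:00Z",
        "lte": "2015-05-18T19:09:00Z"
      }
    }
  }
}
EOF
}

# Hypothetical helper: fetch one page from the index in the question.
fetch_page() {
  page_body "$1" "$2" |
    curl -s -XPOST "http://localhost:9200/enwiki_content/page/_search?pretty=true" \
      -H 'Content-Type: application/json' -d @-
}

# Usage (needs a running cluster, so it is left commented out):
# fetch_page 0 100 > page0.txt   # hits 0-99
# fetch_page 1 100 > page1.txt   # hits 100-199
```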
