Description
Elasticsearch 5.1
Following the docs:
http://www.rubydoc.info/gems/elasticsearch-api/Elasticsearch/API/Actions#scroll-instance_method
When I try to reproduce the example 'Call the scroll
API until all the documents are returned', I notice that this loop
# Call the `scroll` API until empty results are returned
while r = client.scroll(scroll_id: r['_scroll_id'], scroll: '5m') and not r['hits']['hits'].empty? do
  puts "--- BATCH #{defined?($i) ? $i += 1 : $i = 1} -------------------------------------------------"
  puts r['hits']['hits'].map { |d| d['_source']['title'] }.inspect
  puts
end
doesn't contain the results of this initial call:
# Open the "view" of the index with the `scan` search_type
r = client.search index: 'test', search_type: 'scan', scroll: '5m', size: 10
So in the end we lose as many documents as the `size` of the initial search call.
Example:
If the index contains ['test1', 'test2', 'test3', ..., 'test100'],
calling the scroll API with an initial size of 10 returns ['test11', 'test12', ..., 'test100'], so the first 10 results are missing.
I get the same results in the Elasticsearch console: the first scroll call does not include the results of the initial search call, so it seems that the `scroll` method works as intended.
But the problem is in `find_each`.
According to the docs:
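Since `scroll` itself behaves correctly, the client-side loop has to consume the first page before it starts scrolling. A minimal sketch, assuming (as observed above on Elasticsearch 5.1) that the initial `search` response already carries the first page of hits: it processes `r['hits']['hits']` first, then calls `scroll` until an empty page comes back. `FakeScrollClient` and `scroll_all` are hypothetical helpers so the sketch runs without a cluster; the two client calls have the same shape as in the example above.

```ruby
# Hypothetical in-memory stand-in for an Elasticsearch client, only so this
# sketch runs without a cluster. Like the 5.1 behaviour described above, the
# initial `search` response already contains the first page of hits.
class FakeScrollClient
  def initialize(titles)
    @docs = titles.map { |t| { '_source' => { 'title' => t } } }
  end

  def search(size:, **_opts)
    @size = size
    @offset = 0
    page
  end

  def scroll(scroll_id:, **_opts)
    raise ArgumentError, 'unknown scroll_id' unless scroll_id == 'fake-scroll-id'
    page
  end

  private

  # Returns the next `@size` documents and advances the cursor.
  def page
    hits = @docs[@offset, @size] || []
    @offset += hits.size
    { '_scroll_id' => 'fake-scroll-id', 'hits' => { 'hits' => hits } }
  end
end

# Hypothetical helper: handle the hits of the initial search response first,
# then keep calling `scroll` until an empty page is returned.
def scroll_all(client, index:, size:, scroll: '5m')
  batches = []
  r = client.search(index: index, scroll: scroll, size: size)
  until r['hits']['hits'].empty?
    batches << r['hits']['hits'].map { |d| d['_source']['title'] }
    r = client.scroll(scroll_id: r['_scroll_id'], scroll: scroll)
  end
  batches
end

client  = FakeScrollClient.new((1..100).map { |i| "test#{i}" })
batches = scroll_all(client, index: 'test', size: 10)
batches.each_with_index do |titles, i|
  puts "--- BATCH #{i + 1} ---"
  puts titles.inspect
end
```

With this ordering the first batch is `test1` to `test10`, and all 100 documents are returned in 10 batches.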
# Iterate effectively over models using the `find_in_batches` method.
#
# All the options are passed to `find_in_batches` and each result is yielded to the passed block.
#
# @example Print out the people's names by scrolling through the index
#
# Person.find_each { |person| puts person.name }
#
# # # GET http://localhost:9200/people/person/_search?scroll=5m&search_type=scan&size=20
# # # GET http://localhost:9200/_search/scroll?scroll=5m&scroll_id=c2Nhbj...
# # Test 0
# # Test 1
# # Test 2
# # ...
# # # GET http://localhost:9200/_search/scroll?scroll=5m&scroll_id=c2Nhbj...
# # Test 20
# # Test 21
# # Test 22
#
But in fact it returns only:
# # Test 20
# # Test 21
# # Test 22
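I think the problem is that the `response` variable in `find_in_batches` is overwritten by the first `scroll` call, so the hits of the initial search response (Test 0 to Test 19 in the example above) are never yielded.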
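A possible shape for a fix, as a hypothetical sketch rather than the gem's actual code: yield the hits of the initial search response before reassigning `response` with the result of `scroll`. Here the client calls are injected as `search`/`scroll` lambdas (an assumption made only so the sketch runs standalone), and the data mirrors the `Test 0`, `Test 1`, ... names from the doc comment with a batch size of 20.

```ruby
# Hypothetical sketch: yield the initial page, then scroll until empty.
# `search` and `scroll` are lambdas standing in for the client calls.
def find_in_batches(search:, scroll:, batch_size: 20)
  response = search.call(size: batch_size)
  loop do
    hits = response['hits']['hits']
    break if hits.empty?
    yield hits # the initial page is yielded too, before any scroll call
    response = scroll.call(scroll_id: response['_scroll_id'])
  end
end

# Yields every hit of every batch to the caller's block.
def find_each(**options)
  find_in_batches(**options) { |batch| batch.each { |hit| yield hit } }
end

# Hypothetical in-memory data: 45 documents named "Test 0" .. "Test 44".
docs      = (0...45).map { |i| { '_source' => { 'name' => "Test #{i}" } } }
cursor    = 0
page_size = 0
next_page = lambda do
  hits = docs[cursor, page_size] || []
  cursor += hits.size
  { '_scroll_id' => 'fake-scroll-id', 'hits' => { 'hits' => hits } }
end
search = ->(size:) { page_size = size; next_page.call }
scroll = ->(scroll_id:) { next_page.call }

names = []
find_each(search: search, scroll: scroll) { |hit| names << hit['_source']['name'] }
puts names.first(3).inspect
```

With the initial page yielded first, iteration starts at `Test 0`, as the doc comment describes, instead of at `Test 20`.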