
On Elasticsearch 1.4, I used to delete documents with the Delete By Query API like this:

curl -XDELETE http://my_elasticsearch:9200/_all/_query?q=some_field:some_value

This wasn't perfect (it regularly caused OutOfMemoryError), but it worked well enough for my needs at the time.

But now I use the new Elasticsearch 1.5, and the documentation says:

Deprecated in 1.5.0.

"Delete by Query will be removed in 2.0: it is problematic since it silently forces a refresh which can quickly cause OutOfMemoryError during concurrent indexing, and can also cause primary and replica to become inconsistent. Instead, use the scroll/scan API to find all matching ids and then issue a bulk request to delete them."

So I would like to do the same thing with the scroll/scan API, but I don't understand how to delete with it. Neither the REST API documentation nor the Java API documentation seems complete to me (there is no deletion example).

PS: I'd like to understand how to do this with either Java or curl (in the end I need both).

  • Try searching for "bulk API"; you might find an example. I am still learning Elasticsearch, so I am not sure this is the solution, but that page has some code that performs deletes. Commented Mar 31, 2015 at 14:29

1 Answer

I ran into this issue as well and could not find a good code example, so here is what I came up with. I'm not sure it is the best approach, so feel free to comment on how it could be refined. Note that I set the result size of the query to Integer.MAX_VALUE so that each search returns all (or as many as possible) of the documents that need to be deleted.

  1. Run query to get all IDs to be deleted
  2. Add delete requests for all IDs to a bulk request
  3. Run bulk request
  4. Re-run query to see if any more records need to be deleted
  5. Repeat if necessary

    private void deleteAllByQuery(final String index, final String type, final QueryBuilder query) {
        SearchResponse response = elasticSearchClient.prepareSearch(index)
                .setTypes(type)
                .setQuery(query)
                .setSize(Integer.MAX_VALUE)
                .execute().actionGet();
    
        SearchHit[] searchHits = response.getHits().getHits();
    
        while (searchHits.length > 0) {
            LOGGER.debug("Need to delete " + searchHits.length + " records");
    
            // Create bulk request
            final BulkRequestBuilder bulkRequest = elasticSearchClient.prepareBulk().setRefresh(true);
    
            // Add search results to bulk request
            for (final SearchHit searchHit : searchHits) {
                final DeleteRequest deleteRequest = new DeleteRequest(index, type, searchHit.getId());
                bulkRequest.add(deleteRequest);
            }
    
            // Run bulk request
            final BulkResponse bulkResponse = bulkRequest.execute().actionGet();
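            if (bulkResponse.hasFailures()) {
                // Stop here: re-running the query would select the same
                // undeleted documents again and could loop forever
                LOGGER.error(bulkResponse.buildFailureMessage());
                break;
            }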
    
            // After deleting, we should check for more records
            response = elasticSearchClient.prepareSearch(index)
                .setTypes(type)
                .setQuery(query)
                .setSize(Integer.MAX_VALUE)
                .execute().actionGet();
    
            searchHits = response.getHits().getHits();
        }
    }
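
If you would rather follow the deprecation notice literally, the loop can be driven by a scroll instead of re-running the query after every bulk request. Below is a minimal sketch of that control flow only. It has not been run against a real cluster: `IdScroller`, `BulkDeleter`, and `snapshotScroller` are hypothetical stand-ins I made up so the example runs on its own. In the 1.x Java client I believe the corresponding calls are `prepareSearch(...).setSearchType(SearchType.SCAN).setScroll(...)` followed by repeated `prepareSearchScroll(scrollId)`, but treat that mapping as an assumption and check it against your client version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the search side: next page of matching ids, empty when done. */
interface IdScroller {
    List<String> nextPage();
}

/** Hypothetical stand-in for the bulk side: deletes one batch of ids in a single request. */
interface BulkDeleter {
    void deleteAll(List<String> ids);
}

class ScrollDelete {

    /** Walks the scroll once, sending each page as one batch; returns how many ids were submitted. */
    static int deleteByScroll(IdScroller scroller, BulkDeleter deleter) {
        int submitted = 0;
        List<String> page = scroller.nextPage();
        while (!page.isEmpty()) {
            deleter.deleteAll(page);
            submitted += page.size();
            page = scroller.nextPage();
        }
        return submitted;
    }

    /** In-memory scroller over a fixed list of ids, for demonstration only. */
    static IdScroller snapshotScroller(final List<String> ids, final int pageSize) {
        return new IdScroller() {
            private int offset = 0;

            public List<String> nextPage() {
                int end = Math.min(offset + pageSize, ids.size());
                List<String> page = new ArrayList<String>(ids.subList(offset, end));
                offset = end;
                return page;
            }
        };
    }

    public static void main(String[] args) {
        // The "index": five document ids that all match the query
        final List<String> store =
                new ArrayList<String>(Arrays.asList("a", "b", "c", "d", "e"));

        // The scroller pages over a copy, so deletions do not disturb the paging
        IdScroller scroller = snapshotScroller(new ArrayList<String>(store), 2);
        BulkDeleter deleter = new BulkDeleter() {
            public void deleteAll(List<String> ids) {
                store.removeAll(ids);
            }
        };

        int submitted = deleteByScroll(scroller, deleter);
        System.out.println(submitted + " submitted, " + store.size() + " left");
        // prints "5 submitted, 0 left"
    }
}
```

The difference from my method above is that each matching id is visited exactly once in fixed-size pages, so there is no need for Integer.MAX_VALUE, a forced refresh, or a second search to check for leftovers.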
    

1 Comment

Thanks, this looks perfect! I have changed companies and no longer work with ES, but I will try this as soon as possible to validate your answer.
