Troubleshoot Ingest pipelines
Applies to: Elastic Stack, ECE, ECK, Elastic Cloud Hosted, Self-Managed
Elasticsearch ingest pipelines let you transform data during ingest. As per the write model, they run on nodes with the ingest node role, under the write thread pool.
You can edit ingest pipelines under Kibana's Stack Management > Ingest Pipelines or with Elasticsearch's create or update pipeline API. They are stored in Elasticsearch's cluster state and can be retrieved with the get pipeline API.
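For example, to retrieve a stored pipeline definition from the cluster state (the pipeline name `my-pipeline` is a placeholder):
GET _ingest/pipeline/my-pipeline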
Ingest pipelines can be simulated during testing, but after go-live they are triggered during event ingest by:
- The `pipeline` query parameter on a document index (create or update) or bulk API request.
- The ingest target's backing index settings `index.default_pipeline` and/or `index.final_pipeline`.
- Another ingest pipeline calling this one through a pipeline processor.
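As a minimal sketch of the first two trigger styles, assuming a hypothetical index `my-index` and the `my-pipeline` example defined later on this page:
POST my-index/_doc?pipeline=my-pipeline
{
  "message": "hello world"
}

PUT my-index/_settings
{
  "index.default_pipeline": "my-pipeline"
}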
You might notice that an ingest pipeline is not performing as well as possible under load testing from one of the following symptoms.
If ingest pipelines cause high CPU usage while running, stack frames from `org.elasticsearch.ingest.Pipeline` will show up under the nodes hot threads API. Example output might start like this:
::: {instance-0000000001}{XXXXX}{XXXXX}{instance-0000000001}{XXXXX}{XXXXX}{hirst}{9.0.0}{XXXXX}{XXXXX}
Hot threads at XXXXX, interval=500ms, busiestThreads=10000, ignoreIdleThreads=true:
100.0% [cpu=98.9%, other=1.1%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000001][write][T#8]'
10/10 snapshots sharing following 93 elements
app/org.elasticsearch.server@9.0.0/org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:145)
app/org.elasticsearch.server@9.0.0/org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:129)
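You can capture this output yourself with the nodes hot threads API; the parameters below mirror the interval and thread count shown in the example header:
GET _nodes/hot_threads?threads=10000&interval=500ms&ignore_idle_threads=true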
The most common sub-processor to flag is a script processor running Painless, which shows up under `org.elasticsearch.painless`.
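For illustration only (this is a hypothetical pipeline, not part of the example above), a Painless script processor like the following is the kind of definition whose execution surfaces under that package:
PUT _ingest/pipeline/my-scripted-pipeline
{
  "processors": [
    {
      "script": {
        "description": "Hypothetical per-document Painless logic; its execution shows up under org.elasticsearch.painless in hot threads",
        "lang": "painless",
        "source": "ctx.message_length = ctx.containsKey('message') ? ctx.message.length() : 0"
      }
    }
  ]
}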
A non-performant ingest pipeline may cause a task queue backlog, potentially with hot spotting. Once sufficiently severe, it will start causing rejected requests.
There can be multiple timestamps associated with a single data event. By default in Elastic-built Integrations, every event records the following:
- `@timestamp` for when an event originated.
- `event.created` for when an event first reached an Elastic product.
- `event.ingested` for when an event finished processing through an Elasticsearch ingest pipeline.
Kibana Data Views default to `@timestamp` to fit most users' expectations. While troubleshooting ingestion lag, we recommend creating a temporary Data View based on `event.ingested`.
This potential timing difference is why Security detection rules allow setting a "Timestamp override", which defaults to `event.ingested` when enabled.
For user-managed ingest pipelines, you can create similar functionality by adding a set processor to the end of your ingest pipeline. As a simplified example, if `event.ingested` is not already being set upstream in your data pipeline:
PUT _ingest/pipeline/my-pipeline
{
"processors": [
{
"set": {
"description": "Log the event's ingestion timestamp as 'event.ingested'",
"field": "event.ingested",
"value": "{{{_ingest.timestamp}}}"
}
}
]
}
This field updates every time the event is modified as it passes through the pipeline. If you need to troubleshoot only the original event creation, and the field `event.created` is not already being set upstream in your data pipeline, you might consider adding this processor example at the beginning of the first ingest pipeline:
PUT _ingest/pipeline/my-pipeline
{
"processors": [
{
"set": {
"description": "Log the event's first ingest timestamp as 'event.created', does not modify for updates.",
"field": "event.created",
"value": "{{{_ingest.timestamp}}}",
"override": false
}
}
]
}
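To check how either example behaves on a sample document before go-live, you can run it through the simulate pipeline API (the sample document below is purely illustrative):
POST _ingest/pipeline/my-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "example event"
      }
    }
  ]
}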
When ingest pipelines run, they record cumulative statistics, which are reported by the nodes stats API.
GET _nodes/stats/ingest
This reports each ingest pipeline's overall statistics as well as statistics for each of its processors. If a pipeline calls a sub-pipeline, the parent's statistics record the total time and do not subtract the time spent waiting on the other.
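If you only need the ingest section of the response, the standard `filter_path` parameter can narrow it, for example:
GET _nodes/stats/ingest?filter_path=nodes.*.ingest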
After storing this output in `nodes_stats.json`, you can parse the JSON with the third-party tool jq. Some common views used during troubleshooting include:
Ingest Pipeline's processor has more than 10% failed events
$ cat nodes_stats.json | jq -c '.nodes[]|.name as $node| .ingest.pipelines|to_entries[]| .key as $pipeline| .value.processors[] | to_entries[]|.key as $process| { pipeline:$pipeline, process:$process, node:$node, total:.value.stats.count, failed:.value.stats.failed, failed_percent:(try (100*.value.stats.failed/.value.stats.count|round) catch 0)}| select(.total>0 and .failed_percent>10)'
Especially where `ignore_failure` is enabled, this might reflect an incomplete setup or wasted processing time.
Ingest Pipeline's time per event is above 60 milliseconds (adjust the threshold as needed, for example to 1 second)
$ cat nodes_stats.json | jq -rc '.nodes[]|.name as $n| .ingest.pipelines|to_entries[]| .key as $pi|.value| { name:$n, pipeline:$pi, total_count:.count, total_millis:.time_in_millis, total_failed:.failed }| select(.total_count>0)|.+{time_per:(.total_millis/.total_count|round)}|select(.time_per>60)'
Ingest Pipeline processor's time per event, where the processor has handled at least one event and takes more than 0 milliseconds per event after rounding
$ cat nodes_stats.json | jq -rc '.nodes[]|.name as $n| .ingest.pipelines|to_entries[]| .key as $pi|.value.processors[]| to_entries[]|.key as $pr| .value.stats|{ name:$n, pipeline:$pi, processor:$pr, total_count:.count, total_millis:.time_in_millis, total_failed:.failed }|select(.total_count>0)|.+{time_per:(.total_millis/.total_count|round)}|select(.time_per>0)'
Unlike Logstash pipelines, Elasticsearch ingest pipeline processors don't have an associated `id` to distinguish them, so they are emitted in the sequential order in which they appear in the pipeline's definition.
The statistics are reported per node since its startup, so they reset when a node restarts. As a rough heuristic, you can look at an individual node's output, knowing that the proportions will usually be about equal across nodes.