Troubleshoot Ingest pipelines
Applies to: Elastic Stack, ECE, ECK, Elastic Cloud Hosted, Self-Managed
Elasticsearch ingest pipelines let you transform data during ingest. As per the write model, they run on nodes with the ingest node role, under the write thread pool.
You can edit ingest pipelines under Kibana's Stack Management > Ingest Pipelines or with Elasticsearch's create or update pipeline API. They are stored in Elasticsearch's cluster state and can be retrieved with the get pipeline API.
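For example, to retrieve a stored pipeline definition from the cluster state (the pipeline name `my-pipeline` is a placeholder):
GET _ingest/pipeline/my-pipeline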
Ingest pipelines can be simulated during testing, but after go-live they are triggered during event ingest by:
- The `pipeline` query parameter on a document index (create or update) or bulk API request.
- The ingest target's backing index settings `index.default_pipeline` and/or `index.final_pipeline`.
- Another ingest pipeline calling this one through a pipeline processor.
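As a minimal sketch of the first two trigger styles, assuming a hypothetical index `my-index` and the `my-pipeline` example defined later on this page:
POST my-index/_doc?pipeline=my-pipeline
{
  "message": "hello world"
}

PUT my-index/_settings
{
  "index.default_pipeline": "my-pipeline"
}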
You might notice that an ingest pipeline is not performing as well as possible under load testing from one of the following symptoms.
If ingest pipelines cause high CPU usage while running, stack frames from `org.elasticsearch.ingest.Pipeline` will show up under the nodes hot threads API. Example output might start like this:
::: {instance-0000000001}{XXXXX}{XXXXX}{instance-0000000001}{XXXXX}{XXXXX}{hirst}{9.0.0}{XXXXX}{XXXXX}
Hot threads at XXXXX, interval=500ms, busiestThreads=10000, ignoreIdleThreads=true:
100.0% [cpu=98.9%, other=1.1%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000001][write][T#8]'
10/10 snapshots sharing following 93 elements
app/org.elasticsearch.server@9.0.0/org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:145)
app/org.elasticsearch.server@9.0.0/org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:129)
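You can capture this output yourself with the nodes hot threads API; the parameters below mirror the interval and thread count shown in the example header:
GET _nodes/hot_threads?threads=10000&interval=500ms&ignore_idle_threads=true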
The most common sub-processor to flag is a script processor running Painless, which shows up under `org.elasticsearch.painless`.
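For illustration only (this is a hypothetical pipeline, not part of the example above), a Painless script processor like the following is the kind of definition whose execution surfaces under that package:
PUT _ingest/pipeline/my-scripted-pipeline
{
  "processors": [
    {
      "script": {
        "description": "Hypothetical per-document Painless logic; its execution shows up under org.elasticsearch.painless in hot threads",
        "lang": "painless",
        "source": "ctx.message_length = ctx.containsKey('message') ? ctx.message.length() : 0"
      }
    }
  ]
}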
A non-performant ingest pipeline may cause a task queue backlog, potentially with hot spotting. Once sufficiently severe, it will start causing rejected requests.
There can be multiple timestamps associated with a single data event. By default in Elastic-built Integrations, every event records the following:
- `@timestamp` for when an event originated.
- `event.created` for when an event first reached an Elastic product.
- `event.ingested` for when an event finished processing through an Elasticsearch ingest pipeline.
Kibana Data Views default to `@timestamp` to fit most users' expectations. While troubleshooting ingestion lag, we recommend creating a temporary Data View based on `event.ingested`.
This potential timing difference is why Security detection rules allow setting a "Timestamp override", which defaults to `event.ingested` when enabled.
For user-managed ingest pipelines, you can create similar functionality by adding a set processor to the end of your ingest pipeline. As a simplified example, if `event.ingested` is not already being set upstream in your data pipeline:
PUT _ingest/pipeline/my-pipeline
{
"processors": [
{
"set": {
"description": "Log the event's ingestion timestamp as 'event.ingested'",
"field": "event.ingested",
"value": "{{{_ingest.timestamp}}}"
}
}
]
}
This field updates every time the event is modified as it passes through the pipeline. If you need to troubleshoot only the original event creation, and the field `event.created` is not already being set upstream in your data pipeline, you might consider adding this processor example at the beginning of the first ingest pipeline:
PUT _ingest/pipeline/my-pipeline
{
"processors": [
{
"set": {
"description": "Log the event's first ingest timestamp as 'event.created', does not modify for updates.",
"field": "event.created",
"value": "{{{_ingest.timestamp}}}",
"override": false
}
}
]
}
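To check how either example behaves on a sample document before go-live, you can run it through the simulate pipeline API (the sample document below is purely illustrative):
POST _ingest/pipeline/my-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "example event"
      }
    }
  ]
}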
When ingest pipelines run, they record cumulative statistics, which are reported by the nodes stats API.
GET _nodes/stats/ingest
This reports each ingest pipeline's overall statistics as well as statistics for each of its processors. If a pipeline calls a sub-pipeline, the parent's statistics record the total time and do not subtract the time spent waiting on the other.
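If you only need the ingest section of the response, the standard `filter_path` parameter can narrow it, for example:
GET _nodes/stats/ingest?filter_path=nodes.*.ingest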
After storing this output in `nodes_stats.json`, you can parse the JSON with the third-party tool jq. Some common views used during troubleshooting include:
Ingest Pipeline's processor has more than 10% failed events
$ cat nodes_stats.json | jq -c '.nodes[]|.name as $node| .ingest.pipelines|to_entries[]| .key as $pipeline| .value.processors[] | to_entries[]|.key as $process| { pipeline:$pipeline, process:$process, node:$node, total:.value.stats.count, failed:.value.stats.failed, failed_percent:(try (100*.value.stats.failed/.value.stats.count|round) catch 0)}| select(.total>0 and .failed_percent>10)'
Especially where `ignore_failure` is enabled, this might reflect an incomplete setup or wasted processing time.
Ingest Pipeline's time per event is above 60 milliseconds (adjust the threshold as needed, for example to 1 second)
$ cat nodes_stats.json | jq -rc '.nodes[]|.name as $n| .ingest.pipelines|to_entries[]| .key as $pi|.value| { name:$n, pipeline:$pi, total_count:.count, total_millis:.time_in_millis, total_failed:.failed }| select(.total_count>0)|.+{time_per:(.total_millis/.total_count|round)}|select(.time_per>60)'
Ingest Pipeline processor's time per event, where the processor has handled at least one event and takes more than 0 milliseconds per event after rounding
$ cat nodes_stats.json | jq -rc '.nodes[]|.name as $n| .ingest.pipelines|to_entries[]| .key as $pi|.value.processors[]| to_entries[]|.key as $pr| .value.stats|{ name:$n, pipeline:$pi, processor:$pr, total_count:.count, total_millis:.time_in_millis, total_failed:.failed }|select(.total_count>0)|.+{time_per:(.total_millis/.total_count|round)}|select(.time_per>0)'
Unlike Logstash pipelines, Elasticsearch ingest pipeline processors don't have an associated `id` to distinguish them, so they are emitted in the sequential order in which they appear in the pipeline's definition.
The statistics are reported per node since its startup, so they reset when a node restarts. As a rough heuristic, you can look at an individual node's output, knowing that the proportions will usually be about equal across nodes.