Create a datafeed
Generally available; Added in 5.4.0
Datafeeds retrieve data from Elasticsearch for analysis by an anomaly detection job.
You can associate only one datafeed with each anomaly detection job.
The datafeed contains a query that runs at a defined interval (frequency
).
If you are concerned about delayed data, you can add a delay (query_delay') at each interval.
By default, the datafeed uses the following query:
{"match_all": {"boost": 1}}`.
When Elasticsearch security features are enabled, your datafeed remembers which roles the user who created it had
at the time of creation and runs the query using those same roles. If you provide secondary authorization headers,
those credentials are used instead.
You must use Kibana, this API, or the create anomaly detection jobs API to create a datafeed. Do not add a datafeed
directly to the .ml-config
index. Do not give users write
privileges on the .ml-config
index.
Required authorization
- Index privileges:
read
- Cluster privileges:
manage_ml
Path parameters
-
A numerical character string that uniquely identifies the datafeed. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters.
Query parameters
-
If true, wildcard indices expressions that resolve into no concrete indices are ignored. This includes the
_all
string or when no indices are specified. -
Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values.
Values are
all
,open
,closed
,hidden
, ornone
. -
If true, concrete, expanded, or aliased indices are ignored when frozen.
Body
Required
-
If set, the datafeed performs aggregation searches. Support for aggregations is limited and should be used only with low cardinality data.
-
A duration. Units can be
nanos
,micros
,ms
(milliseconds),s
(seconds),m
(minutes),h
(hours) andd
(days). Also accepts "0" without a unit and "-1" to indicate an unspecified value. -
Controls how to deal with unavailable concrete indices (closed or missing), how wildcard expressions are expanded to actual indices (all, closed or open indices) and how to deal with wildcard expressions that resolve to no indices.
-
If a real-time datafeed has never seen any data (including during any initial training period), it automatically stops and closes the associated job after this many real-time searches return no documents. In other words, it stops after
frequency
timesmax_empty_searches
of real-time operation. If not set, a datafeed with no end time that sees no data remains started until it is explicitly stopped. By default, it is not set. -
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation -
A duration. Units can be
nanos
,micros
,ms
(milliseconds),s
(seconds),m
(minutes),h
(hours) andd
(days). Also accepts "0" without a unit and "-1" to indicate an unspecified value. -
Specifies scripts that evaluate custom expressions and returns script fields to the datafeed. The detector configuration objects in a job can contain functions that use these script fields.
-
The size parameter that is used in Elasticsearch searches when the datafeed does not use aggregations. The maximum value is the value of
index.max_result_window
, which is 10,000 by default.
curl \
--request PUT 'http://api.example.com/_ml/datafeeds/{datafeed_id}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"indices\": [\n \"kibana_sample_data_logs\"\n ],\n \"query\": {\n \"bool\": {\n \"must\": [\n {\n \"match_all\": {}\n }\n ]\n }\n },\n \"job_id\": \"test-job\"\n}"'
{
"indices": [
"kibana_sample_data_logs"
],
"query": {
"bool": {
"must": [
{
"match_all": {}
}
]
}
},
"job_id": "test-job"
}