Kafka Monitoring, part of Data Streams Monitoring, uses a Datadog Agent check that connects to your Kafka cluster and collects health and performance metrics. Kafka Monitoring allows you to:
- Monitor Kafka health: See cluster, broker, topic, and partition health with throughput, lag, and replication metrics
- Pinpoint root cause: Correlate configuration and schema changes with lag, throughput, and errors, and trace issues to the exact topic, schema version, or configuration change
- Connect services to topics: See which producers and consumers interact with each topic, with linked owners, repos, on-call rotations, traces, and error logs
- Inspect topic schemas and messages: View schemas, compare versions, and access messages to debug poison payloads or explore the topic
- Alert and automate responses: Use recommended monitor templates and trigger Workflow Automation or webhooks when a Kafka condition fires
To get started, see Kafka Monitoring Setup.
Workflows
The Clusters, Topics, and Brokers tabs display health status across your entire Kafka infrastructure. For each topic, you can see partition count, under-replicated and offline partitions, message throughput, and consumer lag.
Click into any topic to see a detailed summary, including incoming message rate, maximum lag across all partitions, and whether current lag is approaching the retention limit.
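As a rough illustration of the two summary values above, the following Python sketch computes maximum lag across partitions and checks whether lag is close to the retention limit. The `PartitionOffsets` type, the function names, and the 80% threshold are hypothetical and are not part of Kafka Monitoring; the product's own calculation may differ.

```python
from dataclasses import dataclass


@dataclass
class PartitionOffsets:
    """Hypothetical per-partition snapshot for one consumer group."""
    partition: int
    log_end_offset: int    # latest offset written to the partition
    committed_offset: int  # last offset committed by the consumer group


def max_lag(partitions):
    """Return the largest offset lag across all partitions (0 if none)."""
    return max(
        (max(p.log_end_offset - p.committed_offset, 0) for p in partitions),
        default=0,
    )


def approaching_retention(lag_seconds, retention_seconds, threshold=0.8):
    """True when time lag has consumed `threshold` of the retention window.

    The 0.8 default is an assumption chosen for illustration.
    """
    return lag_seconds >= threshold * retention_seconds
```

When time lag reaches the retention period, unread messages can be deleted before the consumer processes them, which is why this check matters in addition to raw offset lag.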
From any metric, you can create Datadog monitors, SLOs, and dashboards.
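For teams that manage monitors as code, a lag monitor might be defined with a payload along these lines. This is a sketch only: the `kafka.consumer_lag` metric and the `topic` and `consumer_group` tags are assumed to come from the Kafka consumer integration, and the function name, threshold, and notification handle are hypothetical. Confirm which metric names and tags are available in your account before adapting it.

```python
import json


def build_lag_monitor(topic: str, critical: int) -> dict:
    """Build a monitor definition for consumer lag on one topic.

    The metric name and tags are assumptions; adjust them to match
    the metrics reported in your environment.
    """
    query = (
        f"avg(last_5m):max:kafka.consumer_lag{{topic:{topic}}} "
        f"by {{consumer_group}} > {critical}"
    )
    return {
        "name": f"Consumer lag is high on {topic}",
        "type": "query alert",
        "query": query,
        "message": "Consumer lag exceeded the threshold. @your-team-handle",
        "options": {"thresholds": {"critical": critical}},
    }


# Print the payload so it can be reviewed before it is submitted anywhere.
print(json.dumps(build_lag_monitor("orders", 1000), indent=2))
```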
Correlate configuration and schema changes with health metrics
Change events are overlaid directly on throughput and lag graphs, so you can see whether a configuration or schema change coincided with a degradation.
To identify exactly what changed, click a detected change on the overlay and select View config change.
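The same before-and-after comparison can be approximated on exported metric data. The sketch below is a minimal illustration, assuming lag samples are available as `(timestamp, value)` pairs and the change time is known; the function name and window size are hypothetical, and this is not how the overlay itself is computed.

```python
from statistics import mean


def lag_shift_around_change(samples, change_ts, window):
    """Return mean lag after the change minus mean lag before it.

    samples   -- iterable of (unix_timestamp, lag_value) pairs
    change_ts -- unix timestamp of the configuration or schema change
    window    -- number of seconds to compare on each side of the change
    Returns None when either side has no samples.
    """
    before = [v for t, v in samples if change_ts - window <= t < change_ts]
    after = [v for t, v in samples if change_ts <= t < change_ts + window]
    if not before or not after:
        return None
    return mean(after) - mean(before)
```

A large positive result suggests that lag rose after the change. It shows coincidence in time, not causation, so treat it as a prompt to inspect the change itself.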
Connect producer and consumer services to topics
The Producers and Consumers sections of each topic show which services write to and read from that topic. Hover over a service to see ownership information from the Service Catalog: team, code repository, on-call engineer, and Slack channel.
Use this information to contact the right team when a consumer is lagging or a producer is misbehaving.
Inspect topic schemas and messages
The Schema section shows the current schema for a topic’s key or value, with version history. Use the version selector to compare schemas across versions.
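To illustrate what a version comparison surfaces, the following sketch diffs the fields of two schema versions. It assumes Avro-style record schemas represented as Python dictionaries with a `fields` list; the function names are hypothetical, and the comparison in the product may cover more than field names and types.

```python
def field_types(schema: dict) -> dict:
    """Map each field name to its declared type."""
    return {f["name"]: f["type"] for f in schema.get("fields", [])}


def diff_schema_versions(old: dict, new: dict) -> dict:
    """List fields added, removed, or retyped between two versions."""
    old_f, new_f = field_types(old), field_types(new)
    shared = old_f.keys() & new_f.keys()
    return {
        "added": sorted(new_f.keys() - old_f.keys()),
        "removed": sorted(old_f.keys() - new_f.keys()),
        "changed": sorted(n for n in shared if old_f[n] != new_f[n]),
    }
```

Removed or retyped fields are the usual suspects when a consumer starts failing right after a new schema version appears.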
The Messages section lets you retrieve messages by partition and offset to inspect payloads directly. This is useful for debugging poison payloads or verifying message structure after a schema change. See Enable message inspection for the additional prerequisites and permissions required to retrieve messages.
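Once a suspect message has been retrieved, a quick structural check can confirm whether it is a poison payload. The sketch below assumes JSON-encoded message values and a known set of required field names; both are assumptions for illustration, and the function name is hypothetical. Messages encoded with Avro or Protobuf need the matching deserializer instead.

```python
import json


def check_payload(raw: bytes, required_fields: set) -> list:
    """Return a list of problems found in one message value.

    An empty list means the payload decoded and had every required field.
    """
    try:
        doc = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return [f"not valid UTF-8 JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top-level value is not a JSON object"]
    missing = required_fields - set(doc)
    return [f"missing field: {name}" for name in sorted(missing)]
```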