This documentation page is a work in progress for an upcoming feature in Powertools for AWS Lambda. If you're seeing this page, it means the release process is underway, but the feature is not yet available on npm. Please check back soon for the final version.
The Kafka Consumer utility transparently handles message deserialization, provides an intuitive developer experience, and integrates seamlessly with the rest of the Powertools for AWS Lambda ecosystem.

Depending on the schema types you want to use, install the library and the corresponding dependencies.

Additionally, if you want to use output parsing with [Standard Schema](https://github.com/standard-schema/standard-schema), you can install [any of the supported libraries](https://standardschema.dev/#what-schema-libraries-implement-the-spec), for example: Zod, Valibot, or ArkType.
### Required resources
To use the Kafka consumer utility, you need an AWS Lambda function configured with a Kafka event source. This can be Amazon MSK, MSK Serverless, or a self-hosted Kafka cluster.

For debugging purposes, you can also access the original key, value, and headers in their base64-encoded form; these are available in the `originalValue`, `originalKey`, and `originalHeaders` properties of the `record`.

| Property | Description | Example Use Case |
|-----------|-------------|-------------------|
|`topic`| Topic name the record was published to | Routing logic in multi-topic consumers |
|`partition`| Kafka partition number | Tracking message distribution |
|`offset`| Position in the partition | De-duplication, exactly-once processing |
|`timestamp`| Unix timestamp when record was created | Event timing analysis |
|`timestamp_type`| Timestamp type (`CREATE_TIME` or `LOG_APPEND_TIME`) | Data lineage verification |
|`headers`| Key-value pairs attached to the message | Cross-cutting concerns like correlation IDs |
|`key`| Deserialized message key | Customer ID or entity identifier |
|`value`| Deserialized message content | The actual business data |
|`originalValue`| Base64-encoded original message value | Debugging or custom deserialization |
|`originalKey`| Base64-encoded original message key | Debugging or custom deserialization |
|`originalHeaders`| Base64-encoded original message headers | Debugging or custom deserialization |
|`valueSchemaMetadata`| Metadata about the value schema like `schemaId` and `dataFormat` | Used by `kafkaConsumer` to process Protobuf, data format validation |
|`keySchemaMetadata`| Metadata about the key schema like `schemaId` and `dataFormat` | Used by `kafkaConsumer` to process Protobuf, data format validation |

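
The following sketch illustrates how these properties are typically accessed in a handler. It assumes the `kafkaConsumer` wrapper and `SchemaType` enum referenced elsewhere in this page, an `event.records` iterable, and a JSON-encoded value; treat the import path as provisional until the package is published, and `doSomethingWith` is a placeholder for your own logic.

```typescript
import { kafkaConsumer, SchemaType } from '@aws-lambda-powertools/kafka';
import { Logger } from '@aws-lambda-powertools/logger';

const logger = new Logger({ serviceName: 'kafka-consumer' });

// Placeholder for your business logic
function doSomethingWith(value: unknown): void {
  logger.debug('handling value', { value });
}

export const handler = kafkaConsumer(
  async (event) => {
    for (const record of event.records) {
      // Metadata is available alongside the deserialized key and value
      logger.info('processing record', {
        topic: record.topic,
        partition: record.partition,
        offset: record.offset,
        key: record.key,
      });
      // `value` is deserialized according to the schema configuration below
      doSomethingWith(record.value);
    }
  },
  { value: { type: SchemaType.JSON } } // schema configuration for the message value
);
```
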
### Additional Parsing
You can parse deserialized data using your preferred parsing library. This can help you integrate Kafka data with your domain schemas and application architecture, providing type hints, runtime parsing and validation, and advanced data transformations.
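
For example, you can validate the already-deserialized `value` with Zod inside the handler. This is a minimal sketch with a hypothetical `OrderSchema`, written independently of any built-in parsing hooks the utility may offer; the Kafka-specific imports follow the same assumptions as the earlier sketch.

```typescript
import { kafkaConsumer, SchemaType } from '@aws-lambda-powertools/kafka';
import { z } from 'zod';

// Hypothetical domain schema used for illustration
const OrderSchema = z.object({
  orderId: z.string(),
  amount: z.number().positive(),
});
type Order = z.infer<typeof OrderSchema>;

// Placeholder for your persistence logic
async function saveOrder(order: Order): Promise<void> {
  console.debug(`saving order ${order.orderId}`);
}

export const handler = kafkaConsumer(
  async (event) => {
    for (const record of event.records) {
      // `record.value` was deserialized from JSON; Zod adds runtime validation and static typing
      const order = OrderSchema.parse(record.value);
      await saveOrder(order);
    }
  },
  { value: { type: SchemaType.JSON } }
);
```
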
### Error handling

Handle errors gracefully when processing Kafka messages to ensure your application maintains resilience and provides clear diagnostic information. The Kafka consumer utility provides specific exception types to help you identify and handle deserialization issues effectively.

!!! tip
    Fields like `value`, `key`, and `headers` are decoded lazily, meaning they are only deserialized when accessed. This allows you to handle deserialization errors at the point of access rather than when the record is first processed.
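
The sketch below shows one way to handle these errors per record, relying on the lazy property access described in the tip above. It reuses the assumed `kafkaConsumer` wrapper and import path from the earlier sketches; inspecting `error.cause` applies when the utility's Standard Schema parsing is configured.

```typescript
import { kafkaConsumer, SchemaType } from '@aws-lambda-powertools/kafka';
import { Logger } from '@aws-lambda-powertools/logger';

const logger = new Logger({ serviceName: 'kafka-consumer' });

export const handler = kafkaConsumer(
  async (event) => {
    for (const record of event.records) {
      try {
        const { value, key } = record; // (1)!
        logger.info('received record', { topic: record.topic, key, value });
        // ... business logic using `value`
      } catch (error) {
        logger.error('failed to deserialize or parse record', {
          topic: record.topic,
          partition: record.partition,
          offset: record.offset,
          cause: (error as Error).cause, // (2)!
        });
        // rethrow, send to a DLQ, or skip the record depending on your requirements
      }
    }
  },
  { value: { type: SchemaType.JSON } }
);
```
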
1. If you want to handle deserialization and parsing errors, you should destructure or access the `value`, `key`, or `headers` properties of the record within the `for...of` loop.
1. The `cause` property of the error is populated with the original Standard Schema parsing error, allowing you to access detailed information about the parsing failure.

| Exception | Description | Common Causes |
|-----------|-------------|---------------|
|`KafkaConsumerError`| Base class for all Kafka consumer errors | General unhandled errors |
|`KafkaConsumerDeserializationError`| Thrown when message deserialization fails | Corrupted message data, schema mismatch, or wrong schema type configuration |
|`KafkaConsumerMissingSchemaError`| Thrown when a required schema is not provided | Missing schema for AVRO or PROTOBUF formats (required parameter) |
|`KafkaConsumerOutputSerializerError`| Thrown when additional schema parsing fails | Parsing failures in Standard Schema models |

### Integrating with Idempotency
When processing Kafka messages in Lambda, failed batches can result in message reprocessing. The [Idempotency utility](./idempotency.md) prevents duplicate processing by tracking which messages have already been handled, ensuring each message is processed exactly once.
The Idempotency utility automatically stores the result of each successful operation, returning the cached result if the same message is processed again, which prevents potentially harmful duplicate operations like double-charging customers or double-counting metrics.

!!! tip
    By using the Kafka record's unique coordinates (topic, partition, offset) as the idempotency key, you ensure that even if a batch fails and Lambda retries the messages, each message will be processed exactly once.
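
A minimal sketch of this integration wraps the per-record business logic with `makeIdempotent` from the Idempotency utility and uses the topic-partition-offset coordinates as the idempotency key. The Kafka-specific pieces follow the same assumptions as the earlier sketches; the table name and payload shape are placeholders.

```typescript
import { IdempotencyConfig, makeIdempotent } from '@aws-lambda-powertools/idempotency';
import { DynamoDBPersistenceLayer } from '@aws-lambda-powertools/idempotency/dynamodb';
import { kafkaConsumer, SchemaType } from '@aws-lambda-powertools/kafka';

const persistenceStore = new DynamoDBPersistenceLayer({ tableName: 'IdempotencyTable' });

// The unique record coordinates act as the idempotency key via `recordId`
const processRecord = makeIdempotent(
  async ({ recordId, payload }: { recordId: string; payload: unknown }) => {
    // ... side-effecting business logic using `payload`, e.g. charging a customer
    return { recordId, status: 'processed' };
  },
  {
    persistenceStore,
    config: new IdempotencyConfig({ eventKeyJmesPath: 'recordId' }),
  }
);

export const handler = kafkaConsumer(
  async (event) => {
    for (const record of event.records) {
      await processRecord({
        recordId: `${record.topic}-${record.partition}-${record.offset}`,
        payload: record.value,
      });
    }
  },
  { value: { type: SchemaType.JSON } }
);
```
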
### Best practices
#### Handling large messages

When processing large Kafka messages in Lambda, be mindful of memory limitations. Although the Kafka consumer utility optimizes memory usage, large deserialized messages can still exhaust your function's resources.

For large messages, consider these proven approaches:

* **Store the data:** use Amazon S3 and include only the S3 reference in your Kafka message (see the sketch after this list)
* **Split large payloads:** use multiple smaller messages with sequence identifiers
* **Increase memory:** increase your Lambda function's memory allocation, which also increases CPU capacity

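
As a sketch of the first approach, the producer stores the payload in Amazon S3 and publishes only a reference, which the consumer resolves when needed. The bucket, key, and reference shape below are hypothetical.

```typescript
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';

const s3 = new S3Client({});

// Hypothetical shape of the reference carried in the Kafka message value
type PayloadReference = { bucket: string; key: string };

// Resolve the S3 reference into the full payload
async function loadPayload({ bucket, key }: PayloadReference): Promise<string> {
  const response = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  return (await response.Body?.transformToString()) ?? '';
}

// Inside your kafkaConsumer handler:
// const payload = await loadPayload(record.value as PayloadReference);
```
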
#### Batch size configuration
The number of Kafka records processed per Lambda invocation is controlled by your Event Source Mapping configuration. Properly sized batches optimize cost and performance.
* Implement chunked or asynchronous processing patterns for time-consuming operations
* Monitor and optimize database operations, external API calls, or other I/O operations in your handler
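
Batch size and batching window are set on the Event Source Mapping, not in your handler code. The sketch below uses the AWS CDK as an assumption about your tooling, with placeholder ARN and topic values.

```typescript
import { Duration } from 'aws-cdk-lib';
import { StartingPosition, type IFunction } from 'aws-cdk-lib/aws-lambda';
import { ManagedKafkaEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

declare const consumerFn: IFunction; // your existing Lambda function construct

consumerFn.addEventSource(
  new ManagedKafkaEventSource({
    clusterArn: 'arn:aws:kafka:us-east-1:123456789012:cluster/my-cluster/abcd1234',
    topic: 'orders',
    startingPosition: StartingPosition.LATEST,
    batchSize: 100, // records delivered per invocation; tune with payload size and processing time
    maxBatchingWindow: Duration.seconds(5), // wait up to 5 seconds to fill a batch
  })
);
```
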

!!! tip "Monitoring memory usage"
    Use CloudWatch metrics to track your function's memory utilization. If it consistently exceeds 80% of allocated memory, consider increasing the memory allocation or optimizing your code.

## Kafka consumer workflow
## Testing your code

Testing Kafka consumer code requires simulating Lambda events with Kafka messages. You can create simple test cases using local JSON files without needing a live Kafka cluster. Below is an example of how to simulate a JSON message.
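
The sketch below uses Vitest (any test runner works) and crafts a minimal MSK-style event with a base64-encoded JSON value. The handler import path is hypothetical, and the casts keep the test independent of the utility's published event types.

```typescript
import type { Context } from 'aws-lambda';
import { describe, expect, it } from 'vitest';
import { handler } from '../src/index.js'; // hypothetical path to your kafkaConsumer handler

// Encode a JSON payload the way Kafka event source mappings deliver it
const encode = (payload: unknown) => Buffer.from(JSON.stringify(payload)).toString('base64');

describe('Kafka consumer handler', () => {
  it('processes a JSON message', async () => {
    const event = {
      eventSource: 'aws:kafka',
      eventSourceArn: 'arn:aws:kafka:us-east-1:123456789012:cluster/my-cluster/abcd1234',
      bootstrapServers: 'b-1.my-cluster.kafka.us-east-1.amazonaws.com:9092',
      records: {
        'orders-0': [
          {
            topic: 'orders',
            partition: 0,
            offset: 15,
            timestamp: 1718000000000,
            timestampType: 'CREATE_TIME',
            key: Buffer.from('order-1').toString('base64'),
            value: encode({ orderId: 'order-1', amount: 42 }),
            headers: [],
          },
        ],
      },
    };

    // If deserialization or the handler logic fails, the awaited call rejects and the test fails
    await expect(handler(event as never, {} as Context)).resolves.toBeUndefined();
  });
});
```
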