ValueError in KafkaEvent.decoded_headers when processing non-ASCII characters in headers #6862

Open
@ipt-jbu

Description

Expected Behaviour

record.decoded_headers should handle header values delivered as arrays of signed bytes without raising a ValueError: each signed byte should be converted to its unsigned representation (for example, -61 to 195) before the bytes are assembled and decoded.

For a header {"test-header": "hello-world-ë"}, which AWS might deliver as {'test-header': [104, 101, 108, 108, 111, 45, 119, 111, 114, 108, 100, 45, -61, -85]}, the decoded_headers property should return a dictionary like:

{'test-header': 'hello-world-ë'}
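The signed values can be reproduced by encoding the string as UTF-8 and reinterpreting each byte as a signed 8-bit integer. This is a sketch assuming the event source emits two's-complement bytes (as Java's `byte` type would; that origin is an assumption, not something the issue confirms). `to_signed_bytes` is a hypothetical helper, not part of Powertools.

```python
def to_signed_bytes(text: str) -> list[int]:
    """Encode text as UTF-8, then map each byte 0..255 onto a signed value -128..127.

    Assumption: this mirrors how the Kafka event source represents header bytes.
    """
    return [b - 256 if b > 127 else b for b in text.encode("utf-8")]


print(to_signed_bytes("hello-world-ë"))
# [104, 101, 108, 108, 111, 45, 119, 111, 114, 108, 100, 45, -61, -85]
```

Every ASCII character stays positive; only the two UTF-8 bytes of `ë` (0xC3, 0xAB) become negative.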

Current Behaviour

When the Lambda function is invoked with a Kafka record that has a non-ASCII character in a header, accessing decoded_headers fails with a ValueError. AWS passes the header value as a list of signed bytes (e.g., [-61, -85] for ë), and Powertools passes that list directly to Python's built-in bytes(), which only accepts integers in range(0, 256).

See the full traceback in the "Debugging logs" section.
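The error can be reproduced without Powertools or Lambda, since it comes from the built-in `bytes()` constructor itself. A minimal sketch using the signed values for `ë`:

```python
# UTF-8 bytes of "ë" (0xC3, 0xAB) expressed as signed 8-bit integers
signed_header_value = [-61, -85]

try:
    bytes(signed_header_value)
except ValueError as exc:
    # bytes() rejects any integer outside 0..255
    print(f"ValueError: {exc}")  # ValueError: bytes must be in range(0, 256)
```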

Code snippet

from aws_lambda_powertools.utilities.data_classes import KafkaEvent, event_source
from aws_lambda_powertools.utilities.typing import LambdaContext


@event_source(data_class=KafkaEvent)
def lambda_handler(event: KafkaEvent, context: LambdaContext):
    # For a header "test-header": "hello-world-ë", AWS delivers the value
    # as a list of signed bytes:
    # [104, 101, 108, 108, 111, 45, 119, 111, 114, 108, 100, 45, -61, -85]

    for record in event.records:
        try:
            # Accessing decoded_headers raises the ValueError
            decoded_headers = record.decoded_headers
            print(f"Decoded headers: {decoded_headers}")
        except ValueError as e:
            print(f"Error decoding headers: {e}")
            # Log the raw header value that triggers the error
            for header in record.headers:
                if "test-header" in header:
                    print(f"Raw header value: {header['test-header']}")

Possible Solution

A possible fix is to map each signed byte to its unsigned equivalent before calling bytes().

For example:

# Example of converting signed to unsigned bytes for a value like [-61, -85]
signed_bytes = [-61, -85]
unsigned_bytes = [b & 255 for b in signed_bytes] # Results in [195, 171]
bytes(unsigned_bytes).decode('utf-8') # Correctly decodes to 'ë'
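Applying the same masking to the whole `headers` structure (a list of single-key dicts, as exposed by `record.headers`) gives a sketch of what a fixed decoder could do. `decode_signed_headers` is a hypothetical function for illustration, not the Powertools implementation; it returns `bytes` values and leaves text decoding to the caller.

```python
def decode_signed_headers(headers: list[dict[str, list[int]]]) -> dict[str, bytes]:
    """Merge Kafka header chunks into one dict, tolerating signed byte values.

    Values may be signed (-128..127) or unsigned (0..255); masking with 0xFF
    maps both ranges onto valid byte values.
    """
    decoded: dict[str, bytes] = {}
    for chunk in headers:
        for key, value in chunk.items():
            decoded[key] = bytes(b & 0xFF for b in value)
    return decoded


raw_headers = [
    {"test-header": [104, 101, 108, 108, 111, 45, 119, 111, 114, 108, 100, 45, -61, -85]}
]
headers = decode_signed_headers(raw_headers)
print(headers["test-header"].decode("utf-8"))  # hello-world-ë
```

Masking is a no-op for values already in 0..255, so the same code would also accept producers that send unsigned bytes.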

Steps to Reproduce

  1. Create a Lambda function with a Kafka trigger.
  2. Use the Powertools @event_source(data_class=KafkaEvent) decorator on the handler.
  3. Type-hint the event parameter as KafkaEvent: def lambda_handler(event: KafkaEvent, context: LambdaContext):.
  4. Send a Kafka message with a header containing a non-ASCII character (e.g., test-header: hello-world-ë).
  5. Inside the handler, access the decoded headers of a record (event.records is a generator, so take the first item with next()): headers = next(event.records).decoded_headers.
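To reproduce without a live Kafka trigger, a hand-built event can be used. The sketch below assumes the Amazon MSK / self-managed Kafka event shape (records grouped under `topic-partition` keys, each carrying a `headers` list); the topic name, offset, timestamp, and value are made up for illustration, and `find_undecodable_headers` is a hypothetical helper that flags header values `bytes()` would reject.

```python
import base64

# Hypothetical test event; field values are illustrative, not captured from AWS.
SAMPLE_EVENT = {
    "eventSource": "aws:kafka",
    "records": {
        "mytopic-0": [
            {
                "topic": "mytopic",
                "partition": 0,
                "offset": 15,
                "timestamp": 1545084650987,
                "timestampType": "CREATE_TIME",
                "value": base64.b64encode(b"hello").decode("ascii"),
                "headers": [
                    {"test-header": [104, 101, 108, 108, 111, 45, 119, 111, 114,
                                     108, 100, 45, -61, -85]}
                ],
            }
        ]
    },
}


def find_undecodable_headers(event: dict) -> list[str]:
    """Return the names of headers containing integers outside 0..255."""
    bad = []
    for records in event["records"].values():
        for record in records:
            for chunk in record.get("headers", []):
                for key, value in chunk.items():
                    if any(not 0 <= b < 256 for b in value):
                        bad.append(key)
    return bad


print(find_undecodable_headers(SAMPLE_EVENT))  # ['test-header']
```

With Powertools installed, the same dict could presumably be passed to the handler from the code snippet (`lambda_handler(SAMPLE_EVENT, None)`) to trigger the error locally.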

Powertools for AWS Lambda (Python) version

3.13.0

AWS Lambda function runtime

3.12

Packaging format used

PyPI

Debugging logs

ValueError: bytes must be in range(0, 256)

at kafka_event.py: 83

Labels: bug

Status: Working on it
