wpessers

Exporting AWS Lambda Logs - Cheaper and Easier than ever

Previously, to get your AWS Lambda logs into your external observability platform of choice, you would typically use a setup involving CloudWatch log subscriptions and some sort of forwarder Lambda function. In Grafana's case, that is the lambda-promtail solution.

Although straightforward at first, I had two issues with this setup.

  1. Cost: Especially when you have many Lambda functions producing logs. You pay for ingestion and storage in CloudWatch, and for the forwarder Lambda's invocations triggered by the log subscription as well. Storage costs can be somewhat mitigated by setting a shorter log retention, but the bulk of CloudWatch costs comes from ingestion, which is much more expensive than storage.

  2. Ease of use: You need to configure the promtail Lambda to deliver logs to your observability backend. The delivery process becomes quite a bit easier to set up with the solution proposed below.

The solution

On May 1st, AWS introduced volume-based tiered pricing along with new logging destinations for AWS Lambda logs!

In this I saw an opportunity to solve my Lambda logging issues by using Amazon Data Firehose as the logging destination.

Implementation

To find out exactly what I needed, I created a Lambda function with Firehose as the logging destination, then went into CloudTrail to look at all the resources AWS created behind the scenes. I then recreated the necessary infrastructure in Terraform. In the example outlined below, I will be forwarding logs to Grafana Cloud.

With some minor tweaks, this setup can easily be adapted to forward logs to other vendors like Datadog, Honeycomb, or New Relic, or even to a self-hosted observability platform.

Prerequisites

If you want to follow along for Grafana Cloud, you will need a couple of things before we can get started: the instance id of your Grafana Cloud-hosted Loki instance, its URL, and a token with Loki write permissions.

Both the instance id and the URL can be found in the Loki data source settings. Counter-intuitively, the instance id is the value of the "User" setting within the authentication section. Both are highlighted in blue here:
A screenshot from the Grafana Cloud Loki data source settings, highlighting where the instance id and url can be found

The access token takes a bit more work. In your Grafana Cloud portal, navigate to access policies and create a new access policy. Pick the correct realms to apply it to, depending on your specific setup, and check the box for logs write permissions. Next, add a token to that policy and give it a descriptive name and an expiration. Your access token will then be shown; save the value to use later.

With all of this noted down, we can move on to the infrastructure. Some things may be glossed over here; the entire setup can be found in my GitHub repo: https://github.com/wpessers/lambda-firehose-loki

Base Setup

For security reasons, I manually created two secrets: one for the instance id and one for the access token. I stored both as plaintext in separate secrets, because I find that easy to work with.
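These secrets are then read in Terraform through data sources. A minimal sketch, where the secret_id values are placeholders for whatever you named your secrets:

```hcl
# Secret names below are placeholders; use the names you gave your secrets.
data "aws_secretsmanager_secret_version" "loki_instance_id" {
  secret_id = "grafana-cloud-loki-instance-id"
}

data "aws_secretsmanager_secret_version" "loki_write_token" {
  secret_id = "grafana-cloud-loki-write-token"
}
```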

Then I added a variable to my variables.tf file with the URL for our hosted logs instance:

variable "grafana_cloud_loki_endpoint" {
  type    = string
  default = "https://aws-logs-prod-012.grafana.net/aws-logs/api/v1/push"
}

Note that the /aws-logs/api/v1/push path should be appended to the URL; this endpoint can be used to send (AWS-sourced) log entries directly to Loki.

Kinesis Firehose

Let's continue with the Kinesis Firehose infrastructure needed to stream our logs to our observability backend of choice.

The first resource we'll need should be no surprise, a Kinesis Firehose Delivery Stream!

resource "aws_kinesis_firehose_delivery_stream" "logs_stream" {
  name        = "firehose-stream-logs-to-grafana-cloud"
  destination = "http_endpoint"

  http_endpoint_configuration {
    name = "Grafana Cloud Loki"
    url  = var.grafana_cloud_loki_endpoint
    access_key = format(
      "%s:%s",
      data.aws_secretsmanager_secret_version.loki_instance_id.secret_string,
      data.aws_secretsmanager_secret_version.loki_write_token.secret_string
    )

    role_arn       = aws_iam_role.s3_backup.arn
    s3_backup_mode = "FailedDataOnly"

    s3_configuration {
      role_arn           = aws_iam_role.s3_backup.arn
      bucket_arn         = aws_s3_bucket.firehose_backup.arn
      compression_format = "GZIP"
    }
  }
}

We use an http_endpoint destination, with the Loki endpoint variable we created earlier as the url. We also use two data sources for the secrets containing our instance id and access token, which we interpolate to build the access_key.

When working with other observability vendors, you may not have to use the http_endpoint destination type, as is the case for Datadog.

In the code above, you will notice we also need an S3 bucket to back up our data in case Firehose fails to deliver it to our destination. And of course the delivery stream needs permissions to read from and write to that bucket. Feel free to refer to the repo for those resources.
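For reference, here is a rough sketch of those resources. The resource names match the references in the delivery stream above; the bucket name and role names are placeholders, and the bucket policy is trimmed down to the permissions Firehose typically needs:

```hcl
resource "aws_s3_bucket" "firehose_backup" {
  bucket = "my-firehose-backup-bucket" # placeholder; bucket names must be globally unique
}

# Allow the Firehose service to assume the backup role
data "aws_iam_policy_document" "firehose_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    effect  = "Allow"
    principals {
      type        = "Service"
      identifiers = ["firehose.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "s3_backup" {
  name               = "firehose-s3-backup" # placeholder name
  assume_role_policy = data.aws_iam_policy_document.firehose_assume_role.json
}

data "aws_iam_policy_document" "s3_backup_access" {
  statement {
    actions = [
      "s3:AbortMultipartUpload",
      "s3:GetBucketLocation",
      "s3:GetObject",
      "s3:ListBucket",
      "s3:ListBucketMultipartUploads",
      "s3:PutObject"
    ]
    effect = "Allow"
    resources = [
      aws_s3_bucket.firehose_backup.arn,
      "${aws_s3_bucket.firehose_backup.arn}/*"
    ]
  }
}

resource "aws_iam_role_policy" "s3_backup" {
  name   = "firehose-s3-backup-access" # placeholder name
  role   = aws_iam_role.s3_backup.id
  policy = data.aws_iam_policy_document.s3_backup_access.json
}
```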

Testing Firehose

With all of that done, you may want to test the connectivity from Firehose to Grafana Cloud in the AWS console. Simply navigate to the Firehose stream you just created (after terraform apply, of course), and you should be able to send some demo data. If that all works, we can move on to the next step.

Lambda Function

First things first, we need a Lambda function. Mine uses a zip archive as the deployment package and the Node.js runtime.

resource "aws_lambda_function" "hello_world" {
  function_name = local.lambda_function_name
  role          = aws_iam_role.hello_world_lambda.arn

  filename         = "../dist/lambdas.zip"
  handler          = "lambdas/helloWorldLambda.handler"
  source_code_hash = filebase64sha256("../dist/lambdas.zip")

  runtime       = "nodejs22.x"
  architectures = ["arm64"]

  logging_config {
    log_format = "JSON"
    log_group  = aws_cloudwatch_log_group.export.name
  }
}

The logging config here is very important. I'm using a structured logger that outputs JSON logs, which is why log_format is set to JSON. And notice that even though we want Firehose as our log destination, we still need a log group! However, this is not just a normal log group; it needs to be of the DELIVERY class:

resource "aws_cloudwatch_log_group" "export" {
  name            = "/aws/lambda/${local.lambda_function_name}"
  log_group_class = "DELIVERY"
}

This log group class is made specifically for storing logs in Amazon S3 or Amazon Data Firehose. Delivery class log groups have a retention of 1 day and don't offer rich CloudWatch Logs capabilities such as CloudWatch Logs Insights queries.

We also need a subscription filter on this delivery log group. This is how our logs will be forwarded from the delivery log group straight to firehose:

resource "aws_cloudwatch_log_subscription_filter" "lambda_log_export" {
  name            = "${local.lambda_function_name}-filter"
  log_group_name  = aws_cloudwatch_log_group.export.name
  filter_pattern  = ""
  destination_arn = aws_kinesis_firehose_delivery_stream.logs_stream.arn
  role_arn        = aws_iam_role.logs_log_export.arn
}

The IAM role attached to this subscription filter needs to be assumable by the CloudWatch Logs service. We can allow this with the following policy:

data "aws_iam_policy_document" "logs_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    effect  = "Allow"
    principals {
      type        = "Service"
      identifiers = ["logs.amazonaws.com"]
    }
  }
}

And lastly, the role will obviously need some Firehose permissions:

data "aws_iam_policy_document" "log_export_kinesis" {
  statement {
    actions = [
      "firehose:PutRecord",
      "firehose:PutRecordBatch",
      "firehose:ListTagsForDeliveryStream"
    ]
    effect    = "Allow"
    resources = [aws_kinesis_firehose_delivery_stream.logs_stream.arn]
  }
}
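To tie the two policy documents above to an actual role, something along these lines should do; the role and policy names here are placeholders:

```hcl
resource "aws_iam_role" "logs_log_export" {
  name               = "cloudwatch-logs-to-firehose" # placeholder name
  assume_role_policy = data.aws_iam_policy_document.logs_assume_role.json
}

resource "aws_iam_role_policy" "logs_log_export" {
  name   = "firehose-put-records" # placeholder name
  role   = aws_iam_role.logs_log_export.id
  policy = data.aws_iam_policy_document.log_export_kinesis.json
}
```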

Again, some shortcuts were taken in the code samples; to see how it all ties together, take a look at the GitHub repo.

Wrapping up

We should now receive our logs in our observability backend. In my case, Grafana Cloud, it looks like this:
Grafana cloud explore feature, showing the delivered logs to grafana cloud loki

It may take a while for your logs to show up, as Firehose buffers logs before delivery. You can use buffering hints to reduce the latency: record delivery is triggered as soon as one of the buffering hints' thresholds is reached. By default, Firehose buffers for 300 seconds or until 5 MB of data has accumulated.
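If latency matters to you, the buffering hints can be tuned on the delivery stream itself. A sketch of the relevant attributes inside the http_endpoint_configuration block from earlier:

```hcl
  http_endpoint_configuration {
    # ...the settings shown earlier...

    buffering_interval = 60 # seconds, down from the default of 300
    buffering_size     = 1  # MB, down from the default of 5
  }
```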

As you can see, there are some labels we can use out of the box. If you want more detailed or custom labels, I believe you can use a transformation Lambda function in Firehose to set those, but I have yet to experiment with that myself.

Another option, if you need additional properties to query or filter your logs by, is to add them to your structured logs at the source. If you're already using OpenTelemetry and an OTel log instrumentation library for correlating logs with traces, these often include a service_name attribute, for example, and it should be trivial to add that or other attributes.
