When working with large-scale data pipelines, a common challenge is efficiently orchestrating high-volume API calls that have multiple layers of dependency. Recently, I encountered such a challenge where I needed to generate 409,600 JSON files from deeply nested API responses.
The Challenge
The requirements looked straightforward at first but quickly revealed significant complexity:
- Fetch 100 latest content items via an API call.
- For each content item, fetch the 16 latest comments.
- For each comment, fetch the 16 latest replies.
- For each reply, fetch the 16 latest nested replies.
- This results in a large combinatorial expansion:
100 (contents) × 16 (comments) × 16 (replies) × 16 (nested replies)
= 409,600 JSON files
Each generated JSON file represents the fully expanded content tree for one leaf node.
Why AWS Step Functions
To manage this orchestration at scale, I chose AWS Step Functions. Specifically, I leveraged the Map state, which allows parallel execution over dynamic lists without needing to manually code distributed concurrency handling.
Key benefits of using Step Functions here:
- Native parallelism → Handles thousands of executions concurrently.
- Error handling & retries → Simplifies recovery from transient API failures.
- Visibility → Provides execution history, making it easier to debug complex workflows.
- Integration → Works seamlessly with AWS Lambda for API calls and file generation.
High-Level Workflow
|-Initial API Call (Lambda): Get the 100 latest content items.
— —|-Step Functions Map State (Level 1): Iterate over the 100 content items.
— — — — -Inside this map:
— — — — — — — — — — Call API to fetch 16 comments.
— — — — — — — — — — |-Nested Map State (Level 2): Iterate over comments.
— — — — — — — — — — — — —Fetch 16 replies per comment.
— — — — — — — — — — — — —|-Nested Map State (Level 3): Iterate over replies.
— — — — — — — — — — — — — — — — -Fetch 16 nested replies per reply.
This recursive orchestration fans out into 409,600 parallel tasks at the deepest level.
Scalability Considerations
Handling such scale required careful planning:
- Map concurrency limits: Step Functions supports up to 40,000 concurrent executions per account (with quotas adjustable via AWS Support).
- API throttling: Implemented exponential backoff and batching logic in Lambda functions to avoid rate-limit errors.
- File storage: Each JSON file is stored in Amazon S3 with a structured key hierarchy for easy retrieval.
- Cost control: Since Step Functions are billed per state transition, I optimized workflows to reduce unnecessary states.
- Payload Size: AWS Step Functions has a payload size limit of 256 KB for both input and output data passed between states in a workflow execution. Instead we can save response to s3 bucket after combining results from each map iteration and pass object name as input to next stage.
Diagram
Outcomes
Using Step Functions significantly reduced the operational complexity of managing this pipeline. Instead of building custom queueing, concurrency, and retry logic, I relied on AWS-native orchestration. The final system was scalable, fault-tolerant, and easy to monitor.
Here is the stepfunction ASL
(you can use this to create your own workflow)
{
"Comment": "Content, Comments, Replies, and Reply Comments Processing Workflow",
"StartAt": "GenerateFeedJSON",
"States": {
"GenerateAnonymousFeedJSON": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${GenerateFeedLambdaArn}"
},
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException",
"Lambda.TooManyRequestsException"
],
"IntervalSeconds": 1,
"MaxAttempts": 3,
"BackoffRate": 2,
"JitterStrategy": "FULL"
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "WorkflowFailed",
"ResultPath": "$.error"
}
],
"TimeoutSeconds": 300,
"Next": "Pass"
},
"WorkflowFailed": {
"Type": "Fail"
},
"Pass": {
"Type": "Pass",
"Next": "ProcessIndividualContent",
"Parameters": {
"input.$": "States.StringToJson($.Payload.body)"
},
"OutputPath": "$.input",
"Assign": {
"key.$": "$.input.key"
}
},
"ProcessIndividualContent": {
"Type": "Map",
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "DISTRIBUTED",
"ExecutionType": "STANDARD"
},
"StartAt": "GenerateIndividualContentJSON",
"States": {
"GenerateIndividualContentJSON": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${GenerateIndivcontentJsonsLambdaArn}",
"Payload.$": "$"
},
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException",
"Lambda.TooManyRequestsException"
],
"IntervalSeconds": 1,
"MaxAttempts": 3,
"BackoffRate": 2,
"JitterStrategy": "FULL"
}
],
"TimeoutSeconds": 300,
"End": true
}
}
},
"ItemsPath": "$.detail",
"MaxConcurrency": 1000,
"Next": "ProcessComments",
"ResultPath": "$.processedContent",
"ItemBatcher": {
"MaxItemsPerBatch": 10
},
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "WorkflowFailed",
"ResultPath": "$.error"
}
]
},
"ProcessComments": {
"Type": "Map",
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "DISTRIBUTED",
"ExecutionType": "STANDARD"
},
"StartAt": "GenerateComments",
"States": {
"GenerateComments": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${GenerateContentCommentJsonsLambdaArn}",
"Payload.$": "$"
},
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException",
"Lambda.TooManyRequestsException"
],
"IntervalSeconds": 1,
"MaxAttempts": 3,
"BackoffRate": 2,
"JitterStrategy": "FULL"
}
],
"TimeoutSeconds": 300,
"End": true,
"OutputPath": "$.Payload.body"
}
}
},
"ItemsPath": "$.detail",
"MaxConcurrency": 1000,
"ResultPath": "$",
"Next": "IsReplyLevelCountReached",
"ItemBatcher": {
"MaxItemsPerBatch": 10
},
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "WorkflowFailed",
"ResultPath": "$.error"
}
],
"ResultWriter": {
"Resource": "arn:aws:states:::s3:putObject",
"Parameters": {
"Bucket": "${content_sfn_artifact_bucket}",
"Prefix": "ProcessComments/"
},
"WriterConfig": {
"OutputType": "JSON",
"Transformation": "FLATTEN"
}
},
"ResultSelector": {
"output": {
"detail": {
"fileKey.$": "States.Format('ProcessComments/{}/SUCCEEDED_0.json', States.ArrayGetItem(States.StringSplit($.ResultWriterDetails.Key,'/'), 1))"
}
},
"counter": 0
}
},
"IsReplyLevelCountReached": {
"Type": "Choice",
"Choices": [
{
"Next": "consolidateoutputs",
"Variable": "$.counter",
"NumericLessThan": 2
}
],
"Default": "CopyToLatestJson"
},
"consolidateoutputs": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${ConsolidateOutputsLambdaArn}",
"Payload.$": "$"
},
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException",
"Lambda.TooManyRequestsException"
],
"IntervalSeconds": 1,
"MaxAttempts": 3,
"BackoffRate": 2,
"JitterStrategy": "FULL"
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "WorkflowFailed",
"ResultPath": "$.error"
}
],
"TimeoutSeconds": 300,
"Next": "ProcessReplies",
"ResultSelector": {
"input.$": "States.StringToJson($.Payload.body)"
},
"OutputPath": "$.input"
},
"ProcessReplies": {
"Type": "Map",
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "DISTRIBUTED",
"ExecutionType": "EXPRESS"
},
"StartAt": "GenerateReplies",
"States": {
"GenerateReplies": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${GeneratCommentRepliesJsonsLambdaArn}",
"Payload.$": "$"
},
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException",
"Lambda.TooManyRequestsException"
],
"IntervalSeconds": 1,
"MaxAttempts": 3,
"BackoffRate": 2,
"JitterStrategy": "FULL"
}
],
"TimeoutSeconds": 300,
"End": true,
"OutputPath": "$.Payload.body"
}
}
},
"MaxConcurrency": 500,
"ResultPath": "$.output",
"Next": "IsReplyLevelCountReached",
"ItemReader": {
"Resource": "arn:aws:states:::s3:getObject",
"ReaderConfig": {
"InputType": "JSON"
},
"Parameters": {
"Bucket.$": "$.bucketName",
"Key.$": "$.fileName"
}
},
"ItemBatcher": {
"MaxItemsPerBatch": 10
},
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "WorkflowFailed",
"ResultPath": "$.error"
}
],
"ResultWriter": {
"Resource": "arn:aws:states:::s3:putObject",
"Parameters": {
"Bucket": "${content_sfn_artifact_bucket}",
"Prefix": "ProcessReplies/"
},
"WriterConfig": {
"OutputType": "JSON",
"Transformation": "FLATTEN"
}
},
"ResultSelector": {
"detail": {
"fileKey.$": "States.Format('ProcessReplies/{}/SUCCEEDED_0.json', States.ArrayGetItem(States.StringSplit($.ResultWriterDetails.Key,'/'), 1))"
}
}
},
"CopyToLatestJson": {
"Type": "Task",
"Parameters": {
"Bucket": "${content_s3_bucket}",
"CopySource.$": "States.Format('${content_s3_bucket}/{}', $key)",
"Key": "feed/anonymous/latest.json"
},
"Resource": "arn:aws:states:::aws-sdk:s3:copyObject",
"Retry": [
{
"ErrorEquals": [
"S3.ServiceException",
"S3.AWSServiceException",
"S3.SdkClientException"
],
"IntervalSeconds": 1,
"MaxAttempts": 3,
"BackoffRate": 2,
"JitterStrategy": "FULL"
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "WorkflowFailed",
"ResultPath": "$.error"
}
],
"Next": "ListObjectVersions"
},
"ListObjectVersions": {
"Type": "Task",
"Parameters": {
"Bucket": "${content_s3_bucket}",
"Prefix": "feed/anonymous/latest.json",
"MaxKeys": 2
},
"Resource": "arn:aws:states:::aws-sdk:s3:listObjectVersions",
"ResultSelector": {
"OldVersionId.$": "$.Versions[1].VersionId",
"LatestVersionId.$": "$.Versions[0].VersionId"
},
"Next": "CheckIfPreviousVersionExists",
"Retry": [
{
"ErrorEquals": [
"States.ALL"
],
"BackoffRate": 2,
"IntervalSeconds": 1,
"MaxAttempts": 3,
"JitterStrategy": "FULL"
}
],
"TimeoutSeconds": 30,
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.error",
"Next": "Fail"
}
]
},
"Fail": {
"Type": "Fail"
},
"CheckIfPreviousVersionExists": {
"Type": "Choice",
"Choices": [
{
"Next": "CopyOldVersion",
"Variable": "$.OldVersionId",
"IsPresent": true
}
],
"Default": "Success"
},
"CopyOldVersion": {
"Type": "Task",
"Parameters": {
"Bucket": "${content_s3_bucket}",
"CopySource.$": "States.Format('${content_s3_bucket}/feed/anonymous/latest.json?versionId={}', $.OldVersionId)",
"Key": "feed/anonymous/latest-previous.json",
"MetadataDirective": "COPY",
"TaggingDirective": "COPY"
},
"Resource": "arn:aws:states:::aws-sdk:s3:copyObject",
"Next": "HeadObject",
"ResultSelector": {
"ArchivedVersionId.$": "$.VersionId"
},
"Retry": [
{
"ErrorEquals": [
"States.ALL"
],
"BackoffRate": 2,
"IntervalSeconds": 1,
"MaxAttempts": 3,
"JitterStrategy": "FULL"
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "Fail",
"ResultPath": "$.error"
}
],
"TimeoutSeconds": 30,
"ResultPath": "$.ArchivedVersion"
},
"HeadObject": {
"Type": "Task",
"Parameters": {
"Bucket": "${content_s3_bucket}",
"Key": "feed/anonymous/latest-previous.json",
"VersionId.$": "$.ArchivedVersion.ArchivedVersionId"
},
"Resource": "arn:aws:states:::aws-sdk:s3:headObject",
"Next": "DeleteObject",
"Retry": [
{
"ErrorEquals": [
"States.ALL"
],
"BackoffRate": 2,
"IntervalSeconds": 1,
"MaxAttempts": 3,
"JitterStrategy": "FULL"
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "Fail",
"ResultPath": "$.error"
}
],
"TimeoutSeconds": 30,
"ResultPath": "$.verificationresults"
},
"DeleteObject": {
"Type": "Task",
"Parameters": {
"Bucket": "${content_s3_bucket}",
"Key": "feed/anonymous/latest.json",
"VersionId.$": "$.OldVersionId"
},
"Resource": "arn:aws:states:::aws-sdk:s3:deleteObject",
"Retry": [
{
"ErrorEquals": [
"States.ALL"
],
"BackoffRate": 2,
"IntervalSeconds": 1,
"MaxAttempts": 3,
"JitterStrategy": "FULL"
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "Fail",
"ResultPath": "$.error"
}
],
"TimeoutSeconds": 30,
"Next": "ValidateAnonFeed"
},
"ValidateAnonFeed": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "${validate_anonymous_feed}"
},
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException",
"Lambda.TooManyRequestsException"
],
"IntervalSeconds": 1,
"MaxAttempts": 3,
"BackoffRate": 2,
"JitterStrategy": "FULL"
}
],
"End": true
},
"Success": {
"Type": "Succeed"
}
}
}
Lambda Functions in the Workflow
The workflow relies on multiple AWS Lambda functions, each designed with a single responsibility to keep the architecture modular and maintainable.
GenerateFeedLambda
Purpose: Calls the API to retrieve the latest 100 content items.
Input: Triggered at the start of the workflow.
Output: A list of content IDs passed into the first Map state.GenerateIndivcontentJsonsLambda
Purpose: Generate json for each of 100 items and save in s3 bucket. Output latest 100 content items to next stage.
Input: Single content ID.
Output: A list of content IDs passed into the first Map state.
- GenerateContentCommentJsonsLambda
Purpose: Given a content ID, calls the API to fetch the 16 latest comments for that content and save each in s3.
Input: Single content ID.
Output: List of comment IDs saved in s3. pass object name for the next processing stage.
- Replies Fetcher Lambda (Level 1/Level 2)
Purpose: Given a comment ID, fetches the 16 latest replies and save each in s3. In our case same API can be used to generate replies and nested replies. So same lambda is used in step function.
Input: Single comment ID.
Output: List of reply IDs saved in s3. pass object name for the next nested Map state.
- Consolidateoutput Lambda
Purpose: output from previous stage saved in s3 is not a list. we need a list type object to pass to map stage. this lambda does that conversion.
Input: s3 object name (output from previous stage saved in s3)
Output: JSON object written to S3 with a structured
Performance metrics
⚡ By distributing work across nested Map states, the pipeline scaled horizontally without requiring manual queue or concurrency management. (duration ~30s to generate all)
Step Functions vs Manual Orchestration
Manual Orchestration (Queues + Custom Logic)
- Must design and manage worker pools manually
- Requires custom retry/backoff logic
- Logs spread across services, harder to trace
- Pay for compute + queue infrastructure
- Higher engineering effort, more boilerplate
AWS Step Functions (Map State)
- Automatically scales with Map state parallelism
- Built-in retries, catchers, error paths
- Centralized execution history and visualization
- Pay per state transition + Lambda usage
- Faster implementation with declarative workflow
- Simplified workflow as state machine JSON
This demonstrated the power of combining AWS Step Functions with Lambda to handle large-scale, highly nested workloads. By leveraging the Map state effectively, I was able to orchestrate large amount of dependent API calls and generate 409,600 JSON files — all while keeping the system resilient, observable, and cost-efficient.
Top comments (0)