AWS Step Functions Distributed Map, announced at re:Invent 2022, enables processing up to 10,000 concurrent items in a Map state—compared to the previous 40-item limit. This makes Step Functions viable for large-scale ETL, data processing, and batch workflows that previously required custom orchestration or EMR. This guide covers architecture patterns, S3 integration, and cost optimization for high-volume processing.
Traditional Map vs Distributed Map
| Feature | Inline Map | Distributed Map |
|---|---|---|
| Max Concurrency | 40 | 10,000 |
| Max Items | ~40,000 | Unlimited (S3) |
| Input Source | JSON array in state | S3 objects/manifest |
| Output | JSON array | S3 (ResultWriter) |
| Billing | Per state transition | Per child execution |
Architecture Pattern: S3 Inventory Processing
flowchart TB
S3Inventory["S3 Inventory (CSV)"] --> DistMap["Distributed Map (10K concurrent)"]
subgraph ChildExecutions ["Child Executions"]
CE1["Process File 1"]
CE2["Process File 2"]
CE3["..."]
CE10000["Process File 10000"]
end
DistMap --> ChildExecutions
ChildExecutions --> ResultWriter["ResultWriter (S3)"]
ResultWriter --> S3Output["S3 Output Bucket"]
style DistMap fill:#FFF3E0,stroke:#E65100
State Machine Definition
{
"StartAt": "ProcessS3Objects",
"States": {
"ProcessS3Objects": {
"Type": "Map",
"ItemReader": {
"Resource": "arn:aws:states:::s3:getObject",
"ReaderConfig": {
"InputType": "CSV",
"CSVHeaderLocation": "FIRST_ROW"
},
"Parameters": {
"Bucket": "my-inventory-bucket",
"Key": "manifest.csv"
}
},
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "DISTRIBUTED",
"ExecutionType": "EXPRESS"
},
"StartAt": "TransformRecord",
"States": {
"TransformRecord": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:transform-function",
"End": true
}
}
},
"MaxConcurrency": 1000,
"ResultWriter": {
"Resource": "arn:aws:states:::s3:putObject",
"Parameters": {
"Bucket": "my-output-bucket",
"Prefix": "results/"
}
},
"End": true
}
}
}
Key Configuration Options
ItemReader Sources
- S3 Object List: Process every object in a prefix
- S3 Manifest: CSV/JSON file listing objects to process
- S3 Inventory: AWS-generated inventory report
ProcessorConfig Options
{
"Mode": "DISTRIBUTED",
"ExecutionType": "EXPRESS", // or "STANDARD"
}
Use EXPRESS for high-volume, short-duration child executions (up to 5 minutes). Use STANDARD for long-running or exactly-once semantics.
Error Handling
{
"ItemProcessor": {
"StartAt": "TryProcess",
"States": {
"TryProcess": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:process",
"Retry": [
{
"ErrorEquals": ["Lambda.TooManyRequestsException"],
"IntervalSeconds": 1,
"MaxAttempts": 5,
"BackoffRate": 2.0
}
],
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"Next": "RecordFailure"
}
],
"End": true
},
"RecordFailure": {
"Type": "Pass",
"Result": {"status": "FAILED"},
"End": true
}
}
},
"ToleratedFailurePercentage": 5
}
ToleratedFailurePercentage allows the workflow to succeed even if some items fail—essential for large-scale processing.
Cost Optimization
Distributed Map charges per child execution:
- EXPRESS: $0.000025 per 64KB of state + duration
- STANDARD: $0.025 per 1,000 state transitions
For 1 million items with EXPRESS child executions:
1,000,000 items × $0.000001 = ~$1.00 for Step Functions
+ Lambda costs for processing each item
Key Takeaways
- Distributed Map enables 10,000 concurrent child executions
- Read directly from S3 objects, manifests, or inventory
- Use EXPRESS mode for high-volume, short processing
- ResultWriter aggregates output to S3
- ToleratedFailurePercentage prevents total workflow failure
Discover more from C4: Container, Code, Cloud & Context
Subscribe to get the latest posts sent to your email.