AWS Lambda has matured significantly as a platform for .NET workloads. With first-class .NET 6 support and the recent introduction of ARM64 (Graviton2) processors, Lambda offers compelling economics for serverless compute. However, achieving optimal performance requires understanding the nuances of cold starts, memory allocation, and deployment strategies. In this comprehensive guide, I will share lessons learned from running production .NET 6 Lambda functions processing millions of requests daily.
## Understanding the Lambda Execution Model
Before diving into optimization, we must understand how Lambda executes .NET code. Unlike traditional servers, Lambda employs a unique lifecycle:
```mermaid
flowchart TB
    subgraph ColdStart ["Cold Start (First Invocation)"]
        A["Download Code Package"] --> B["Initialize Runtime"]
        B --> C["Load Assemblies"]
        C --> D["Run Static Constructors"]
        D --> E["Execute Handler"]
    end
    subgraph WarmStart ["Warm Start (Subsequent)"]
        F["Execute Handler Only"]
    end
    style A fill:#FFCDD2,stroke:#C62828
    style F fill:#C8E6C9,stroke:#2E7D32
```
Cold starts include downloading your deployment package, initializing the .NET runtime, loading assemblies, and executing static constructors. Warm starts skip all of this—they reuse the existing execution environment and only run your handler code.
## Memory Configuration: More Than Just RAM
Lambda’s memory configuration is the single most misunderstood setting. Memory doesn’t just affect available RAM—it directly controls CPU allocation. AWS allocates CPU proportionally to memory:
- 128 MB: Fractional CPU (extremely slow JIT compilation)
- 1769 MB: 1 full vCPU
- 3538 MB: 2 vCPUs (beyond this, diminishing returns)
- 10240 MB: 6 vCPUs (maximum)
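The allocation above is essentially linear. Here is a minimal Python sketch of the rule of thumb (the 1,769 MB-per-vCPU ratio is the commonly documented figure; verify it against current AWS limits):

```python
def allocated_vcpus(memory_mb: int) -> float:
    """Approximate vCPU share Lambda grants for a memory setting.

    Rule of thumb: one full vCPU per 1,769 MB, scaling roughly
    linearly up to the 10,240 MB / 6 vCPU maximum.
    """
    return memory_mb / 1769

print(allocated_vcpus(1769))  # 1.0 (one full vCPU)
print(allocated_vcpus(128))   # ~0.07: fractional CPU, hence the slow JIT
```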
For .NET 6, I recommend starting at 1024 MB minimum. The JIT compiler is CPU-intensive, and fractional CPU dramatically slows cold starts. Here are benchmarks from our production workload (simple API Gateway integration):
| Memory | Cold Start | Warm Invocation | Cost per 1M Invocations |
|---|---|---|---|
| 256 MB | 4,500 ms | 150 ms | $0.42 |
| 512 MB | 2,800 ms | 80 ms | $0.83 |
| 1024 MB | 1,800 ms | 45 ms | $1.67 |
| 1769 MB | 1,200 ms | 25 ms | $2.92 |
| 3008 MB | 900 ms | 18 ms | $5.00 |
Notice that doubling memory can come close to halving execution time, so the effective cost per request rises far more slowly than the memory increase alone would suggest. Always benchmark your specific workload.
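A quick way to sanity-check numbers like these is Lambda's published pricing formula: duration cost is GB-seconds multiplied by a per-GB-second rate, plus a flat per-request fee. A minimal sketch, assuming the x86 us-east-1 rates of $0.0000166667 per GB-second and $0.20 per million requests (check current pricing; the table above also reflects real production traffic, including cold starts, so it will not match a warm-path estimate exactly):

```python
GB_SECOND_RATE = 0.0000166667   # assumed x86 us-east-1 price per GB-second
REQUEST_RATE_PER_M = 0.20       # assumed flat charge per million requests

def cost_per_million(memory_mb: int, avg_duration_ms: float) -> float:
    """Estimated bill for one million invocations at a given memory size."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * 1_000_000
    return gb_seconds * GB_SECOND_RATE + REQUEST_RATE_PER_M

# Warm-path estimate for the 1024 MB row (45 ms average):
print(round(cost_per_million(1024, 45), 2))  # 0.95
```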
## ReadyToRun (R2R) Compilation
ReadyToRun pre-compiles your .NET code to native machine code during build time, reducing JIT overhead at runtime. This is the single most impactful optimization for cold starts.
```xml
<PropertyGroup>
  <TargetFramework>net6.0</TargetFramework>
  <PublishReadyToRun>true</PublishReadyToRun>
  <RuntimeIdentifier>linux-x64</RuntimeIdentifier>
  <!-- For Graviton2 (ARM64): -->
  <!-- <RuntimeIdentifier>linux-arm64</RuntimeIdentifier> -->
</PropertyGroup>
```
In our benchmarks, R2R reduces cold start by 30-40%. The trade-off is a larger deployment package (typically 2-3x larger), but Lambda’s 250 MB unzipped limit is rarely a concern.
## Trimming and Assembly Optimization
.NET 6 supports assembly trimming, which removes unused code. This reduces package size and improves cold start by loading fewer assemblies:
```xml
<PropertyGroup>
  <PublishTrimmed>true</PublishTrimmed>
  <TrimMode>link</TrimMode>
</PropertyGroup>
```
Warning: Trimming can break reflection-based code. Test thoroughly. Libraries using System.Text.Json source generators are trimming-safe; those using Newtonsoft.Json with dynamic deserialization are not.
## ARM64 (Graviton2) Performance
AWS Graviton2 processors offer 20% better price-performance for Lambda. .NET 6 has excellent ARM64 support. Switching is straightforward:
```yaml
# serverless.yml
functions:
  myFunction:
    handler: MyAssembly::MyNamespace.Function::Handler
    runtime: dotnet6
    architecture: arm64
```
In our tests, Graviton2 reduced costs by 18% while maintaining equivalent performance. The only caveat: ensure all native dependencies are ARM64-compatible.
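The headline saving follows directly from the per-GB-second rates. A sketch, assuming us-east-1 duration prices of $0.0000166667 (x86) and $0.0000133334 (arm64) at the time of writing (verify against the current price list):

```python
X86_RATE = 0.0000166667   # assumed x86 price per GB-second
ARM_RATE = 0.0000133334   # assumed arm64 (Graviton2) price per GB-second

def duration_savings(x86_rate: float, arm_rate: float) -> float:
    """Fractional duration-cost saving from switching architectures,
    assuming execution time stays the same."""
    return 1 - arm_rate / x86_rate

print(f"{duration_savings(X86_RATE, ARM_RATE):.0%}")  # 20%
```

Our observed 18% is slightly below the list-price 20% because real bills include the flat per-request fee, which is identical on both architectures.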
## Provisioned Concurrency: Eliminating Cold Starts
For latency-sensitive workloads (APIs with SLA requirements), Provisioned Concurrency keeps a specified number of execution environments warm:
```bash
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 10
```
This guarantees 10 warm execution environments are always available, so requests within that concurrency never hit a cold start; traffic beyond it spills over to on-demand instances, which can still cold-start. The cost is approximately $0.015 per GB-hour of provisioned capacity: significant, but justified for user-facing APIs.
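At that rate the monthly bill is easy to estimate. A sketch, assuming ten 1,024 MB environments, the $0.015 per GB-hour figure above, and a 730-hour month (note that provisioned-concurrency pricing also includes a reduced per-invocation duration rate, omitted here):

```python
PC_RATE_PER_GB_HOUR = 0.015  # provisioned-concurrency rate quoted above
HOURS_PER_MONTH = 730

def provisioned_monthly_cost(instances: int, memory_mb: int) -> float:
    """Monthly cost of keeping `instances` execution environments warm."""
    gb = memory_mb / 1024
    return instances * gb * PC_RATE_PER_GB_HOUR * HOURS_PER_MONTH

print(round(provisioned_monthly_cost(10, 1024), 2))  # 109.5
```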
## Code Optimization Patterns

### 1. Initialize Outside the Handler
```csharp
public class Function
{
    // Initialized once during cold start, reused across invocations
    private static readonly HttpClient _httpClient = new HttpClient();
    private static readonly AmazonDynamoDBClient _dynamoClient = new AmazonDynamoDBClient();

    public async Task<APIGatewayProxyResponse> Handler(APIGatewayProxyRequest request, ILambdaContext context)
    {
        // Handler code uses shared clients
        var result = await _dynamoClient.GetItemAsync(...);
        return new APIGatewayProxyResponse { StatusCode = 200 };
    }
}
```
### 2. Use Source Generators for JSON
```csharp
[JsonSerializable(typeof(MyRequest))]
[JsonSerializable(typeof(MyResponse))]
public partial class AppJsonContext : JsonSerializerContext { }

// Usage: skips runtime reflection, so it is markedly faster and trimming-safe
var request = JsonSerializer.Deserialize(body, AppJsonContext.Default.MyRequest);
```
## Monitoring and Observability
Use AWS Lambda Powertools for .NET to add structured logging, distributed tracing, and custom metrics:
```csharp
[Logging(LogEvent = true)]
[Tracing(CaptureMode = TracingCaptureMode.ResponseAndError)]
[Metrics(Namespace = "OrderService", Service = "ProcessOrder")]
public async Task<APIGatewayProxyResponse> Handler(APIGatewayProxyRequest request, ILambdaContext context)
{
    Logger.LogInformation("Processing order");
    Metrics.AddMetric("OrdersProcessed", 1, MetricUnit.Count);
    Tracing.WithSubsegment("ValidateOrder", subsegment =>
    {
        // ...
    });
}
```
## Key Takeaways
- Set memory to at least 1024 MB for .NET 6 Lambda functions
- Enable ReadyToRun compilation for 30-40% cold start reduction
- Use ARM64/Graviton2 for 20% cost savings
- Implement Provisioned Concurrency for latency-sensitive APIs
- Initialize SDK clients outside the handler for reuse
- Use System.Text.Json source generators for trimming-safe, high-performance serialization