AWS Lambda has matured significantly as a platform for .NET workloads. With first-class .NET 6 support and the recent introduction of ARM64 (Graviton2) processors, Lambda offers compelling economics for serverless compute. However, achieving optimal performance requires understanding the nuances of cold starts, memory allocation, and deployment strategies. In this comprehensive guide, I will share lessons learned from running production .NET 6 Lambda functions processing millions of requests daily.
## Understanding the Lambda Execution Model
Before diving into optimization, we must understand how Lambda executes .NET code. Unlike traditional servers, Lambda employs a unique lifecycle:
```mermaid
flowchart TB
    subgraph ColdStart ["Cold Start (First Invocation)"]
        A["Download Code Package"] --> B["Initialize Runtime"]
        B --> C["Load Assemblies"]
        C --> D["Run Static Constructors"]
        D --> E["Execute Handler"]
    end
    subgraph WarmStart ["Warm Start (Subsequent)"]
        F["Execute Handler Only"]
    end
    style A fill:#FFCDD2,stroke:#C62828
    style F fill:#C8E6C9,stroke:#2E7D32
```
Cold starts include downloading your deployment package, initializing the .NET runtime, loading assemblies, and executing static constructors. Warm starts skip all of this—they reuse the existing execution environment and only run your handler code.
## Memory Configuration: More Than Just RAM
Lambda’s memory configuration is the single most misunderstood setting. Memory doesn’t just affect available RAM—it directly controls CPU allocation. AWS allocates CPU proportionally to memory:
- 128 MB: Fractional CPU (extremely slow JIT compilation)
- 1769 MB: 1 full vCPU
- 3538 MB: 2 vCPUs (beyond this, diminishing returns)
- 10240 MB: 6 vCPUs (maximum)
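The allocation above is essentially linear. Here is a minimal Python sketch of the rule of thumb (the 1,769 MB-per-vCPU ratio is the commonly documented figure; verify it against current AWS limits):

```python
def allocated_vcpus(memory_mb: int) -> float:
    """Approximate vCPU share Lambda grants for a memory setting.

    Rule of thumb: one full vCPU per 1,769 MB, scaling roughly
    linearly up to the 10,240 MB / 6 vCPU maximum.
    """
    return memory_mb / 1769

print(allocated_vcpus(1769))  # 1.0 (one full vCPU)
print(allocated_vcpus(128))   # ~0.07: fractional CPU, hence the slow JIT
```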
For .NET 6, I recommend starting at 1024 MB minimum. The JIT compiler is CPU-intensive, and fractional CPU dramatically slows cold starts. Here are benchmarks from our production workload (simple API Gateway integration):
| Memory | Cold Start | Warm Invocation | Cost per 1M Invocations |
|---|---|---|---|
| 256 MB | 4,500 ms | 150 ms | $0.42 |
| 512 MB | 2,800 ms | 80 ms | $0.83 |
| 1024 MB | 1,800 ms | 45 ms | $1.67 |
| 1769 MB | 1,200 ms | 25 ms | $2.92 |
| 3008 MB | 900 ms | 18 ms | $5.00 |
Notice that doubling memory can come close to halving execution time, so the effective cost per request rises far more slowly than the memory increase alone would suggest. Always benchmark your specific workload.
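A quick way to sanity-check numbers like these is Lambda's published pricing formula: duration cost is GB-seconds multiplied by a per-GB-second rate, plus a flat per-request fee. A minimal sketch, assuming the x86 us-east-1 rates of $0.0000166667 per GB-second and $0.20 per million requests (check current pricing; the table above also reflects real production traffic, including cold starts, so it will not match a warm-path estimate exactly):

```python
GB_SECOND_RATE = 0.0000166667   # assumed x86 us-east-1 price per GB-second
REQUEST_RATE_PER_M = 0.20       # assumed flat charge per million requests

def cost_per_million(memory_mb: int, avg_duration_ms: float) -> float:
    """Estimated bill for one million invocations at a given memory size."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * 1_000_000
    return gb_seconds * GB_SECOND_RATE + REQUEST_RATE_PER_M

# Warm-path estimate for the 1024 MB row (45 ms average):
print(round(cost_per_million(1024, 45), 2))  # 0.95
```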
## ReadyToRun (R2R) Compilation
ReadyToRun pre-compiles your .NET code to native machine code during build time, reducing JIT overhead at runtime. This is the single most impactful optimization for cold starts.
```xml
<PropertyGroup>
  <TargetFramework>net6.0</TargetFramework>
  <PublishReadyToRun>true</PublishReadyToRun>
  <RuntimeIdentifier>linux-x64</RuntimeIdentifier>
  <!-- For Graviton2 (ARM64): -->
  <!-- <RuntimeIdentifier>linux-arm64</RuntimeIdentifier> -->
</PropertyGroup>
```
In our benchmarks, R2R reduces cold start by 30-40%. The trade-off is a larger deployment package (typically 2-3x larger), but Lambda’s 250 MB unzipped limit is rarely a concern.
## Trimming and Assembly Optimization
.NET 6 supports assembly trimming, which removes unused code. This reduces package size and improves cold start by loading fewer assemblies:
```xml
<PropertyGroup>
  <PublishTrimmed>true</PublishTrimmed>
  <TrimMode>link</TrimMode>
</PropertyGroup>
```
Warning: Trimming can break reflection-based code. Test thoroughly. Libraries using System.Text.Json source generators are trimming-safe; those using Newtonsoft.Json with dynamic deserialization are not.
## ARM64 (Graviton2) Performance
AWS Graviton2 processors offer 20% better price-performance for Lambda. .NET 6 has excellent ARM64 support. Switching is straightforward:
```yaml
# serverless.yml
functions:
  myFunction:
    handler: MyAssembly::MyNamespace.Function::Handler
    runtime: dotnet6
    architecture: arm64
```
In our tests, Graviton2 reduced costs by 18% while maintaining equivalent performance. The only caveat: ensure all native dependencies are ARM64-compatible.
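The headline saving follows directly from the per-GB-second rates. A sketch, assuming us-east-1 duration prices of $0.0000166667 (x86) and $0.0000133334 (arm64) at the time of writing (verify against the current price list):

```python
X86_RATE = 0.0000166667   # assumed x86 price per GB-second
ARM_RATE = 0.0000133334   # assumed arm64 (Graviton2) price per GB-second

def duration_savings(x86_rate: float, arm_rate: float) -> float:
    """Fractional duration-cost saving from switching architectures,
    assuming execution time stays the same."""
    return 1 - arm_rate / x86_rate

print(f"{duration_savings(X86_RATE, ARM_RATE):.0%}")  # 20%
```

Our observed 18% is slightly below the list-price 20% because real bills include the flat per-request fee, which is identical on both architectures.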
## Provisioned Concurrency: Eliminating Cold Starts
For latency-sensitive workloads (APIs with SLA requirements), Provisioned Concurrency keeps a specified number of execution environments warm:
```bash
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 10
```
This guarantees 10 warm execution environments are always available, so requests within that concurrency never hit a cold start; traffic beyond it spills over to on-demand instances, which can still cold-start. The cost is approximately $0.015 per GB-hour of provisioned capacity: significant, but justified for user-facing APIs.
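At that rate the monthly bill is easy to estimate. A sketch, assuming ten 1,024 MB environments, the $0.015 per GB-hour figure above, and a 730-hour month (note that provisioned-concurrency pricing also includes a reduced per-invocation duration rate, omitted here):

```python
PC_RATE_PER_GB_HOUR = 0.015  # provisioned-concurrency rate quoted above
HOURS_PER_MONTH = 730

def provisioned_monthly_cost(instances: int, memory_mb: int) -> float:
    """Monthly cost of keeping `instances` execution environments warm."""
    gb = memory_mb / 1024
    return instances * gb * PC_RATE_PER_GB_HOUR * HOURS_PER_MONTH

print(round(provisioned_monthly_cost(10, 1024), 2))  # 109.5
```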
## Code Optimization Patterns

### 1. Initialize Outside the Handler
```csharp
public class Function
{
    // Initialized once during cold start, reused across invocations
    private static readonly HttpClient _httpClient = new HttpClient();
    private static readonly AmazonDynamoDBClient _dynamoClient = new AmazonDynamoDBClient();

    public async Task<APIGatewayProxyResponse> Handler(APIGatewayProxyRequest request, ILambdaContext context)
    {
        // Handler code uses shared clients
        var result = await _dynamoClient.GetItemAsync(...);
        return new APIGatewayProxyResponse { StatusCode = 200 };
    }
}
```
### 2. Use Source Generators for JSON
```csharp
[JsonSerializable(typeof(MyRequest))]
[JsonSerializable(typeof(MyResponse))]
public partial class AppJsonContext : JsonSerializerContext { }

// Usage: skips runtime reflection, so it is markedly faster and trimming-safe
var request = JsonSerializer.Deserialize(body, AppJsonContext.Default.MyRequest);
```
## Monitoring and Observability
Use AWS Lambda Powertools for .NET to add structured logging, distributed tracing, and custom metrics:
```csharp
[Logging(LogEvent = true)]
[Tracing(CaptureMode = TracingCaptureMode.ResponseAndError)]
[Metrics(Namespace = "OrderService", Service = "ProcessOrder")]
public async Task<APIGatewayProxyResponse> Handler(APIGatewayProxyRequest request, ILambdaContext context)
{
    Logger.LogInformation("Processing order");
    Metrics.AddMetric("OrdersProcessed", 1, MetricUnit.Count);
    Tracing.WithSubsegment("ValidateOrder", subsegment =>
    {
        // ...
    });
}
```
## Key Takeaways
- Set memory to at least 1024 MB for .NET 6 Lambda functions
- Enable ReadyToRun compilation for 30-40% cold start reduction
- Use ARM64/Graviton2 for 20% cost savings
- Implement Provisioned Concurrency for latency-sensitive APIs
- Initialize SDK clients outside the handler for reuse
- Use System.Text.Json source generators for trimming-safe, high-performance serialization