Amazon CloudWatch is the foundation of AWS observability, providing metrics, logs, alarms, and (through X-Ray integration) traces. The service has grown into a sprawling ecosystem: CloudWatch Logs, Logs Insights, Metrics, Container Insights, Lambda Insights, Contributor Insights, Synthetics, RUM, and Evidently. This guide takes a unified approach to building a complete observability stack from these components.
CloudWatch Components
flowchart TB
    subgraph Sources ["Data Sources"]
        Lambda["Lambda Functions"]
        ECS["ECS/Fargate"]
        EC2["EC2 Instances"]
        API["API Gateway"]
    end
    subgraph CloudWatch ["CloudWatch"]
        Logs["CloudWatch Logs"]
        Metrics["CloudWatch Metrics"]
        Traces["X-Ray Traces"]
        Alarms["CloudWatch Alarms"]
    end
    subgraph Actions ["Actions"]
        SNS["SNS Notifications"]
        Lambda2["Lambda Remediation"]
        Dashboards["Dashboards"]
    end
    Sources --> Logs
    Sources --> Metrics
    Sources --> Traces
    Logs --> Alarms
    Metrics --> Alarms
    Alarms --> Actions
    style Alarms fill:#FFCDD2,stroke:#C62828
Structured Logging
CloudWatch Logs Insights works best with JSON-structured logs:
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line for Logs Insights."""
    def format(self, record):
        log_record = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "function": record.funcName,
            # Populated via logger.info(..., extra={"correlation_id": ...})
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(log_record)

logger = logging.getLogger()
logger.setLevel(logging.INFO)  # the root logger defaults to WARNING
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
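A quick usage sketch; the correlation ID would normally come from your request middleware, so the value here is illustrative:

logger.info("order created", extra={"correlation_id": "req-123"})
# emits one JSON line, e.g.:
# {"timestamp": "2024-01-01 12:00:00", "level": "INFO",
#  "message": "order created", "function": "<module>",
#  "correlation_id": "req-123"}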
Logs Insights Queries
# Find errors with correlation IDs
fields @timestamp, message, correlation_id
| filter level = "ERROR"
| sort @timestamp desc
| limit 100

# Aggregate request latency by endpoint
# Note: Logs Insights has no p99() function; percentiles use pct(field, 99),
# and sorting on an aggregate requires an alias
fields @timestamp, endpoint, duration_ms
| stats avg(duration_ms) as avg_ms, pct(duration_ms, 99) as p99_ms, count() as requests by endpoint
| sort requests desc
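Queries can also be run programmatically. A minimal boto3 sketch (the log group name is illustrative); start_query is asynchronous, so you poll for results:

import time
import boto3

logs = boto3.client("logs")

query_id = logs.start_query(
    logGroupName="/aws/lambda/my-function",  # illustrative log group
    startTime=int(time.time()) - 3600,       # last hour (epoch seconds)
    endTime=int(time.time()),
    queryString='fields @timestamp, message | filter level = "ERROR" | limit 100',
)["queryId"]

# Poll until the query finishes
while True:
    results = logs.get_query_results(queryId=query_id)
    if results["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)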
Custom Metrics with EMF
Embedded Metric Format (EMF) publishes custom metrics without CloudWatch API calls:
from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit

# A namespace is required (here set explicitly; values are illustrative)
metrics = Metrics(namespace="OrderService", service="orders")

@metrics.log_metrics
def handler(event, context):
    metrics.add_dimension(name="Environment", value="production")
    metrics.add_metric(name="OrdersProcessed", unit=MetricUnit.Count, value=1)
    # MetricUnit has no `None` member; the unitless value is MetricUnit.NoUnit
    metrics.add_metric(name="OrderTotal", unit=MetricUnit.NoUnit,
                       value=event.get("order_total", 0))
    # Metrics are flushed to CloudWatch via the log output (EMF), not the API
    return {"statusCode": 200}
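Under the hood, the decorator prints a JSON blob that CloudWatch Logs parses into metrics. A hand-rolled sketch of the same EMF structure, without Powertools (namespace and values are illustrative):

import json
import time

# Minimal EMF payload; CloudWatch extracts the metrics automatically
emf_record = {
    "_aws": {
        "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
        "CloudWatchMetrics": [{
            "Namespace": "OrderService",
            "Dimensions": [["Environment"]],
            "Metrics": [{"Name": "OrdersProcessed", "Unit": "Count"}],
        }],
    },
    "Environment": "production",
    "OrdersProcessed": 1,
}
print(json.dumps(emf_record))  # Lambda stdout lands in CloudWatch Logs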
Composite Alarms
Combine multiple alarms with AND/OR logic to reduce noise:
resource "aws_cloudwatch_composite_alarm" "service_health" {
alarm_name = "service-health-composite"
alarm_rule = "ALARM(${aws_cloudwatch_metric_alarm.error_rate.alarm_name}) AND ALARM(${aws_cloudwatch_metric_alarm.latency.alarm_name})"
alarm_actions = [aws_sns_topic.alerts.arn]
}
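The composite references two child metric alarms that aren't shown above; one of them might look like this sketch (metric, dimensions, and threshold are illustrative):

resource "aws_cloudwatch_metric_alarm" "error_rate" {
  alarm_name          = "api-error-rate"
  namespace           = "AWS/ApiGateway"
  metric_name         = "5XXError"
  dimensions          = { ApiName = "my-api" }
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 5
  threshold           = 0.05
  comparison_operator = "GreaterThanThreshold"
}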
Container Insights
Enable Container Insights for ECS/EKS cluster metrics:
# ECS: enable Container Insights at the cluster level
aws ecs update-cluster-settings \
    --cluster my-cluster \
    --settings name=containerInsights,value=enabled

# EKS: Container Insights ships via the CloudWatch Observability add-on
aws eks create-addon \
    --cluster-name my-cluster \
    --addon-name amazon-cloudwatch-observability

# EKS control-plane logs to CloudWatch (separate from Container Insights)
aws eks update-cluster-config \
    --name my-cluster \
    --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
Key Takeaways
- Use structured JSON logging so Logs Insights can filter and aggregate on fields
- EMF publishes custom metrics through log output, avoiding synchronous PutMetricData calls
- Composite alarms combine related alarms with AND/OR logic to cut alert noise
- Container Insights provides cluster-level visibility for ECS and EKS
- X-Ray integration enables distributed tracing (see the sketch below)
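Getting started with X-Ray in Python takes only a few lines. A minimal sketch, assuming active tracing is enabled on the Lambda function and the aws-xray-sdk package is bundled with it:

from aws_xray_sdk.core import patch_all, xray_recorder

patch_all()  # auto-instrument supported libraries (boto3, requests, ...)

def handler(event, context):
    # Subsegments appear as spans under the Lambda invocation trace
    with xray_recorder.in_subsegment("process_order"):
        pass  # business logic here
    return {"statusCode": 200}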