Kubernetes events are one of the most valuable yet frequently overlooked data sources in a cluster. They provide insights into scheduling failures, pod restarts, node issues, and configuration problems. However, Kubernetes events are ephemeral and can disappear quickly if not centrally collected.
This post explains a production-ready, highly available approach to collecting Kubernetes cluster events and forwarding them to Amazon CloudWatch Logs using Fluent Bit and Kubernetes leader election.
Why Centralize Kubernetes Events?
By default, Kubernetes events are kept in etcd for only a short time: the API server deletes them after the period set by its --event-ttl flag, which defaults to one hour. Once they expire, critical troubleshooting information is permanently lost.
Centralizing events in Amazon CloudWatch Logs provides:
Long-term retention
Search and filtering capabilities
Audit and compliance visibility
Correlation with metrics and application logs
Solution Overview
This solution uses a leader-elected event collector that streams Kubernetes events using kubectl. Only one pod acts as the leader at any given time, so each event is collected and forwarded only once.
Fluent Bit tails the event log file and forwards events to CloudWatch Logs. Multiple replicas ensure high availability and automatic failover.
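A Lease object only records who holds leadership and when it was last renewed; each replica decides for itself whether the current holder's lease has lapsed. A minimal sketch of that check in POSIX shell, assuming GNU date for timestamp parsing (the function name lease_is_expired is illustrative, not part of any tool):

```shell
#!/bin/sh
# Sketch: decide whether a Lease can be taken over by another replica.
# Assumes GNU date; BusyBox date may not parse RFC 3339 timestamps with microseconds.

# lease_is_expired RENEW_TIME DURATION_SECONDS [NOW_EPOCH]
# Exit status 0 means expired (renewTime + duration is not in the future).
lease_is_expired() {
  renew_time=$1
  duration=$2
  now_epoch=${3:-$(date -u +%s)}
  # A lease that was never renewed is free to take.
  [ -z "$renew_time" ] && return 0
  # Be conservative: an unparseable timestamp is treated as still valid.
  renew_epoch=$(date -u -d "$renew_time" +%s 2>/dev/null) || return 1
  [ $((renew_epoch + duration)) -le "$now_epoch" ]
}

# Example: a 30-second lease last renewed 45 seconds ago has expired.
#   lease_is_expired 2024-01-01T00:00:00.000000Z 30 1704067245 && echo expired
```

This compares renewTime against the local clock, so it assumes reasonably synchronized node clocks. client-go's leader election avoids that dependency by measuring, on its own clock, how long it has observed an unchanged lease.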
Architecture Overview
Multiple replicas of the event collector are deployed
Kubernetes Lease API is used for leader election
Only the leader pod streams events
Events are written to a log file
Fluent Bit ships events to Amazon CloudWatch Logs
Scope and Environment
Cluster Name: test
Region: ap-south-1
Namespace: amazon-cloudwatch
Log Group: /aws/eks/test/cluster-events
Required IAM Permissions
For Fluent Bit to successfully push Kubernetes events to Amazon CloudWatch Logs, the EKS node group IAM role must have the required CloudWatch Logs permissions.
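Create log groups (logs:CreateLogGroup)
Create log streams (logs:CreateLogStream)
Put log events (logs:PutLogEvents)
Describe log groups and streams (logs:DescribeLogGroups, logs:DescribeLogStreams)
Set log group retention (logs:PutRetentionPolicy), needed because the output sets log_retention_days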
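These permissions can be expressed as an inline IAM policy on the node group role. This sketch prints one, scoped to the log group from the scope section; ACCOUNT_ID is a placeholder you must supply, and the function name cw_logs_policy and the policy name in the usage comment are hypothetical:

```shell
#!/bin/sh
# Sketch: print an inline IAM policy for the CloudWatch Logs actions Fluent Bit needs.
# ACCOUNT_ID is a required placeholder; REGION and LOG_GROUP default to this post's values.

cw_logs_policy() {
  account_id=${ACCOUNT_ID:?set ACCOUNT_ID to your AWS account ID}
  region=${REGION:-ap-south-1}
  log_group=${LOG_GROUP:-/aws/eks/test/cluster-events}
  group_arn="arn:aws:logs:${region}:${account_id}:log-group:${log_group}"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "WriteClusterEvents",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:PutRetentionPolicy",
        "logs:DescribeLogStreams"
      ],
      "Resource": ["${group_arn}", "${group_arn}:*"]
    },
    {
      "Sid": "DescribeLogGroups",
      "Effect": "Allow",
      "Action": "logs:DescribeLogGroups",
      "Resource": "*"
    }
  ]
}
EOF
}

# Usage (role and policy names are placeholders):
#   ACCOUNT_ID=111122223333 cw_logs_policy > cw-logs-policy.json
#   aws iam put-role-policy --role-name <node-group-role> \
#     --policy-name fluent-bit-cluster-events --policy-document file://cw-logs-policy.json
```

Write actions are limited to the one log group and its streams; DescribeLogGroups is left on "*" to keep the sketch simple, and can be narrowed if your policy standards require it.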
Core Components
Service Account with least privilege access
ClusterRole and ClusterRoleBinding
ConfigMaps for Fluent Bit and event collector
Deployment with multiple replicas
Pod Disruption Budget for availability
Deployment Procedure
Create the YAML file using the naming convention:
fluent-bit-events-ha-<cluster-name>.yaml
Validate the YAML:
kubectl apply --dry-run=client -f fluent-bit-events-ha-test.yaml
Deploy to the cluster:
kubectl apply -f fluent-bit-events-ha-test.yaml
Validate the leader lease:
kubectl get lease fluent-bit-events-leader -n amazon-cloudwatch
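The steps above can be wrapped in a small helper so every cluster follows the same naming convention. A sketch, assuming the manifest sits in the current directory; the function names and the rollout wait are additions, not part of the original procedure:

```shell
#!/bin/sh
# Sketch: dry-run, apply, wait for rollout, then show the leader lease.
# Deployment and lease names match the manifest in this post.

NAMESPACE=amazon-cloudwatch

manifest_for() {
  # Naming convention: fluent-bit-events-ha-<cluster-name>.yaml
  echo "fluent-bit-events-ha-$1.yaml"
}

deploy_events_collector() {
  cluster=${1:?usage: deploy_events_collector <cluster-name>}
  manifest=$(manifest_for "$cluster")
  [ -f "$manifest" ] || { echo "missing $manifest" >&2; return 1; }
  # Validate first; stop before changing the cluster if the manifest is invalid.
  kubectl apply --dry-run=client -f "$manifest" || return 1
  kubectl apply -f "$manifest" || return 1
  # Wait for the replicas to become ready, then show who holds the lease.
  kubectl rollout status deployment/fluent-bit-events -n "$NAMESPACE" --timeout=180s || return 1
  kubectl get lease fluent-bit-events-leader -n "$NAMESPACE"
}

# Usage: deploy_events_collector test
```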
Complete Deployment YAML
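apiVersion: v1
kind: Namespace
metadata:
  name: amazon-cloudwatch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit-events-sa
  namespace: amazon-cloudwatch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-events-cluster-role
rules:
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["namespaces", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-events-cluster-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-events-cluster-role
subjects:
  - kind: ServiceAccount
    name: fluent-bit-events-sa
    namespace: amazon-cloudwatch
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-events-config
  namespace: amazon-cloudwatch
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush             5
        Log_Level         info
        Parsers_File      parsers.conf
        HTTP_Server       On
        HTTP_Listen       0.0.0.0
        HTTP_Port         2020
        Health_Check      On
    [INPUT]
        Name              tail
        Tag               kube.events
        Path              /var/log/events.log
        Parser            json
        DB                /var/log/flb_events.db
        Refresh_Interval  2
        Read_from_Head    false
    [FILTER]
        Name              modify
        Match             kube.events
        Add               cluster test
        Add               region ap-south-1
    [OUTPUT]
        Name              cloudwatch_logs
        Match             kube.events
        region            ap-south-1
        log_group_name    /aws/eks/test/cluster-events
        log_stream_prefix events-
        auto_create_group true
        log_retention_days 30
  parsers.conf: |
    [PARSER]
        Name   json
        Format json
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-collector-script
  namespace: amazon-cloudwatch
data:
  collect-events.sh: |
    #!/bin/sh
    # Lease-based leader election (sketch): only the replica holding the Lease
    # runs the watch. Assumes POD_NAME/POD_NAMESPACE from the downward API and
    # GNU date in the kubectl image.
    LEASE=fluent-bit-events-leader
    NS="${POD_NAMESPACE:-amazon-cloudwatch}"
    ID="${POD_NAME:?POD_NAME must be set}"
    LEASE_SECONDS=30
    RENEW_SECONDS=10
    WATCH_PID=""

    now() { date -u +%Y-%m-%dT%H:%M:%S.000000Z; }

    stop_watch() {
      [ -n "$WATCH_PID" ] && kill "$WATCH_PID" 2>/dev/null
      WATCH_PID=""
    }
    trap 'stop_watch; exit 0' TERM INT

    # Exit status 0 when this pod holds the Lease after the call.
    try_acquire() {
      out=$(kubectl get lease "$LEASE" -n "$NS" \
        -o jsonpath='{.spec.holderIdentity}|{.spec.renewTime}|{.metadata.resourceVersion}' 2>/dev/null)
      if [ -z "$out" ]; then
        # No Lease yet: the replica whose create succeeds becomes leader.
        printf '{"apiVersion":"coordination.k8s.io/v1","kind":"Lease","metadata":{"name":"%s","namespace":"%s"},"spec":{"holderIdentity":"%s","leaseDurationSeconds":%s,"renewTime":"%s"}}' \
          "$LEASE" "$NS" "$ID" "$LEASE_SECONDS" "$(now)" | kubectl create -f - >/dev/null 2>&1
        return $?
      fi
      holder=${out%%|*}; rest=${out#*|}; renew=${rest%%|*}; rv=${rest#*|}
      renew_epoch=$(date -u -d "$renew" +%s 2>/dev/null) || renew_epoch=$(date -u +%s)
      # Another holder that renewed recently keeps the Lease.
      if [ "$holder" != "$ID" ] && [ $(( $(date -u +%s) - renew_epoch )) -lt "$LEASE_SECONDS" ]; then
        return 1
      fi
      # resourceVersion makes the patch a compare-and-swap: if another replica
      # updated the Lease first, the API server rejects it with a conflict.
      kubectl patch lease "$LEASE" -n "$NS" --type=merge -p \
        "{\"metadata\":{\"resourceVersion\":\"$rv\"},\"spec\":{\"holderIdentity\":\"$ID\",\"renewTime\":\"$(now)\"}}" >/dev/null 2>&1
    }

    while true; do
      if try_acquire; then
        # Leader: keep one watch running. jsonpath '{@}' prints each event as
        # single-line JSON, which the tail input parses line by line.
        if [ -z "$WATCH_PID" ] || ! kill -0 "$WATCH_PID" 2>/dev/null; then
          kubectl get events --all-namespaces --watch-only \
            -o jsonpath='{@}{"\n"}' >> /var/log/events.log &
          WATCH_PID=$!
        fi
      else
        # Follower, or leadership lost: stop any running watch.
        stop_watch
      fi
      sleep "$RENEW_SECONDS"
    done
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fluent-bit-events
  namespace: amazon-cloudwatch
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fluent-bit-events
  template:
    metadata:
      labels:
        app: fluent-bit-events
    spec:
      serviceAccountName: fluent-bit-events-sa
      containers:
        - name: event-collector
          image: bitnami/kubectl:latest
          command: ["/bin/sh", "/scripts/collect-events.sh"]
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          volumeMounts:
            - name: event-logs
              mountPath: /var/log
            - name: collector-script
              mountPath: /scripts
        - name: fluent-bit
          image: amazon/aws-for-fluent-bit:2.31.12
          ports:
            - name: http
              containerPort: 2020
          livenessProbe:
            httpGet:
              path: /api/v1/health
              port: 2020
            initialDelaySeconds: 10
            periodSeconds: 30
          volumeMounts:
            - name: event-logs
              mountPath: /var/log
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        # Shared scratch space for events.log; no hostPath required.
        - name: event-logs
          emptyDir: {}
        - name: collector-script
          configMap:
            name: event-collector-script
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-events-config
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: fluent-bit-events-pdb
  namespace: amazon-cloudwatch
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: fluent-bit-events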
Operational Verification
Verify pods are running
Verify leader lease
Verify Fluent Bit health endpoint
Verify CloudWatch Logs are receiving events
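Each check above maps to one or two commands. A sketch, assuming the Lease holderIdentity is the leader pod's name and that kubectl, curl, and AWS CLI v2 (for aws logs tail) are available on the operator machine; the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: one function per verification step.
# Assumes holderIdentity in the Lease is the leader's pod name.

NS=amazon-cloudwatch
LOG_GROUP=/aws/eks/test/cluster-events
REGION=ap-south-1

check_pods() {
  kubectl get pods -n "$NS" -l app=fluent-bit-events -o wide
}

leader_pod() {
  kubectl get lease fluent-bit-events-leader -n "$NS" -o jsonpath='{.spec.holderIdentity}'
}

check_health() {
  pod=${1:-$(leader_pod)}
  [ -n "$pod" ] || { echo "no leader found" >&2; return 1; }
  # Forward the Fluent Bit HTTP port locally, query the health endpoint, clean up.
  kubectl port-forward -n "$NS" "pod/$pod" 2020:2020 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2  # give the port-forward a moment to bind
  curl -fsS http://127.0.0.1:2020/api/v1/health
  rc=$?
  kill "$pf_pid" 2>/dev/null
  return $rc
}

check_cloudwatch() {
  # Show events delivered to CloudWatch Logs in the last 10 minutes.
  aws logs tail "$LOG_GROUP" --since 10m --region "$REGION"
}
```

The health endpoint is available because the Fluent Bit configuration sets HTTP_Server and Health_Check to On.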
Failover and Recovery
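Leader failover is automatic. If the leader pod fails, it stops renewing the lease; once the lease expires, another replica acquires leadership through the Lease API and resumes event collection without manual intervention. Because the collector watches with --watch-only, events emitted during this handover window are not replayed and may be missing from CloudWatch Logs.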
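A simple way to exercise the failover path is to delete the current leader and watch the lease move. A sketch for a test cluster, assuming holderIdentity is the pod name; the 5-second polling interval and 150-second limit are arbitrary, and the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: failover drill. Deletes the leader pod and polls the Lease
# until a different replica holds it. Do not run against production casually.

NS=amazon-cloudwatch
LEASE=fluent-bit-events-leader

current_holder() {
  kubectl get lease "$LEASE" -n "$NS" -o jsonpath='{.spec.holderIdentity}'
}

failover_drill() {
  old=$(current_holder)
  [ -n "$old" ] || { echo "no leader recorded in $LEASE" >&2; return 1; }
  kubectl delete pod "$old" -n "$NS" --wait=false || return 1
  i=0
  while [ "$i" -lt 30 ]; do
    new=$(current_holder)
    if [ -n "$new" ] && [ "$new" != "$old" ]; then
      echo "leadership moved from $old to $new"
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  echo "lease still held by $old after 150s" >&2
  return 1
}
```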
Security and Compliance
Least privilege RBAC
Namespace isolation
No hostPath usage
Cloud-native authentication
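The least-privilege RBAC can be spot-checked by impersonating the service account with kubectl auth can-i (run as a user allowed to impersonate service accounts). A sketch; the allowed and denied checks are illustrative and should follow your own policy, and the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: verify what the collector's service account can and cannot do.

SA=system:serviceaccount:amazon-cloudwatch:fluent-bit-events-sa

# expect_can_i yes|no VERB RESOURCE [extra kubectl args...]
expect_can_i() {
  want=$1; shift
  got=$(kubectl auth can-i "$@" --as="$SA" 2>/dev/null)
  if [ "$got" = "$want" ]; then
    echo "ok   $want: $*"
  else
    echo "FAIL expected $want, got '$got': $*"
    return 1
  fi
}

check_rbac() {
  rc=0
  expect_can_i yes watch events --all-namespaces || rc=1
  expect_can_i yes patch leases -n amazon-cloudwatch || rc=1
  expect_can_i no delete pods --all-namespaces || rc=1
  expect_can_i no get secrets --all-namespaces || rc=1
  return $rc
}
```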