Prometheus Alert Rules
Learn to create effective alert rules using PromQL. Configure proactive monitoring to catch issues before they impact your applications and infrastructure.
Common Alert Rules
High CPU Usage
Alert when CPU usage exceeds a threshold
Monitor CPU utilization across nodes to prevent performance degradation.
PromQL Query:
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
Memory Usage Alert
Alert when memory usage is critically high
Track memory consumption to prevent out-of-memory errors and system instability.
PromQL Query:
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
Pod CrashLoopBackOff
Detect pods in crash loop state
Identify pods that are repeatedly failing and restarting, indicating application issues.
PromQL Query:
kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
Disk Space Warning
Monitor available disk space
Prevent disk space issues that could cause service failures or data loss.
PromQL Query:
(1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100 > 85
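The disk query above may also match pseudo filesystems such as tmpfs, whose usage is rarely actionable. Here is a sketch of a rule that filters them out by the node_exporter `fstype` label. The excluded filesystem list, the alert name, and the 10m duration are assumptions to adapt to your nodes.

```yaml
# Rule fragment for the `rules:` list of a rule group.
# Assumption: tmpfs, overlay and squashfs are the filesystems you want to ignore.
- alert: DiskSpaceWarning
  expr: |
    (1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay|squashfs"}
          / node_filesystem_size_bytes{fstype!~"tmpfs|overlay|squashfs"})) * 100 > 85
  for: 10m
  labels:
    severity: warning
  annotations:
    # $labels exposes the labels of the series that triggered the alert
    summary: "Disk {{ $labels.mountpoint }} on {{ $labels.instance }} is over 85% full"
```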
Alert Rule Structure
PrometheusRule Components
Understanding the structure of PrometheusRule resources for effective alert configuration.
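The components described below combine into a single entry in a rule group's `rules:` list, as in this sketch. `cpu_usage_percent` is the same illustrative metric used in the list, and the `runbook_url` value is a hypothetical placeholder.

```yaml
- alert: HighCPUUsage                 # alert name
  expr: cpu_usage_percent > 80        # PromQL condition (illustrative metric)
  for: 5m                             # condition must hold this long before firing
  labels:                             # key-value pairs attached to the alert
    severity: warning
  annotations:                        # extra text carried into notifications
    description: "CPU usage is high"
    runbook_url: "https://example.com/runbooks/high-cpu"   # hypothetical placeholder
```

While the `for` window has not yet elapsed, Prometheus reports the alert as pending rather than firing.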
Alert Name
Unique identifier for the alert
HighCPUUsage
PromQL Expression
Query that defines the alert condition
cpu_usage_percent > 80
For Duration
How long the condition must persist before firing
5m
Labels
Key-value pairs attached to the alert
severity: warning
Annotations
Additional metadata for notifications
description: "CPU usage is high"
PromQL Examples
CPU Usage Percentage
Calculate CPU usage percentage for each instance
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Memory Usage Percentage
Calculate used memory percentage for each instance
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
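When the same calculation feeds several alerts or dashboards, a recording rule can precompute it on every evaluation. The series name below is hypothetical. It loosely follows the `level:metric:operations` naming convention described in the Prometheus documentation.

```yaml
# Rule fragment: record memory usage per instance once...
- record: instance:node_memory_usage:percent
  expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
# ...then alert on the recorded series instead of repeating the expression.
- alert: HighMemoryUsage
  expr: instance:node_memory_usage:percent > 90
  for: 2m
  labels:
    severity: critical
```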
Pod Restart Count
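Calculate the per-second container restart rate over the last 15 minutes (use increase() instead of rate() to get the number of restarts in that window)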
rate(kube_pod_container_status_restarts_total[15m])
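To alert on a restart count rather than a rate, increase() over the same window estimates how many restarts happened. Here is a sketch. The alert name and the threshold of 3 restarts are illustrative assumptions. Because increase() extrapolates, its result can be fractional.

```yaml
# Rule fragment: fire when a container restarts more than 3 times within 15 minutes.
- alert: PodRestartingFrequently
  expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
  labels:
    severity: warning
  annotations:
    # namespace, pod and container are labels exported by kube-state-metrics
    summary: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) restarted more than 3 times in 15 minutes"
```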
HTTP Request Rate
Calculate the per-second HTTP request rate, averaged over the last 5 minutes
rate(http_requests_total[5m])
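A raw request rate is rarely worth alerting on by itself. A common pattern divides the rate of failed requests by the total rate. This sketch assumes the application's instrumentation adds a `status` label holding the HTTP status code (some libraries name it `code` instead). The 5% threshold and the alert name are illustrative.

```yaml
# Rule fragment: alert when more than 5% of a job's requests return 5xx.
# Assumption: http_requests_total has a `status` label with the HTTP code.
- alert: HighHTTPErrorRate
  expr: |
    sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
      /
    sum by (job) (rate(http_requests_total[5m]))
      > 0.05
  for: 10m
  labels:
    severity: critical
  annotations:
    summary: "More than 5% of requests to {{ $labels.job }} are failing"
```

Division binds more tightly than the comparison, so the ratio is computed first and then compared with 0.05.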
Best Practices
Alert Rule Guidelines
Follow these best practices to create effective and maintainable alert rules.
- Use meaningful alert names that clearly describe the issue
- Set appropriate severity levels (info, warning, critical)
- Include helpful descriptions and runbook links in annotations
- Test alert rules in staging environments first
- Choose "for" durations long enough to ignore brief spikes and avoid false positives
- Group related alerts using consistent label naming
- Regularly review and remove unused alert rules
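Rules can also be checked before they reach staging by running a promtool unit test. The sketch below assumes a hypothetical file `memory-alerts.yml` that contains the HighMemoryUsage rule from the sample PrometheusRule below. That file must be written as a plain rule file, meaning the contents of `spec:` starting at `groups:`. Run the test with `promtool test rules tests.yml`.

```yaml
# tests.yml (hypothetical file name)
rule_files:
  - memory-alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    # Synthetic data: 95% of memory used on node1 for 10 minutes.
    input_series:
      - series: 'node_memory_MemAvailable_bytes{instance="node1"}'
        values: '5+0x10'
      - series: 'node_memory_MemTotal_bytes{instance="node1"}'
        values: '100+0x10'
    alert_rule_test:
      # At 1m the condition holds but `for: 2m` has not elapsed, so nothing fires yet.
      - eval_time: 1m
        alertname: HighMemoryUsage
        exp_alerts: []
      # At 5m the alert should fire with the rule's labels and annotations.
      - eval_time: 5m
        alertname: HighMemoryUsage
        exp_alerts:
          - exp_labels:
              severity: critical
              service: memory
              instance: node1
            exp_annotations:
              summary: "High memory usage detected"
              description: "Memory usage is above 90% for more than 2 minutes"
```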
Sample PrometheusRule
Complete Example
A complete PrometheusRule resource with multiple alert definitions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-alerts
  namespace: monitoring
  # The Prometheus Operator only loads PrometheusRule objects whose labels match
  # its ruleSelector; add a labels: map here with the labels your installation expects.
spec:
  groups:
    - name: kubernetes.rules
      rules:
        - alert: HighCPUUsage
          # rate() rather than irate(): irate() is too volatile for alerting
          expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
          for: 5m
          labels:
            severity: warning
            service: cpu
          annotations:
            summary: "High CPU usage detected"
            description: "CPU usage is above 80% for more than 5 minutes"
        - alert: HighMemoryUsage
          expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
          for: 2m
          labels:
            severity: critical
            service: memory
          annotations:
            summary: "High memory usage detected"
            description: "Memory usage is above 90% for more than 2 minutes"
Ready to Configure Notifications?
Now that you have alert rules configured, learn how to set up Alertmanager for routing and notifications.