🚨 Alert Rules Configuration

Prometheus Alert Rules

Learn to create effective alert rules using PromQL. Configure proactive monitoring to catch issues before they impact your applications and infrastructure.

Common Alert Rules

High CPU Usage

Alert when CPU usage exceeds a threshold (80% in this example)

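Monitor CPU utilization on each node to catch sustained load before it degrades performance. Because the query uses node_exporter metrics, it measures whole-node CPU rather than individual pods.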

PromQL Query:

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80

For: 5m

Severity: warning

Memory Usage Alert

Alert when memory usage is critically high

Track memory consumption to prevent out-of-memory errors and system instability.

PromQL Query:

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90

For: 2m

Severity: critical

Pod CrashLoopBackOff

Detect containers stuck in the CrashLoopBackOff state

Identify pods that are repeatedly failing and restarting, indicating application issues.

PromQL Query:

max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}[5m]) >= 1

For: 1m

Severity: critical
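Inside the rules list of a PrometheusRule group, a crash-loop check might be written as the sketch below. It assumes kube-state-metrics is deployed, since that is the component that exports kube_pod_container_status_waiting_reason; the alert name PodCrashLooping and the annotation wording are illustrative, not a fixed convention.

```yaml
# Fragment for spec.groups[].rules of a PrometheusRule.
# Assumes kube-state-metrics is running; the alert name and wording are illustrative.
- alert: PodCrashLooping
  # max_over_time keeps the condition true during the short periods when the
  # container is running between restarts and no waiting reason is reported.
  expr: max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}[5m]) >= 1
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Pod is crash looping"
    description: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} is in CrashLoopBackOff"
```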

Disk Space Warning

Monitor available disk space

Prevent disk space issues that could cause service failures or data loss.

PromQL Query:

(1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100 > 85

For: 10m

Severity: warning
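The disk space check can be expressed the same way. This is an assumed rule entry with DiskSpaceLow as an illustrative name; it also shows annotation templating, where $labels exposes the labels of the matching series (mountpoint and instance come from node_exporter) and $value holds the result of the expression.

```yaml
# Fragment for spec.groups[].rules; threshold and duration mirror the values above.
- alert: DiskSpaceLow
  expr: (1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100 > 85
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Disk space is running low"
    # $labels gives the series labels; $value is the percentage computed by expr.
    description: "{{ $labels.mountpoint }} on {{ $labels.instance }} is {{ $value | humanize }}% full"
```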

Alert Rule Structure

PrometheusRule Components

Each alerting rule in a PrometheusRule resource is built from the fields below.

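Alert Name (required)
Name of the alert; Prometheus attaches it to each resulting alert as the alertname label.
Example: HighCPUUsage

PromQL Expression (required)
Query that defines the alert condition; each series it returns becomes a separate alert.
Example: cpu_usage_percent > 80

For Duration (optional)
How long the condition must hold continuously before the alert changes from pending to firing.
Example: 5m

Labels (optional)
Key-value pairs attached to the alert, typically used by Alertmanager for routing and grouping.
Example: severity: warning

Annotations (optional)
Informational key-value pairs, such as a summary or description, included in notifications.
Example: description: "CPU usage is high"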
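Put together, the five fields map onto a single rule entry as sketched below, reusing the example values above. Note that cpu_usage_percent is a placeholder metric, not something node_exporter exposes.

```yaml
# One alerting rule, annotated field by field (fragment for spec.groups[].rules).
- alert: HighCPUUsage            # Alert name (required); becomes the alertname label
  expr: cpu_usage_percent > 80   # PromQL expression (required); placeholder metric
  for: 5m                        # For duration (optional); without it, fires on the first match
  labels:                        # Labels (optional); added to the alert, used for routing
    severity: warning
  annotations:                   # Annotations (optional); informational text for notifications
    description: "CPU usage is high"
```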

PromQL Examples

CPU Usage Percentage

Calculate CPU usage percentage for each instance

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Memory Usage Percentage

Calculate the percentage of memory in use (the complement of the available share)

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

Pod Restart Rate

Calculate the per-second restart rate of each container over a 15-minute window

rate(kube_pod_container_status_restarts_total[15m])

HTTP Request Rate

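Calculate the per-second HTTP request rate over a 5-minute window (http_requests_total is a counter exposed by your application's own instrumentation)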

rate(http_requests_total[5m])
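Because rate() returns a per-second value, thresholds on rare events such as restarts are hard to read. increase() over the same window returns an approximate count instead, which is easier to alert on. The rule below is a sketch; the alert name PodRestartingFrequently, the threshold of 3 restarts and the 5m for duration are assumptions to adapt.

```yaml
# Fragment for spec.groups[].rules; name, threshold and duration are illustrative.
- alert: PodRestartingFrequently
  # increase() is rate() multiplied by the window length, so it estimates the
  # number of restarts in the last 15 minutes (the value may be fractional).
  expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Container restarting frequently"
    description: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) restarted more than 3 times in 15 minutes"
```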

Best Practices

Alert Rule Guidelines

Follow these best practices to create effective and maintainable alert rules.

Use meaningful alert names that clearly describe the issue
Set appropriate severity levels (info, warning, critical)
Include helpful descriptions and runbook links in annotations
Test alert rules in staging environments first
Choose "for" durations long enough to ignore brief spikes and avoid false positives
Group related alerts using consistent label naming
Regularly review and remove alert rules that are no longer useful
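Several of these guidelines can be seen together in one rule. The sketch below is illustrative: the team label and the runbook URL are hypothetical, and runbook_url is a widely used annotation key rather than one Prometheus requires.

```yaml
# Fragment for spec.groups[].rules illustrating several guidelines at once.
- alert: HighMemoryUsage          # name states the problem
  expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
  for: 2m                         # ignore short spikes
  labels:
    severity: critical            # one of the agreed levels: info, warning, critical
    team: platform                # hypothetical label shared by related alerts
  annotations:
    summary: "High memory usage on {{ $labels.instance }}"
    description: "Memory usage is {{ $value | humanize }}% (threshold 90%)."
    runbook_url: "https://runbooks.example.com/HighMemoryUsage"  # hypothetical URL
```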

Sample PrometheusRule

Complete Example

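A complete PrometheusRule resource with multiple alert definitions. Depending on how the Prometheus Operator is configured, the resource may also need metadata labels that match the ruleSelector of your Prometheus instance before its rules are loaded.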

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-alerts
  namespace: monitoring
spec:
  groups:
  - name: kubernetes.rules
    rules:
    - alert: HighCPUUsage
      expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
      for: 5m
      labels:
        severity: warning
        service: cpu
      annotations:
        summary: "High CPU usage detected"
        description: "CPU usage is above 80% for more than 5 minutes"
        
    - alert: HighMemoryUsage
      expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
      for: 2m
      labels:
        severity: critical
        service: memory
      annotations:
        summary: "High memory usage detected"
        description: "Memory usage is above 90% for more than 2 minutes"
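Rules can be checked before they reach staging with promtool test rules, which evaluates them against synthetic series. promtool reads plain Prometheus rule files rather than PrometheusRule resources, so this sketch assumes the groups list from spec above has been saved under a top-level groups key in a rule file named alerts.yml; both file names and the node1 instance label are hypothetical. The second case shows why the for duration matters: a spike that lasts one minute never leaves the pending state.

```yaml
# alerts_test.yml (hypothetical name); run with: promtool test rules alerts_test.yml
rule_files:
  - alerts.yml               # the spec.groups content of the PrometheusRule above
evaluation_interval: 1m
tests:
  # Case 1: memory stays 95% used, so the alert fires once "for: 2m" has elapsed.
  - interval: 1m
    input_series:
      - series: 'node_memory_MemAvailable_bytes{instance="node1"}'
        values: '5+0x10'     # toy values: 5 of 100 bytes available from 0m to 10m
      - series: 'node_memory_MemTotal_bytes{instance="node1"}'
        values: '100+0x10'
    alert_rule_test:
      - eval_time: 5m
        alertname: HighMemoryUsage
        exp_alerts:
          - exp_labels:
              instance: node1
              severity: critical
              service: memory
            exp_annotations:
              summary: "High memory usage detected"
              description: "Memory usage is above 90% for more than 2 minutes"
  # Case 2: a one-minute spike at 1m is shorter than "for: 2m", so nothing fires.
  - interval: 1m
    input_series:
      - series: 'node_memory_MemAvailable_bytes{instance="node1"}'
        values: '50 5 50 50 50 50'
      - series: 'node_memory_MemTotal_bytes{instance="node1"}'
        values: '100+0x5'
    alert_rule_test:
      - eval_time: 5m
        alertname: HighMemoryUsage
        exp_alerts: []
```

If an expectation is not met, promtool reports the mismatch and exits with a non-zero status, which makes this kind of test easy to run in CI.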

Ready to Configure Notifications?

Now that you have alert rules configured, learn how to set up Alertmanager for routing and notifications.
