Prometheus AlertManager

Route alerts from Prometheus to a11ops

Connect Prometheus AlertManager to a11ops to centralize your metrics-based alerts. This integration preserves all labels, annotations, and alert metadata.

Prerequisites

  • Prometheus with AlertManager installed and configured
  • An a11ops workspace with an active API key
  • Network access from AlertManager to api.a11ops.com (a quick connectivity check is shown after this list)
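
A quick way to check connectivity from the AlertManager host (the exact HTTP status returned for the bare hostname is not guaranteed; the point is that the TLS connection succeeds rather than timing out):

# Run from the AlertManager host; a timeout or TLS error indicates a firewall or DNS problem
curl -sS -o /dev/null -w 'HTTP %{http_code}\n' https://api.a11ops.com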

Setup Instructions

Step 1: Create Integration in a11ops

  1. Navigate to your workspace settings
  2. Go to “Integrations” → “Add Integration”
  3. Select “Prometheus AlertManager”
  4. Name it (e.g., “Production Prometheus”)
  5. Copy the generated webhook URL (you can verify it with the curl sketch below)
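
If you want to verify the URL right away, you can post a minimal AlertManager-style payload to it from any shell. This is a sketch, not an official test procedure; it assumes the endpoint accepts the standard AlertManager webhook JSON and returns a success status for a valid integration ID:

# Replace YOUR_INTEGRATION_ID with the ID from the URL copied above
curl -X POST 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID' \
  -H 'Content-Type: application/json' \
  -d '{"version":"4","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"WebhookTest","severity":"warning"},"annotations":{"summary":"a11ops webhook test"}}]}'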

Step 2: Configure AlertManager

Add a11ops as a webhook receiver in your AlertManager configuration:

# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'default'
  routes:
    - receiver: 'a11ops-critical'
      match:
        severity: 'critical'
    - receiver: 'a11ops-warning'
      match:
        severity: 'warning'

receivers:
  - name: 'default'
    webhook_configs:
      - url: 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID'
        send_resolved: true
        
  - name: 'a11ops-critical'
    webhook_configs:
      - url: 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID'
        send_resolved: true
        
  - name: 'a11ops-warning'
    webhook_configs:
      - url: 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID'
        send_resolved: true
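
Before reloading, it is worth validating the file with amtool, which ships with AlertManager (the config path below is an assumption; adjust it to wherever your file lives):

# Validate syntax and referenced receivers before reloading
amtool check-config /etc/alertmanager/alertmanager.yml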

Step 3: Reload AlertManager

Apply the configuration changes:

# Reload AlertManager configuration
curl -X POST http://localhost:9093/-/reload

# Or if using systemd
sudo systemctl reload alertmanager
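
To confirm end-to-end delivery without waiting for a real incident, you can push a synthetic alert through AlertManager with amtool. The label values are placeholders, and http://localhost:9093 assumes the default listen address:

# Fire a test alert; it should appear in a11ops within a few seconds
amtool alert add alertname=A11opsTest severity=warning instance=test-host \
  --annotation=summary="Test alert from amtool" \
  --alertmanager.url=http://localhost:9093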

Example Alert Rules

High CPU Usage Alert

# prometheus-rules.yml
groups:
  - name: system
    interval: 30s
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
          team: infrastructure
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% (current value: {{ $value }}%)"
          runbook_url: "https://wiki.company.com/runbooks/high-cpu"
          dashboard_url: "https://grafana.company.com/d/node-exporter"

Service Down Alert

- alert: ServiceDown
  expr: up{job="api-service"} == 0
  for: 1m
  labels:
    severity: critical
    team: backend
  annotations:
    summary: "Service {{ $labels.job }} is down"
    description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
    impact: "API requests will fail, affecting all users"
    action: "Check service logs and restart if necessary"

Disk Space Alert

- alert: DiskSpaceLow
  expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
  for: 10m
  labels:
    severity: warning
    team: infrastructure
  annotations:
    summary: "Low disk space on {{ $labels.instance }}"
    description: "Disk space on {{ $labels.instance }} is below 15% (current: {{ $value }}%)"
    filesystem: "{{ $labels.mountpoint }}"
    device: "{{ $labels.device }}"

Alert Data Mapping

a11ops automatically maps Prometheus alert data to our alert format:

Prometheus Field        a11ops Field    Example
alertname               title           HighCPUUsage
annotations.summary     message         High CPU usage on server-01
labels.severity         severity        critical
startsAt                timestamp       2024-01-15T10:30:00Z
labels + annotations    metadata        All fields preserved
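
For reference, this is the shape of the payload AlertManager posts to the webhook (the standard AlertManager webhook format, version 4); the values below are illustrative:

{
  "version": "4",
  "groupKey": "{}:{alertname=\"HighCPUUsage\"}",
  "status": "firing",
  "receiver": "a11ops-warning",
  "groupLabels": { "alertname": "HighCPUUsage" },
  "commonLabels": { "alertname": "HighCPUUsage", "severity": "warning" },
  "commonAnnotations": { "summary": "High CPU usage on server-01" },
  "externalURL": "http://alertmanager.example.com:9093",
  "alerts": [
    {
      "status": "firing",
      "labels": { "alertname": "HighCPUUsage", "severity": "warning", "instance": "server-01" },
      "annotations": { "summary": "High CPU usage on server-01" },
      "startsAt": "2024-01-15T10:30:00Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}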

Advanced Configuration

Custom HTTP Headers

Add authentication or tracking headers to the outgoing webhook requests. Custom headers are configured via http_headers, which requires a newer AlertManager release; older versions do not accept custom headers in http_config:

webhook_configs:
  - url: 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID'
    send_resolved: true
    http_config:
      http_headers:
        X-Environment:
          values: ['production']
        X-Source:
          values: ['prometheus-us-east-1']

TLS Configuration

For enhanced security with custom certificates:

webhook_configs:
  - url: 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID'
    http_config:
      tls_config:
        insecure_skip_verify: false
        ca_file: /etc/ssl/certs/ca-certificates.crt

Delivery Settings

Tune how many alerts are batched into a single webhook request and how long AlertManager waits for a response (retries of failed deliveries are handled automatically by AlertManager):

webhook_configs:
  - url: 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID'
    max_alerts: 100  # Max alerts per webhook message
    timeout: 30s     # HTTP request timeout

Troubleshooting

Alerts not appearing in a11ops

  • Check AlertManager logs: journalctl -u alertmanager -f
  • Verify webhook URL is correct and includes integration ID
  • Ensure AlertManager can reach api.a11ops.com (check firewalls)
  • Test with curl: curl -X POST https://api.a11ops.com/v1/webhooks/YOUR_ID -d '' (AlertManager's notification metrics can also confirm delivery failures; see below)
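
AlertManager also exports notification metrics on its own /metrics endpoint; a rising failure counter for the webhook integration points at delivery problems (localhost:9093 assumes the default listen address):

# Look for failed webhook notifications (label integration="webhook")
curl -s http://localhost:9093/metrics | grep alertmanager_notifications_failed_total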

Missing alert metadata

  • Ensure labels and annotations are properly defined in alert rules
  • Check that severity label matches expected values
  • Verify template variables are resolving correctly

Duplicate alerts

  • Review group_by configuration to ensure proper deduplication
  • Check repeat_interval settings
  • Verify you're not sending to multiple receivers unintentionally (amtool can show which receivers a label set routes to; see the sketch after this list)
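
To see which receiver a given label set will actually hit with the configuration from Step 2, amtool can walk the routing tree locally (the config path is an assumption):

# Print the routing tree, then test where a critical alert is delivered
amtool config routes show --config.file=/etc/alertmanager/alertmanager.yml
amtool config routes test --config.file=/etc/alertmanager/alertmanager.yml alertname=ServiceDown severity=critical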

Successfully Integrated?

Learn more about managing alerts and reducing noise.