Prometheus AlertManager

Route alerts from Prometheus to a11ops

Connect Prometheus AlertManager to a11ops to centralize your metrics-based alerts. This integration preserves all labels, annotations, and alert metadata.

Prerequisites

  • Prometheus with AlertManager installed and configured
  • An a11ops workspace with an active API key
  • Network access from AlertManager to api.a11ops.com (a quick connectivity check is shown after this list)
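
A quick way to check connectivity from the AlertManager host (the exact HTTP status returned for the bare hostname is not guaranteed; the point is that the TLS connection succeeds rather than timing out):

# Run from the AlertManager host; a timeout or TLS error indicates a firewall or DNS problem
curl -sS -o /dev/null -w 'HTTP %{http_code}\n' https://api.a11ops.com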

Setup Instructions

Step 1: Create Integration in a11ops

  1. Navigate to your workspace settings
  2. Go to “Integrations” → “Add Integration”
  3. Select “Prometheus AlertManager”
  4. Name it (e.g., “Production Prometheus”)
  5. Copy the generated webhook URL (you can verify it with the curl sketch below)
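
If you want to verify the URL right away, you can post a minimal AlertManager-style payload to it from any shell. This is a sketch, not an official test procedure; it assumes the endpoint accepts the standard AlertManager webhook JSON and returns a success status for a valid integration ID:

# Replace YOUR_INTEGRATION_ID with the ID from the URL copied above
curl -X POST 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID' \
  -H 'Content-Type: application/json' \
  -d '{"version":"4","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"WebhookTest","severity":"warning"},"annotations":{"summary":"a11ops webhook test"}}]}'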

Step 2: Configure AlertManager

Add a11ops as a webhook receiver in your AlertManager configuration:

# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'default'
  routes:
    - receiver: 'a11ops-critical'
      match:
        severity: 'critical'
    - receiver: 'a11ops-warning'
      match:
        severity: 'warning'

receivers:
  - name: 'default'
    webhook_configs:
      - url: 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID'
        send_resolved: true
        
  - name: 'a11ops-critical'
    webhook_configs:
      - url: 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID'
        send_resolved: true
        
  - name: 'a11ops-warning'
    webhook_configs:
      - url: 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID'
        send_resolved: true
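
Before reloading, it is worth validating the file with amtool, which ships with AlertManager (the config path below is an assumption; adjust it to wherever your file lives):

# Validate syntax and referenced receivers before reloading
amtool check-config /etc/alertmanager/alertmanager.yml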

Step 3: Reload AlertManager

Apply the configuration changes:

# Reload AlertManager configuration
curl -X POST http://localhost:9093/-/reload

# Or if using systemd
sudo systemctl reload alertmanager
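
To confirm end-to-end delivery without waiting for a real incident, you can push a synthetic alert through AlertManager with amtool. The label values are placeholders, and http://localhost:9093 assumes the default listen address:

# Fire a test alert; it should appear in a11ops within a few seconds
amtool alert add alertname=A11opsTest severity=warning instance=test-host \
  --annotation=summary="Test alert from amtool" \
  --alertmanager.url=http://localhost:9093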

Example Alert Rules

High CPU Usage Alert

# prometheus-rules.yml
groups:
  - name: system
    interval: 30s
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
          team: infrastructure
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% (current value: {{ $value }}%)"
          runbook_url: "https://wiki.company.com/runbooks/high-cpu"
          dashboard_url: "https://grafana.company.com/d/node-exporter"

Service Down Alert

- alert: ServiceDown
  expr: up{job="api-service"} == 0
  for: 1m
  labels:
    severity: critical
    team: backend
  annotations:
    summary: "Service {{ $labels.job }} is down"
    description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
    impact: "API requests will fail, affecting all users"
    action: "Check service logs and restart if necessary"

Disk Space Alert

- alert: DiskSpaceLow
  expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
  for: 10m
  labels:
    severity: warning
    team: infrastructure
  annotations:
    summary: "Low disk space on {{ $labels.instance }}"
    description: "Disk space on {{ $labels.instance }} is below 15% (current: {{ $value }}%)"
    filesystem: "{{ $labels.mountpoint }}"
    device: "{{ $labels.device }}"

Alert Data Mapping

a11ops automatically maps Prometheus alert data to our alert format:

Prometheus Field        a11ops Field    Example
alertname               title           HighCPUUsage
annotations.summary     message         High CPU usage on server-01
labels.severity         severity        critical
startsAt                timestamp       2024-01-15T10:30:00Z
labels + annotations    metadata        All fields preserved
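
For reference, this is the shape of the payload AlertManager posts to the webhook (the standard AlertManager webhook format, version 4); the values below are illustrative:

{
  "version": "4",
  "groupKey": "{}:{alertname=\"HighCPUUsage\"}",
  "status": "firing",
  "receiver": "a11ops-warning",
  "groupLabels": { "alertname": "HighCPUUsage" },
  "commonLabels": { "alertname": "HighCPUUsage", "severity": "warning" },
  "commonAnnotations": { "summary": "High CPU usage on server-01" },
  "externalURL": "http://alertmanager.example.com:9093",
  "alerts": [
    {
      "status": "firing",
      "labels": { "alertname": "HighCPUUsage", "severity": "warning", "instance": "server-01" },
      "annotations": { "summary": "High CPU usage on server-01" },
      "startsAt": "2024-01-15T10:30:00Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}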

Advanced Configuration

Custom HTTP Headers

Add authentication or tracking headers to the outgoing webhook requests. Custom headers are configured via http_headers, which requires a newer AlertManager release; older versions do not accept custom headers in http_config:

webhook_configs:
  - url: 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID'
    send_resolved: true
    http_config:
      http_headers:
        X-Environment:
          values: ['production']
        X-Source:
          values: ['prometheus-us-east-1']

TLS Configuration

For enhanced security with custom certificates:

webhook_configs:
  - url: 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID'
    http_config:
      tls_config:
        insecure_skip_verify: false
        ca_file: /etc/ssl/certs/ca-certificates.crt

Delivery Settings

Tune how many alerts are batched into a single webhook request and how long AlertManager waits for a response (retries of failed deliveries are handled automatically by AlertManager):

webhook_configs:
  - url: 'https://api.a11ops.com/v1/webhooks/YOUR_INTEGRATION_ID'
    max_alerts: 100  # Max alerts per webhook message
    timeout: 30s     # HTTP request timeout

Troubleshooting

Alerts not appearing in a11ops

  • Check AlertManager logs: journalctl -u alertmanager -f
  • Verify webhook URL is correct and includes integration ID
  • Ensure AlertManager can reach api.a11ops.com (check firewalls)
  • Test with curl: curl -X POST https://api.a11ops.com/v1/webhooks/YOUR_ID -d '' (AlertManager's notification metrics can also confirm delivery failures; see below)
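
AlertManager also exports notification metrics on its own /metrics endpoint; a rising failure counter for the webhook integration points at delivery problems (localhost:9093 assumes the default listen address):

# Look for failed webhook notifications (label integration="webhook")
curl -s http://localhost:9093/metrics | grep alertmanager_notifications_failed_total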

Missing alert metadata

  • Ensure labels and annotations are properly defined in alert rules
  • Check that severity label matches expected values
  • Verify template variables are resolving correctly

Duplicate alerts

  • Review group_by configuration to ensure proper deduplication
  • Check repeat_interval settings
  • Verify you're not sending to multiple receivers unintentionally (amtool can show which receivers a label set routes to; see the sketch after this list)
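
To see which receiver a given label set will actually hit with the configuration from Step 2, amtool can walk the routing tree locally (the config path is an assumption):

# Print the routing tree, then test where a critical alert is delivered
amtool config routes show --config.file=/etc/alertmanager/alertmanager.yml
amtool config routes test --config.file=/etc/alertmanager/alertmanager.yml alertname=ServiceDown severity=critical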

Successfully Integrated?

Learn more about managing alerts and reducing noise.