Grafana

Send unified alerts from Grafana to a11ops

Connect Grafana's unified alerting system to a11ops to centralize your dashboard and metric alerts alongside your other monitoring data. Supports Grafana 8.0 and later with unified alerting enabled.

Prerequisites

  • Grafana 8.0 or higher with unified alerting enabled
  • Admin access to Grafana
  • An a11ops workspace with an active API key
  • Network access from Grafana to api.a11ops.com
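
To confirm the last prerequisite, you can check reachability from the Grafana host. A minimal sketch (any HTTP status code proves the network path; a timeout or DNS error indicates a firewall or resolution problem):

# Run on the Grafana host: confirm it can reach the a11ops API endpoint.
# A connection timeout or DNS failure points to a network/firewall issue.
curl -sS -o /dev/null -w "HTTP %{http_code}\n" https://api.a11ops.com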

Setup Instructions

Step 1: Create Integration in a11ops

  1. Navigate to your workspace settings
  2. Go to "Integrations" → "Add Integration"
  3. Select "Grafana"
  4. Name it (e.g., "Production Grafana")
  5. Copy the generated webhook URL

Step 2: Add Contact Point in Grafana

In Grafana, navigate to Alerting → Contact points:

  1. Click "New contact point"
  2. Name: "a11ops"
  3. Integration: Select "Webhook"
  4. URL: Paste your a11ops webhook URL
  5. HTTP Method: POST
  6. Save the contact point
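
If you prefer configuration as code, the contact point can also be created through Grafana's alerting provisioning API. This is a sketch, assuming Grafana 9.1 or later, a service account token with alerting permissions, and a placeholder Grafana host; the settings keys follow the webhook contact point schema as commonly documented, so verify them against your Grafana version:

# Create the "a11ops" webhook contact point via the provisioning API.
# Replace the Grafana host, token, and webhook URL with your own values.
curl -X POST "https://grafana.example.com/api/v1/provisioning/contact-points" \
  -H "Authorization: Bearer $GRAFANA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "a11ops",
    "type": "webhook",
    "settings": {
      "url": "PASTE_YOUR_A11OPS_WEBHOOK_URL",
      "httpMethod": "POST"
    }
  }'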

Step 3: Configure Notification Policy

Route alerts to a11ops using notification policies:

# Example notification policy routing
Root policy
├─ Default contact point: a11ops
└─ Nested policies:
   ├─ Match: severity=critical
   │  ├─ Contact point: a11ops
   │  ├─ Override general timings: Yes
   │  └─ Group wait: 30s
   └─ Match: team=backend
      ├─ Contact point: a11ops
      └─ Group by: [alertname, cluster]
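
To double-check that your routes actually resolve to the a11ops contact point, you can dump the current policy tree over the provisioning API (again assuming Grafana 9.1+ and a service account token; host is a placeholder):

# Dump the notification policy tree and confirm the routes
# reference the "a11ops" contact point as intended.
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  "https://grafana.example.com/api/v1/provisioning/policies" | jq .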

Creating Alerts in Grafana

Query-based Alert

Create an alert based on your dashboard queries:

# Alert rule configuration
Name: High API Response Time
Evaluation group: api-performance
Evaluation interval: 1m

# Query
A: avg(http_request_duration_seconds{job="api"})

# Condition
WHEN last() OF A IS ABOVE 0.5

# Alert details
Summary: API response time is high
Description: Average API response time is {{ $values.A }} seconds
Runbook URL: https://wiki.company.com/runbooks/api-latency

# Labels
severity: warning
team: backend
service: api
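
Before saving the rule, it can help to evaluate the query directly against the data source to confirm that the 0.5s threshold is sensible. A sketch using the Prometheus HTTP API, assuming Prometheus is the underlying data source and reachable at a placeholder URL:

# Evaluate the alert expression once and compare the current value
# against the 0.5s threshold used in the rule condition.
curl -sG "http://prometheus.example.com:9090/api/v1/query" \
  --data-urlencode 'query=avg(http_request_duration_seconds{job="api"})' | jq '.data.result'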

Multi-dimensional Alert

Alert on multiple series with proper grouping:

# Multi-dimensional alert
Name: Service Error Rate High
Evaluation interval: 30s

# Query with labels
A: sum by(service, method) (
    rate(http_requests_total{status=~"5.."}[5m])
  ) / 
  sum by(service, method) (
    rate(http_requests_total[5m])
  ) > 0.05

# This creates separate alerts for each service/method combination

# Labels (applied to all instances)
severity: high
alert_type: error_rate

# Annotations
Summary: High error rate for {{ $labels.service }} {{ $labels.method }}
Description: Error rate is {{ $values.A | humanizePercentage }}
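
To preview how many alert instances this rule would create, you can run the same ratio query against Prometheus and list every service/method pair currently above 5% (same assumptions as before: Prometheus data source, placeholder URL):

# List the service/method combinations whose error rate currently exceeds 5%.
# Each returned series corresponds to one alert instance in Grafana.
curl -sG "http://prometheus.example.com:9090/api/v1/query" \
  --data-urlencode 'query=sum by(service, method) (rate(http_requests_total{status=~"5.."}[5m])) / sum by(service, method) (rate(http_requests_total[5m])) > 0.05' \
  | jq '.data.result[].metric'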

Custom Message Templates

Customize the webhook payload sent to a11ops:

{
  "alert": "{{ .GroupLabels.alertname }}",
  "description": "{{ range .Alerts.Firing }}{{ .Annotations.description }}{{ end }}",
  "severity": "{{ .CommonLabels.severity }}",
  "source": "grafana",
  "labels": {
    {{ range $k, $v := .CommonLabels }}
    "{{ $k }}": "{{ $v }}"{{ if not (last $k $.CommonLabels) }},{{ end }}
    {{ end }}
  },
  "annotations": {
    "alert_count": "{{ len .Alerts }}",
    "dashboard_url": "{{ .GeneratorURL }}",
    {{ range $k, $v := .CommonAnnotations }}
    "{{ $k }}": "{{ $v }}"{{ if not (last $k $.CommonAnnotations) }},{{ end }}
    {{ end }}
  }
}
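
Because the template emits raw JSON, a stray comma or an unescaped quote inside an annotation will break parsing on the a11ops side. If you save a rendered payload (for example, captured from a test notification by your webhook receiver), you can validate it locally; a small sketch, assuming the payload was saved to rendered-payload.json:

# Exit status is non-zero if the rendered payload is not valid JSON.
jq empty rendered-payload.json && echo "payload is valid JSON"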

Including Graphs in Alerts

Grafana can include graph images in alerts. To enable this:

  1. Enable image rendering

    Install the Grafana image renderer plugin or use the hosted service

  2. Configure the contact point

    In the webhook settings, enable “Include image” option

  3. Set capture timeout

    Adjust the capture timeout if graphs are complex

Note: Image URLs in the webhook will be temporary. Consider saving important graphs to your own storage if you need permanent access.
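
For self-hosted Grafana, the renderer is typically installed as a plugin and Grafana restarted afterwards. A sketch for a package-based install on a systemd host (paths and service name assume the standard package layout):

# Install the official image renderer plugin, then restart Grafana so it is loaded.
grafana-cli plugins install grafana-image-renderer
sudo systemctl restart grafana-server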

Common Alert Examples

Database Connection Pool

Alert when connection pool is nearly exhausted

mysql_global_status_threads_connected / mysql_global_variables_max_connections > 0.8

Disk Space Prediction

Alert 4 hours before disk fills up

predict_linear(node_filesystem_avail_bytes[1h], 4*3600) < 0

Application Error Logs

Alert on error log spike

rate(log_entries_total{level="error"}[5m]) > 10

SLO Violation

Alert when SLI drops below SLO

sli_availability{service="api"} < 0.999

Troubleshooting

Test the webhook manually

Use the “Test” button in the contact point configuration to send a test alert:

  1. Go to Alerting → Contact points
  2. Find your a11ops contact point
  3. Click the "Test" button
  4. Check if the alert appears in your a11ops workspace
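
To rule out Grafana entirely, you can also post a minimal payload straight to the webhook URL from any shell. The field names below mirror the custom template above and are illustrative rather than the exact schema a11ops expects, so treat this as a reachability check:

# Post a minimal test alert directly to the a11ops webhook URL.
# Replace the URL with the one generated in Step 1; field names are illustrative.
curl -X POST "PASTE_YOUR_A11OPS_WEBHOOK_URL" \
  -H "Content-Type: application/json" \
  -d '{"alert": "Manual webhook test", "severity": "info", "source": "grafana"}'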

Check Grafana logs

Look for webhook delivery errors:

grep "webhook" /var/log/grafana/grafana.log

Verify alert state

In Grafana's Alert list, ensure your alerts are in “Firing” state when conditions are met. Check the “State history” tab for debugging.
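
You can also query Grafana's built-in Alertmanager to list the alerts it currently considers active; a hedged sketch, assuming a service account token and the standard Grafana Alertmanager API path:

# List alerts currently held by Grafana's built-in Alertmanager.
# An empty list means nothing is firing, so no webhook would be sent.
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  "https://grafana.example.com/api/alertmanager/grafana/api/v2/alerts" | jq .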

Best Practices

Use Alert Folders

Organize alerts in folders by service or team for easier management and routing to different contact points.

Set Proper Labels

Always include severity, team, and service labels to enable proper routing and filtering in a11ops.

Avoid Alert Fatigue

Use "for" duration in conditions to prevent flapping alerts. Start with 5m for most metrics.

Include Context

Add dashboard links and runbook URLs in annotations to help responders quickly understand and resolve issues.

Integration Complete!

Learn how to structure alerts effectively and reduce noise.