Grafana
Send unified alerts from Grafana to a11ops
Connect Grafana's unified alerting system to a11ops to centralize visualization-based alerts alongside your other monitoring data. Supports Grafana 8.0+ with unified alerting.
Prerequisites
- Grafana 8.0 or higher with unified alerting enabled
- Admin access to Grafana
- An a11ops workspace with an active API key
- Network access from Grafana to api.a11ops.com (a quick reachability check is shown after this list)
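To confirm the last prerequisite before configuring anything, you can run a quick check from the Grafana host. This is an optional sketch that assumes curl is available; any HTTP status code printed (rather than 000) indicates the host can reach the a11ops API:
# Optional: check that the Grafana host can reach the a11ops API
# Prints the HTTP status code of the response; "000" means no connection
curl -sS -o /dev/null -w "%{http_code}\n" https://api.a11ops.com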
Setup Instructions
Step 1: Create Integration in a11ops
- Navigate to your workspace settings
- Go to "Integrations" → "Add Integration"
- Select "Grafana"
- Name it (e.g., "Production Grafana")
- Copy the generated webhook URL
Step 2: Add Contact Point in Grafana
In Grafana, navigate to Alerting → Contact points:
- Click "New contact point"
- Name: "a11ops"
- Integration: Select "Webhook"
- URL: Paste your a11ops webhook URL
- HTTP Method: POST
- Save the contact point
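If you manage Grafana as code, the same contact point can also be created through Grafana's alerting provisioning HTTP API instead of the UI. The sketch below is illustrative: GRAFANA_URL and GRAFANA_TOKEN are placeholders for your Grafana base URL and a service-account token with alerting admin permissions, and the webhook URL is the one you copied in Step 1.
# Sketch: create the a11ops contact point via Grafana's provisioning API
# GRAFANA_URL, GRAFANA_TOKEN, and the webhook URL below are placeholders
curl -X POST "$GRAFANA_URL/api/v1/provisioning/contact-points" \
  -H "Authorization: Bearer $GRAFANA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "a11ops",
    "type": "webhook",
    "settings": {
      "url": "https://api.a11ops.com/YOUR_WEBHOOK_PATH",
      "httpMethod": "POST"
    }
  }'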
Step 3: Configure Notification Policy
Route alerts to a11ops using notification policies:
# Example notification policy routing
Root policy
├─ Default contact point: a11ops
└─ Nested policies:
   ├─ Match: severity=critical
   │  └─ Contact point: a11ops
   │     └─ Override general timings: Yes
   │        └─ Group wait: 30s
   └─ Match: team=backend
      └─ Contact point: a11ops
         └─ Group by: [alertname, cluster]
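The same routing tree can be managed as code through the provisioning API. This is a minimal sketch, assuming the a11ops contact point from Step 2 already exists and using the same placeholder variables as above; note that PUT replaces the entire policy tree, so include every route you want to keep:
# Sketch: replace the notification policy tree via the provisioning API
curl -X PUT "$GRAFANA_URL/api/v1/provisioning/policies" \
  -H "Authorization: Bearer $GRAFANA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "receiver": "a11ops",
    "routes": [
      {
        "receiver": "a11ops",
        "object_matchers": [["severity", "=", "critical"]],
        "group_wait": "30s"
      },
      {
        "receiver": "a11ops",
        "object_matchers": [["team", "=", "backend"]],
        "group_by": ["alertname", "cluster"]
      }
    ]
  }'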
Creating Alerts in Grafana
Query-based Alert
Create an alert based on your dashboard queries:
# Alert rule configuration
Name: High API Response Time
Evaluation group: api-performance
Evaluation interval: 1m
# Query
A: avg(http_request_duration_seconds{job="api"})
# Condition
WHEN last() OF A IS ABOVE 0.5
# Alert details
Summary: API response time is high
Description: Average API response time is {{ $values.A }} seconds
Runbook URL: https://wiki.company.com/runbooks/api-latency
# Labels
severity: warning
team: backend
service: api
Multi-dimensional Alert
Alert on multiple series with proper grouping:
# Multi-dimensional alert
Name: Service Error Rate High
Evaluation interval: 30s
# Query with labels
A: sum by(service, method) (
rate(http_requests_total{status=~"5.."}[5m])
) /
sum by(service, method) (
rate(http_requests_total[5m])
) > 0.05
# This creates separate alerts for each service/method combination
# Labels (applied to all instances)
severity: high
alert_type: error_rate
# Annotations
Summary: High error rate for {{ $labels.service }} {{ $labels.method }}
Description: Error rate is {{ $values.A | humanizePercentage }}
Custom Message Templates
Customize the webhook payload sent to a11ops:
{
  "alert": "{{ .GroupLabels.alertname }}",
  "description": "{{ range .Alerts.Firing }}{{ .Annotations.description }}{{ end }}",
  "severity": "{{ .CommonLabels.severity }}",
  "source": "grafana",
  "labels": {
    {{- $first := true }}
    {{- range $k, $v := .CommonLabels }}
    {{- if $first }}{{ $first = false }}{{ else }},{{ end }}
    "{{ $k }}": "{{ $v }}"
    {{- end }}
  },
  "annotations": {
    "alert_count": "{{ len .Alerts }}",
    "dashboard_url": "{{ range .Alerts.Firing }}{{ .DashboardURL }}{{ end }}"
    {{- range $k, $v := .CommonAnnotations }},
    "{{ $k }}": "{{ $v }}"
    {{- end }}
  }
}
Including Graphs in Alerts
Grafana can include graph images in alerts. To enable this:
- Enable image rendering
Install the Grafana image renderer plugin (see the command below) or use the hosted rendering service
- Configure the contact point
In the webhook settings, enable the "Include image" option
- Set capture timeout
Adjust the capture timeout if graphs are complex
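On a self-hosted Grafana, the usual way to install the image renderer plugin is through the Grafana CLI; restart Grafana afterwards for the plugin to load:
# Install the official image renderer plugin, then restart Grafana
grafana-cli plugins install grafana-image-renderer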
Note: Image URLs in the webhook will be temporary. Consider saving important graphs to your own storage if you need permanent access.
Common Alert Examples
Database Connection Pool
Alert when connection pool is nearly exhausted
mysql_global_status_threads_connected / mysql_global_variables_max_connections > 0.8
Disk Space Prediction
Alert 4 hours before disk fills up
predict_linear(node_filesystem_avail_bytes[1h], 4*3600) < 0
Application Error Logs
Alert on error log spike
rate(log_entries_total{level="error"}[5m]) > 10
SLO Violation
Alert when SLI drops below SLO
sli_availability{service="api"} < 0.999
Troubleshooting
Test the webhook manually
Use the “Test” button in the contact point configuration to send a test alert:
- Go to Alerting → Contact points
- Find your a11ops contact point
- Click the "Test" button
- Check if the alert appears in your a11ops workspace
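If the test alert never shows up, take Grafana out of the picture and POST a minimal payload straight to the webhook URL from Step 1. The sketch below uses field names that mirror the custom template example above; they are illustrative, so check your a11ops integration settings for the exact schema it expects:
# Sketch: send a minimal test payload directly to the a11ops webhook
# Replace the URL with the webhook URL generated in Step 1
curl -X POST "https://api.a11ops.com/YOUR_WEBHOOK_PATH" \
  -H "Content-Type: application/json" \
  -d '{
    "alert": "Manual webhook test",
    "description": "Test alert sent with curl, bypassing Grafana",
    "severity": "info",
    "source": "grafana"
  }'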
Check Grafana logs
Look for webhook delivery errors:
grep "webhook" /var/log/grafana/grafana.log
Verify alert state
In Grafana's Alert list, ensure your alerts are in “Firing” state when conditions are met. Check the “State history” tab for debugging.
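Rule state can also be inspected over HTTP: Grafana exposes a Prometheus-compatible rules endpoint for Grafana-managed rules. Treat the path below as a sketch (it can differ between versions), and reuse the placeholder GRAFANA_URL and GRAFANA_TOKEN variables from earlier:
# Sketch: list Grafana-managed alert rules and their current state
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  "$GRAFANA_URL/api/prometheus/grafana/api/v1/rules"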
Best Practices
Use Alert Folders
Organize alerts in folders by service or team for easier management and routing to different contact points.
Set Proper Labels
Always include severity, team, and service labels to enable proper routing and filtering in a11ops.
Avoid Alert Fatigue
Use "for" duration in conditions to prevent flapping alerts. Start with 5m for most metrics.
Include Context
Add dashboard links and runbook URLs in annotations to help responders quickly understand and resolve issues.
Integration Complete!
Next, learn how to structure alerts effectively and reduce noise.