Datadog

Forward Datadog monitors to a11ops

Integrate Datadog monitors with a11ops to centralize alerts from your APM, infrastructure monitoring, and log management. The integration carries over tags, priority levels, and monitor metadata.

Prerequisites

  • An active Datadog account with permission to manage monitors and integrations
  • An a11ops workspace with an active API key
  • Network access from Datadog to api.a11ops.com

Setup Instructions

Step 1: Create Integration in a11ops

  1. Navigate to your workspace settings
  2. Go to “Integrations” → “Add Integration”
  3. Select “Datadog”
  4. Name it (e.g., “Production Datadog”)
  5. Copy the generated webhook URL

Step 2: Create Webhook in Datadog

In Datadog, navigate to Integrations → Webhooks:

  1. Click “New” to create a webhook
  2. Name: “a11ops-alerts”
  3. URL: Paste your a11ops webhook URL
  4. Enable “Use custom payload”
  5. Add the custom payload template (see below)
  6. Save the webhook
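If you manage Datadog configuration as code, the same webhook can be created through Datadog's Webhooks Integration API. The sketch below only builds the request. It assumes the US1 API host (api.datadoghq.com) and the name, url, payload, and encode_as fields of that endpoint, so confirm both against Datadog's API reference for your site. The API and application keys are placeholders you supply.

```python
import json
import urllib.request

# Assumed endpoint: Datadog Webhooks Integration API on the US1 site.
# Other Datadog sites use a different API domain.
DD_WEBHOOKS_URL = "https://api.datadoghq.com/api/v1/integration/webhooks/configuration/webhooks"


def build_webhook_request(a11ops_url: str, payload_template: str,
                          api_key: str, app_key: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates the a11ops-alerts webhook."""
    body = {
        "name": "a11ops-alerts",      # must match the @webhook-a11ops-alerts handle
        "url": a11ops_url,            # the webhook URL copied from a11ops in Step 1
        "payload": payload_template,  # the custom payload template, as a string
        "encode_as": "json",
    }
    return urllib.request.Request(
        DD_WEBHOOKS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )
```

Once the keys and URL are filled in, send it with urllib.request.urlopen(req). Building the request separately keeps the sketch testable without network access.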

Step 3: Configure Custom Payload

Paste this template into the custom payload field so that Datadog fields map to the matching a11ops alert fields:

{
  "alert": "$EVENT_TITLE",
  "description": "$EVENT_MSG",
  "severity": "$ALERT_TRANSITION",
  "source": "datadog",
  "timestamp": "$LAST_UPDATED",
  "labels": {
    "monitor_id": "$MONITOR_ID",
    "alert_type": "$ALERT_TYPE",
    "priority": "$PRIORITY",
    "host": "$HOSTNAME",
    "environment": "$TAGS[env]",
    "service": "$TAGS[service]",
    "team": "$TAGS[team]"
  },
  "annotations": {
    "monitor_url": "$LINK",
    "event_url": "$EVENT_URL",
    "snapshot_url": "$SNAPSHOT",
    "aggregation_key": "$AGGREGATE_KEY",
    "alert_query": "$ALERT_QUERY",
    "metric_namespace": "$METRIC_NAMESPACE"
  }
}
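Before saving the webhook, you can check locally that the template still parses as JSON once its variables are filled in. This sketch uses an abbreviated copy of the template and hypothetical sample values, and it assumes plain text substitution of each $VARIABLE. It is a sanity check, not a reproduction of Datadog's renderer.

```python
import json
import re

# Abbreviated copy of the payload template; paste the full template in practice.
TEMPLATE = """{
  "alert": "$EVENT_TITLE",
  "severity": "$ALERT_TRANSITION",
  "labels": {"environment": "$TAGS[env]", "service": "$TAGS[service]"}
}"""

# Hypothetical sample values standing in for what Datadog would substitute.
SAMPLE = {
    "EVENT_TITLE": "[Triggered] High latency on api-gateway",
    "ALERT_TRANSITION": "Triggered",
    "TAGS[env]": "production",
    "TAGS[service]": "api-gateway",
}

# Matches $NAME and $TAGS[key] placeholders.
PLACEHOLDER = re.compile(r"\$([A-Z_]+(?:\[[a-z_]+\])?)")


def render(template: str, values: dict) -> str:
    """Replace each placeholder with its sample value; unknown ones are left as-is."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


# json.loads raises ValueError if the rendered text is not valid JSON.
payload = json.loads(render(TEMPLATE, SAMPLE))
print(payload["labels"]["service"])  # api-gateway
```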

Configuring Monitors

Add Webhook to Monitors

For each monitor you want to send to a11ops:

  1. Edit the monitor configuration
  2. In the “Notify your team” section, add: @webhook-a11ops-alerts
  3. Write a notification message that gives responders context
  4. Save the monitor

Example Monitor Message

Include helpful context in your monitor messages:

{{#is_alert}}
🚨 ALERT: {{monitor.name}}

**Issue:** {{value}} exceeds threshold of {{threshold}}
**Service:** {{service.name}}
**Environment:** {{env.name}}
**Time:** {{last_triggered_at}}

**Impact:** Customers may experience slow response times
**Action:** Check service logs and scale if necessary

📊 [View Dashboard](https://app.datadoghq.com/dashboard/abc-123)
📖 [Runbook](https://wiki.company.com/runbooks/high-latency)
{{/is_alert}}

{{#is_recovery}}
✅ RECOVERED: {{monitor.name}}
Service has returned to normal operation.
{{/is_recovery}}

@webhook-a11ops-alerts
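A quick lint can catch two mistakes that break a message like this one: a missing webhook handle and an unclosed conditional block. This is a hypothetical helper, not a Datadog tool. It only checks {{#is_...}} and {{^is_...}} openings against their {{/is_...}} closes.

```python
import re

HANDLE = "@webhook-a11ops-alerts"

# Opening blocks such as {{#is_alert}}, {{^is_warning}}, or {{#is_match "tag" "value"}}.
OPEN_BLOCK = re.compile(r"\{\{[#^](is_\w+)[^}]*\}\}")
CLOSE_BLOCK = re.compile(r"\{\{/(is_\w+)\}\}")


def lint_message(message: str) -> list[str]:
    """Return a list of problems found in a Datadog monitor message."""
    problems = []
    if HANDLE not in message:
        problems.append(f"missing {HANDLE}")
    opens = OPEN_BLOCK.findall(message)
    closes = CLOSE_BLOCK.findall(message)
    # Each conditional block name should open and close the same number of times.
    for name in sorted(set(opens) | set(closes)):
        if opens.count(name) != closes.count(name):
            problems.append(f"unbalanced {name} block")
    return problems
```

Run against the example message above, it returns an empty list.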

Priority Mapping

Set each monitor's priority in Datadog; priorities map to a11ops severity levels as follows:

Datadog Priority    a11ops Severity    Use Case
P1 - Critical       critical           Service outages, data loss risk
P2 - High           high               Degraded performance, errors
P3 - Medium         medium             Warning conditions
P4 - Low            low                Minor issues
P5 - Info           info               Informational
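If you pre-process alerts or build reports outside a11ops, the table above can be expressed as a lookup. This sketches the table only, since this guide does not describe how a11ops applies the mapping internally, and the severity_for helper is hypothetical. It accepts a bare priority, a P-prefixed value, or a priority: tag, and returns None for anything else.

```python
import re
from typing import Optional

# Datadog priority number -> a11ops severity, as in the table above.
SEVERITY_BY_PRIORITY = {"1": "critical", "2": "high", "3": "medium", "4": "low", "5": "info"}

# Matches "2", "P2", "p2", "priority:P2", or "P2 - High" (but not "P10").
PRIORITY = re.compile(r"\bP?([1-5])\b", re.IGNORECASE)


def severity_for(priority: str) -> Optional[str]:
    """Map a Datadog priority string to an a11ops severity; None if unrecognized."""
    match = PRIORITY.search(priority)
    return SEVERITY_BY_PRIORITY[match.group(1)] if match else None
```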

Monitor Examples

APM Service Monitor

Monitor service latency and error rates:

Monitor Type: APM Metrics
Metric: trace.servlet.request.p95
Evaluation: avg(last_5m)

Alert Conditions:
- Alert threshold: > 1000ms
- Warning threshold: > 500ms

Tags:
- env:production
- service:api-gateway
- team:backend
- priority:P2

Message: Include @webhook-a11ops-alerts
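The same APM monitor could also be created through Datadog's Monitors API (POST /api/v1/monitor). This sketch only builds the request. The monitor name and message text are hypothetical, the metric name is copied from the example above, the host assumes the US1 site, and the thresholds assume the metric is reported in milliseconds, so check the unit your metric actually uses.

```python
import json
import urllib.request

DD_MONITOR_URL = "https://api.datadoghq.com/api/v1/monitor"  # assumed US1 site


def apm_latency_monitor(api_key: str, app_key: str) -> urllib.request.Request:
    """Build (but do not send) a request creating the APM latency monitor."""
    body = {
        "name": "High p95 latency on api-gateway",  # hypothetical name
        "type": "query alert",
        # Critical threshold in the query must match options.thresholds.critical.
        "query": "avg(last_5m):avg:trace.servlet.request.p95"
                 "{env:production,service:api-gateway} > 1000",
        "message": "{{#is_alert}}p95 latency {{value}} exceeds {{threshold}}{{/is_alert}} "
                   "@webhook-a11ops-alerts",
        "tags": ["env:production", "service:api-gateway", "team:backend", "priority:P2"],
        "priority": 2,  # Datadog's built-in monitor priority field (1-5)
        "options": {"thresholds": {"critical": 1000, "warning": 500}},
    }
    return urllib.request.Request(
        DD_MONITOR_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )
```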

Log Pattern Monitor

Alert on error log patterns:

Monitor Type: Log Alert
Query: service:payment-api status:error @error.kind:*

Alert Conditions:
- Alert when count > 100 in last 5 minutes
- Group by: @error.kind

Tags:
- env:production
- service:payment-api
- team:payments
- priority:P1

Because results are grouped by @error.kind, each alert includes the error type and its count.
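To make the alert condition concrete, this local sketch counts error logs per @error.kind in a five-minute window and keeps only the groups above 100. It is an illustration, not Datadog's evaluation engine; the function name and the (timestamp, error_kind) input shape are hypothetical. Each group it returns corresponds to a separate multi-alert notification, and therefore to a separate a11ops alert.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def groups_over_threshold(logs, now, threshold=100, window=timedelta(minutes=5)):
    """Count error logs per error kind inside the window; return groups above threshold.

    `logs` is an iterable of (timestamp, error_kind) pairs with aware datetimes.
    """
    counts = Counter(kind for ts, kind in logs if now - window <= ts <= now)
    return {kind: n for kind, n in counts.items() if n > threshold}


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(minutes=1)
logs = [(recent, "TimeoutError")] * 101 + [(recent, "ValueError")] * 50
print(groups_over_threshold(logs, now))  # {'TimeoutError': 101}
```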

Composite Monitor

Combine multiple conditions:

Monitor Type: Composite
Logic: (A && B) || C

Where A, B, and C are existing monitors:
A = High CPU usage (> 80%)
B = High memory usage (> 90%)
C = Service health check failing

Tags:
- env:production
- priority:P1
- team:infrastructure

Triggers when CPU and memory are both exhausted, or when the health check fails
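The composite logic is an ordinary boolean expression. In Datadog, a composite monitor's query refers to its component monitors by numeric ID; the IDs used below are hypothetical placeholders, and composite_triggered is only a local truth-table helper for reasoning about when the composite fires.

```python
def composite_query(a_id: int, b_id: int, c_id: int) -> str:
    """Build a composite query string from component monitor IDs (hypothetical IDs)."""
    return f"({a_id} && {b_id}) || {c_id}"


def composite_triggered(a_high_cpu: bool, b_high_memory: bool, c_health_failing: bool) -> bool:
    """(A && B) || C: fire on combined CPU and memory pressure, or on a failing health check."""
    return (a_high_cpu and b_high_memory) or c_health_failing


print(composite_query(1001, 1002, 1003))  # (1001 && 1002) || 1003
```

High CPU alone does not trigger the composite; it needs high memory at the same time, while a failing health check triggers it regardless of resource usage.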

Tagging Best Practices

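Apply the same tag keys across all monitors so alerts group cleanly in a11ops. The payload template reads the environment, service, and team labels from the env, service, and team tags: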

Required Tags

  • env: Environment (e.g., production, staging, dev)
  • service: Service name
  • team: Responsible team
  • priority: P1-P5 priority level

Optional Tags

  • component: Specific component
  • region: Geographic region
  • customer: Customer identifier, for multi-tenant services
  • sla: SLA tier
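A small pre-flight check can catch monitors that are missing a required tag before their alerts reach a11ops. This sketch assumes tags use the key:value form shown in the monitor examples; the function name is hypothetical.

```python
REQUIRED_TAG_KEYS = {"env", "service", "team", "priority"}


def missing_required_tags(tags: list[str]) -> set[str]:
    """Return required tag keys absent from a monitor's tags, e.g. ['env:production', ...]."""
    present = {tag.split(":", 1)[0] for tag in tags if ":" in tag}
    return REQUIRED_TAG_KEYS - present
```

Applied to the composite monitor example's tags (env:production, priority:P1, team:infrastructure), it returns {'service'}.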

Advanced Features

Include Metric Snapshots

Datadog can include a graph snapshot in webhook notifications. Its URL is available in the $SNAPSHOT variable, which the payload template maps to annotations.snapshot_url.

Multi-Alert Monitors

For multi-alert monitors, which trigger separately for each group (for example, each host or tag value), every triggering group creates its own a11ops alert with that group's metadata.

Anomaly Detection

Datadog's anomaly detection monitors forward to a11ops like any other monitor type. The predicted bounds and deviation are included in the alert metadata.

Troubleshooting

Test the webhook

In Datadog webhook settings, use the “Test” button to send a sample payload:

  1. Go to Integrations → Webhooks
  2. Find your a11ops webhook
  3. Click “Test”
  4. Verify the test alert appears in a11ops
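If the Datadog test never shows up, posting a sample payload straight to the a11ops webhook URL helps tell whether the problem is on the Datadog side or the a11ops side. This sketch assumes the a11ops URL accepts the same JSON body Datadog would send, with no extra authentication headers. The sample values are hypothetical, and this guide does not say whether a11ops requires fields omitted here, such as timestamp.

```python
import json
import urllib.request


def build_test_alert(webhook_url: str) -> urllib.request.Request:
    """Build a POST that mimics a rendered Datadog payload (sample values are hypothetical)."""
    payload = {
        "alert": "Test alert from Datadog webhook setup",
        "description": "Manual test, safe to ignore",
        "severity": "Triggered",
        "source": "datadog",
        "labels": {"environment": "staging", "service": "api-gateway", "team": "backend"},
        "annotations": {},
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send (replace the placeholder with your a11ops webhook URL):
# urllib.request.urlopen(build_test_alert("YOUR_A11OPS_WEBHOOK_URL"), timeout=10)
```

If this request creates an alert in a11ops but the Datadog test does not, look at the Datadog webhook configuration first.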

Check webhook history

View webhook delivery status in Datadog:

  • Navigate to Events → Event Stream
  • Filter by source:webhooks
  • Look for delivery failures or errors

Variable substitution issues

If variables aren't substituted correctly, check that each one uses the $VARIABLE syntax (or $TAGS[key] for a single tag value) and that the variable is available for your monitor type.
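When a received alert looks wrong, it can help to scan its payload for placeholders that were never replaced. This sketch assumes an unsubstituted variable arrives as its literal $NAME or $TAGS[key] text; the function name is hypothetical.

```python
import re

# Same placeholder shapes used in the payload template: $NAME and $TAGS[key].
UNSUBSTITUTED = re.compile(r"\$[A-Z_]+(?:\[[a-z_]+\])?")


def find_unsubstituted(payload: dict) -> list[str]:
    """List 'path=value' entries whose string values still contain a $VARIABLE."""
    found = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key)
        elif isinstance(node, str) and UNSUBSTITUTED.search(node):
            found.append(f"{path}={node}")

    walk(payload, "")
    return found
```

Call it on the parsed JSON body of an alert you received, for example find_unsubstituted(json.loads(raw_body)).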

Integration Complete!

Start routing your Datadog monitors to a11ops for centralized alerting.