
Escalation policies

Escalation policies define what happens when an alert is not acknowledged within a specified timeout. They ensure that critical alerts always reach a responder, even if the primary on-call person is unavailable.

How escalation works

  1. An alert is routed to the on-call responder on the schedule of the first policy in the chain.
  2. A countdown starts based on that policy's timeout_minutes.
  3. If the alert is not acknowledged before the timeout expires, it is routed to the schedule of the next policy in the chain (the policy referenced by next_policy_id).
  4. This process repeats until the alert is acknowledged or the chain ends.
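The four steps above can be sketched as a small simulation. This is an illustrative model, not the product's implementation: the `Level` type and `simulate` function are hypothetical, the acknowledgement time is measured in minutes from alert creation, and each level's countdown is assumed to start when the previous level's timeout expires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Level:
    """One step of an escalation path (hypothetical structure for illustration)."""
    schedule: str
    timeout_minutes: int


def simulate(levels: list[Level], ack_after_minutes: Optional[float]) -> tuple[list[str], str]:
    """Walk the escalation steps for one alert.

    ack_after_minutes is when someone acknowledges, measured from alert
    creation; None means nobody ever acknowledges. Returns the schedules
    that were notified, in order, and the final outcome.
    """
    notified: list[str] = []
    elapsed = 0.0  # minutes since the alert was created
    for level in levels:
        # Steps 1 and 3: route the alert to this level's schedule.
        notified.append(level.schedule)
        # Step 2: this level's countdown runs from `elapsed` to `deadline`.
        deadline = elapsed + level.timeout_minutes
        if ack_after_minutes is not None and ack_after_minutes < deadline:
            return notified, "acknowledged"
        # Timeout expired without acknowledgement; move to the next level.
        elapsed = deadline
    # Step 4: no targets left, so escalation stops.
    return notified, "exhausted"
```

For example, with levels of 5, 10, and 15 minutes, an acknowledgement at minute 12 arrives while the second level's countdown (minutes 5 to 15) is still running.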

Creating an escalation policy

Navigate to On-call > Escalation Policies and click Create Policy.

Policy model

Each escalation policy is a single node in a linked list. A policy defines:

| Field | Description |
| --- | --- |
| name | Human-readable policy name |
| schedule_id | The on-call schedule to notify at this level |
| timeout_minutes | How long to wait, in minutes, before escalating to the next policy |
| next_policy_id | Optional. The next policy to escalate to if this one times out |
| incident_types | Which incident types trigger this policy (empty = all) |

Policies chain together via next_policy_id. When an alert is not acknowledged within timeout_minutes, the system follows the link to the next policy. If next_policy_id is null, the chain ends and no further escalation occurs.
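A minimal sketch of the policy model in Python, assuming each policy has a unique `id` (not listed in the table above) that `next_policy_id` refers to. `EscalationPolicy`, `walk_chain`, and the ids `primary`, `secondary`, and `manager` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class EscalationPolicy:
    # Fields mirror the policy model table; `id` is an assumed primary key.
    id: str
    name: str
    schedule_id: str
    timeout_minutes: int
    next_policy_id: Optional[str] = None  # None (null) ends the chain
    incident_types: list[str] = field(default_factory=list)  # empty = all types


def walk_chain(policies: dict[str, EscalationPolicy], start_id: str) -> Iterator[EscalationPolicy]:
    """Yield policies in escalation order by following next_policy_id.

    Assumes the chain contains no cycles; a cycle would loop forever here.
    """
    current: Optional[str] = start_id
    while current is not None:
        policy = policies[current]
        yield policy
        current = policy.next_policy_id


# The example chain from this page, keyed by hypothetical ids.
policies = {
    p.id: p
    for p in [
        EscalationPolicy("primary", "Backend Primary", "backend-primary", 5, "secondary"),
        EscalationPolicy("secondary", "Backend Secondary", "backend-secondary", 10, "manager"),
        EscalationPolicy("manager", "Engineering Manager", "eng-manager", 15),
    ]
}
order = [p.schedule_id for p in walk_chain(policies, "primary")]
```

Here `order` lists the schedules in the sequence they would be notified, stopping at the policy whose `next_policy_id` is None.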

Example chain:

Policy: Backend Primary (schedule: backend-primary, timeout: 5 min)
  → next_policy_id: Backend Secondary
Policy: Backend Secondary (schedule: backend-secondary, timeout: 10 min)
  → next_policy_id: Engineering Manager
Policy: Engineering Manager (schedule: eng-manager, timeout: 15 min)
  → next_policy_id: null (chain ends)

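If no one acknowledges the alert, it reaches the engineering manager 15 minutes after it is triggered (5 + 10 minutes). If the engineering manager does not acknowledge it within a further 15 minutes (30 minutes after the alert was triggered), the chain ends and no further escalation occurs.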
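The arithmetic behind these figures, as a short Python sketch. It assumes each policy's timeout starts when the previous one expires, and it ignores the business-hours pausing and the 30-second polling interval described under Timeout behavior.

```python
from itertools import accumulate

# The example chain: (policy name, timeout_minutes), in escalation order.
chain = [
    ("Backend Primary", 5),
    ("Backend Secondary", 10),
    ("Engineering Manager", 15),
]

timeouts = [t for _, t in chain]
# Minute (after triggered_at) at which each policy is first notified:
# the first starts at 0; each later one starts after all earlier timeouts.
starts = [0, *accumulate(timeouts)][:-1]
chain_ends_at = sum(timeouts)

for (name, _), start in zip(chain, starts):
    print(f"{name}: notified at +{start} min")
print(f"Chain ends at +{chain_ends_at} min")
```

This prints notification offsets of +0, +5, and +15 minutes, and an end of chain at +30 minutes.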

TIP

Keep the first timeout short (3-5 minutes) for high-severity incidents. Use longer timeouts for low-severity alerts to give the primary responder more time.

Incident type triggers

You can attach escalation policies to specific incident types so that different alert categories follow different escalation paths.

For example:

  • SEV1 - Critical: Escalate to the engineering manager after 5 minutes.
  • SEV2 - Warning: Escalate to the secondary schedule after 15 minutes.
  • SEV3 - Info: No escalation; alert expires after 60 minutes if unacknowledged.

Configure incident type triggers in the policy settings under Triggers.

INFO

If a policy's incident_types list is empty, the policy applies to all alerts routed through it. You can create such a default policy and override it with type-specific policies.
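One way the default-plus-override behavior could be resolved, sketched in Python. The precedence rule here (a policy that lists the incident type wins; otherwise the first policy with an empty `incident_types` list applies) is an assumption, since this page does not specify how multiple matches are handled. `Policy` and `select_policy` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Policy:
    name: str
    incident_types: list[str] = field(default_factory=list)  # empty = all types


def select_policy(policies: list[Policy], incident_type: str) -> Optional[Policy]:
    """Prefer a type-specific policy; fall back to a default (empty incident_types).

    Hypothetical precedence rule for illustration. Returns None if no policy applies.
    """
    for policy in policies:
        if incident_type in policy.incident_types:
            return policy  # explicit match overrides the default
    for policy in policies:
        if not policy.incident_types:
            return policy  # default policy applies to every type
    return None


policies = [Policy("Default"), Policy("Critical path", ["SEV1 - Critical"])]
```

With these policies, a `SEV1 - Critical` alert selects "Critical path", while a `SEV2 - Warning` alert falls back to "Default".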

Policy chains

Each policy links to at most one other policy via next_policy_id, so following the links from any policy produces a single linear path, not a tree or graph. To build multi-team escalation, point a team's last policy at another team's first policy.

Avoid long chains. Two or three policies in sequence are sufficient for most organizations.
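Because each policy links to at most one successor, a chain can be checked by walking `next_policy_id` from a starting policy. The sketch below counts the policies reached and raises an error on a dangling link or a loop. This page does not say whether the product itself rejects cycles, so treat `chain_length` and the team ids below as a hypothetical client-side check; its result can be compared against a limit such as three policies.

```python
from typing import Optional


def chain_length(next_ids: dict[str, Optional[str]], start_id: str) -> int:
    """Count the policies reachable from start_id via next_policy_id.

    next_ids maps each policy id to its next_policy_id (None ends the chain).
    Raises ValueError if a link points at an unknown policy or loops back.
    """
    seen: set[str] = set()
    current: Optional[str] = start_id
    while current is not None:
        if current not in next_ids:
            raise ValueError(f"unknown policy {current!r}")
        if current in seen:
            raise ValueError(f"cycle detected at policy {current!r}")
        seen.add(current)
        current = next_ids[current]
    return len(seen)


# Team A's last policy points at team B's first policy (hypothetical ids).
links = {"a-primary": "a-secondary", "a-secondary": "b-primary",
         "b-primary": "b-secondary", "b-secondary": None}
```

Starting from `a-primary`, this chain reaches four policies, which is longer than the two or three recommended above.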

Timeout behavior

The timeout countdown starts when the alert is created (the triggered_at timestamp).

Escalation timer

A background process runs every 30 seconds and checks for unacknowledged alerts whose timeout has elapsed, so an escalation can fire up to about 30 seconds after the timeout expires. When an escalation fires, the system publishes an oncall.escalated event to NATS JetStream.
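A minimal sketch of one pass of the background check, under several assumptions: `PendingAlert` and its fields are hypothetical, `publish` is a stand-in for a NATS JetStream publish call rather than a real client API, and the payload keys are illustrative. Only the 30-second interval and the `oncall.escalated` subject come from this page.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

POLL_INTERVAL = timedelta(seconds=30)  # interval stated on this page

Publish = Callable[[str, dict], None]  # stand-in for a JetStream publish call


@dataclass
class PendingAlert:
    # Hypothetical fields for illustration only.
    alert_id: str
    timer_started_at: datetime  # triggered_at while the alert is at its first policy
    timeout_minutes: int
    next_policy_id: Optional[str]
    acknowledged: bool = False


def check_escalations(alerts: list[PendingAlert], now: datetime, publish: Publish) -> list[str]:
    """One pass of the background check; returns the ids of escalated alerts.

    The caller is expected to advance each escalated alert to its next
    policy (new timer, timeout, and link); this sketch does not do that.
    """
    escalated: list[str] = []
    for alert in alerts:
        if alert.acknowledged or alert.next_policy_id is None:
            continue  # acknowledged, or the chain has ended
        deadline = alert.timer_started_at + timedelta(minutes=alert.timeout_minutes)
        if now < deadline:
            continue  # timeout has not elapsed yet
        publish("oncall.escalated",
                {"alert_id": alert.alert_id, "next_policy_id": alert.next_policy_id})
        escalated.append(alert.alert_id)
    return escalated


def run_forever(load_alerts: Callable[[], list[PendingAlert]], publish: Publish) -> None:
    """Run check_escalations every POLL_INTERVAL (blocks indefinitely)."""
    while True:
        check_escalations(load_alerts(), datetime.now(timezone.utc), publish)
        time.sleep(POLL_INTERVAL.total_seconds())
```

Polling on a fixed interval keeps the checker simple at the cost of the up-to-30-second delay noted above.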

The timer respects business hours and holiday calendars configured on the target schedule. If the current time falls outside business hours or on a holiday, the timeout is paused and resumes when the next business window opens.
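Pausing the timer outside business hours amounts to counting only the business minutes since the timer started. This sketch assumes a fixed Monday to Friday, 09:00 to 17:00 window, naive datetimes in the schedule's local time zone, and holidays given as a set of dates; real schedules configure their own hours and calendars, and `business_minutes_between` is a hypothetical helper.

```python
from datetime import date, datetime, time, timedelta

# Hypothetical business window for illustration.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def business_minutes_between(start: datetime, end: datetime, holidays: set[date]) -> float:
    """Minutes between start and end that fall inside business hours.

    Weekends (Saturday, Sunday) and dates in `holidays` contribute nothing,
    so the countdown is effectively paused over them.
    """
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5 and day not in holidays:
            window_start = datetime.combine(day, BUSINESS_START)
            window_end = datetime.combine(day, BUSINESS_END)
            # Overlap of [start, end] with today's business window.
            overlap_start = max(start, window_start)
            overlap_end = min(end, window_end)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total.total_seconds() / 60
```

Under this model a timeout has elapsed once `business_minutes_between(timer_start, now, holidays) >= timeout_minutes`. For example, an alert triggered at 16:55 on a Friday has accrued 8 business minutes by 09:03 the following Monday.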

Built by the Batida team