Templates, mentions & testing

Tag specific users or groups in alert messages. Mentions are channel-agnostic: you write plain usernames in metric config, and each channel formats them in its native syntax.

alerting:
  enabled: true
  channels:
    - mattermost_ops
  consecutive_anomalies: 3
  mentions:
    - oncall_engineer
    - devops_team

This appends @oncall_engineer @devops_team to alert messages in Mattermost.

detectkit automatically formats mentions for each platform:

| Config Value | Mattermost | Slack | Telegram | Email |
|---|---|---|---|---|
| username | @username | @username (display only) | @username | CC: username |
| here | @here | <!here> (broadcast) | @here | (ignored) |
| channel | @channel | <!channel> (broadcast) | @channel | (ignored) |
| all | @all | <!everyone> (broadcast) | @all | (ignored) |
| U04ABCD1234 | @U04ABCD1234 | <@U04ABCD1234> (real ping) | @U04ABCD1234 | CC: U04ABCD1234 |

Slack note: Slack webhooks do not actually ping users with @username — it’s display-only. For real pings, use Slack User IDs (format: U + alphanumeric, found in user profile > “Copy member ID”).
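The per-channel rules in the table can be sketched in a few lines of Python. This is an illustration only, not detectkit's implementation: the function names, the channel strings, and the regular expression for Slack member IDs (assumed here to be "U" followed by uppercase letters and digits) are all hypothetical.

```python
import re
from typing import Optional

# Assumption: Slack member IDs are "U" followed by uppercase letters/digits.
SLACK_USER_ID = re.compile(r"^U[A-Z0-9]+$")

# Broadcast keywords from the table and their Slack equivalents.
SLACK_BROADCAST = {"here": "<!here>", "channel": "<!channel>", "all": "<!everyone>"}


def format_mention(value: str, channel: str) -> Optional[str]:
    """Format one plain config value for a channel; None means "skip it"."""
    if channel == "slack":
        if value in SLACK_BROADCAST:
            return SLACK_BROADCAST[value]
        if SLACK_USER_ID.match(value):
            return f"<@{value}>"  # real ping
        return f"@{value}"  # display only, no notification
    if channel == "email":
        # Broadcast keywords have no email equivalent and are ignored.
        return None if value in SLACK_BROADCAST else f"CC: {value}"
    # Mattermost and Telegram both use a plain "@" prefix.
    return f"@{value}"


def format_mentions(values: list[str], channel: str) -> str:
    """Join the formatted mentions for one channel, dropping ignored values."""
    formatted = (format_mention(v, channel) for v in values)
    return " ".join(m for m in formatted if m is not None)
```

With this sketch, `format_mentions(["oncall_engineer", "devops_team"], "mattermost")` yields `@oncall_engineer @devops_team`, matching the Mattermost example above. Note that the ID heuristic would also match an all-uppercase username starting with "U", so a real implementation would need a stricter rule.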

Use these keywords for broadcast mentions:

  • here — Notify active members (Mattermost: @here, Slack: <!here>)
  • channel — Notify all channel members (Mattermost: @channel, Slack: <!channel>)
  • all — Notify everyone (Mattermost: @all, Slack: <!everyone>)

By default, mentions appear at the end of the message. Use template variables for custom placement:

  • {mentions} — Formatted mentions string (e.g., @user1 @user2), empty string if none
  • {mentions_line} — Same but with a leading newline, empty string if none

alerting:
  mentions:
    - oncall_engineer
  # Place mentions at the top of the message
  template_consecutive: |
    {mentions}
    Alert: {metric_name}
    Time: {timestamp}
    Value: {value} | CI: {confidence_interval}
    Consecutive: {consecutive_count}
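To make the difference between the two placeholders concrete, here is a minimal Python sketch of how such values could be derived and substituted. The helper `mention_variables` is hypothetical and only mirrors the descriptions above; it is not taken from detectkit.

```python
def mention_variables(formatted: list[str]) -> dict[str, str]:
    """Build both placeholders from already-formatted mentions."""
    mentions = " ".join(formatted)
    return {
        "mentions": mentions,
        # Same text with a leading newline, or nothing at all when empty.
        "mentions_line": f"\n{mentions}" if mentions else "",
    }


template = "Alert: {metric_name}{mentions_line}"

with_mentions = template.format(
    metric_name="cpu", **mention_variables(["@user1", "@user2"])
)
without_mentions = template.format(metric_name="cpu", **mention_variables([]))
```

Because `{mentions_line}` carries its own newline, a template that ends with it leaves no trailing blank line when the metric has no mentions configured.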

Mentions are included in both anomaly alerts and recovery notifications:

alerting:
  mentions:
    - oncall_engineer
  notify_on_recovery: true
  template_recovery: |
    {mentions}
    Resolved: {metric_name} at {timestamp}
    Value: {value}

| Field | Type | Default | Description |
|---|---|---|---|
| mentions | List[str] | [] | Users/groups to mention. Plain usernames without @. |

No @ prefix needed — detectkit adds the appropriate prefix for each channel.

Alerts display timestamps in UTC by default. Override per metric:

alerting:
  timezone: "Europe/Moscow"  # MSK (UTC+3)
  # Other examples (set only one timezone key):
  # timezone: "America/New_York"  # EST/EDT
  # timezone: "Asia/Tokyo"  # JST (UTC+9)

Note: This only affects alert display. All internal timestamps remain UTC.
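The display-only nature of this setting can be illustrated with Python's standard zoneinfo module. This is a sketch under assumptions: `display_timestamp` is a hypothetical helper, and the exact label detectkit prints for a non-UTC zone may differ from the one used here.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def display_timestamp(ts_utc: datetime, tz_name: str = "UTC") -> str:
    """Convert a stored UTC timestamp to the configured zone for display."""
    local = ts_utc.astimezone(ZoneInfo(tz_name))
    return f"{local:%Y-%m-%d %H:%M:%S} ({tz_name})"


# The stored value never changes; only its rendering does.
stored = datetime(2026, 6, 12, 14, 30, tzinfo=timezone.utc)

utc_text = display_timestamp(stored)
moscow_text = display_timestamp(stored, "Europe/Moscow")
tokyo_text = display_timestamp(stored, "Asia/Tokyo")
```

The same instant renders as 14:30 in UTC, 17:30 in Europe/Moscow, and 23:30 in Asia/Tokyo, while `stored` itself stays in UTC.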

Override default alert message format.

The default message foregrounds the alert — first how long it has been going on (the plain-language lead), then the rule that fired, with the anomaly as supporting evidence below. The order is the same on every channel and for both anomaly and recovery: description → Rule → Value/Expected.

🔴 {project_name_prefix}Alert: {metric_name}
{description_line}{anomaly_lead}
Rule: min_detectors={min_detectors} · direction={direction_policy} · consecutive={consecutive_required}
Value: {value_display} | Expected: {expected_range}
Quorum: {detector_count}/{min_detectors} · {direction}
Severity: {severity:.2f}
{window_line}Detectors: {detector_name}
Parameters: {detector_params}
{dashboard_line}{help_line}{mentions_line}

The first line names the alert and the metric, led by the project name as a [name] prefix (from detectkit_project.yml). The prefix keeps two projects posting to the same channel distinguishable while retaining the default brand bot name and avatar. See Channels for where each channel surfaces it.

  • {anomaly_lead} answers “when did this start, how long has it been running?”, e.g. Anomalous for 2h 30m — 15 consecutive 10min intervals. It combines the wall-clock duration, the true streak length and the metric interval.
  • The Rule: line sits right above the evidence it explains and restates the configured thresholds.
  • {window_line} gives the problematic span as Anomaly began: … | Latest reading: … On recovery it becomes Anomaly began: … | Alert fired: … | Recovered: …, where Alert fired is the on-grid moment the rule first tripped, distinct from the onset.
  • {expected_range} renders one-sided detector bounds cleanly (e.g. >= 7.00 for a lower-only manual_bounds) instead of [7.00, nan].

How long is “true”? The streak length / onset are resolved at fire time by looking back over the detection history until the run breaks (bounded — a run older than the lookback window renders as over …), so it reflects the real incident, not just the consecutive_anomalies points the rule needs. The recovery message reports the same span the just-cleared incident covered (Incident lasted …).
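The lookback described here can be sketched as follows. This is a simplified model, not detectkit's code: the detection history is reduced to a list of booleans (newest last), and `resolve_streak`, `format_duration` and the `lookback` parameter are hypothetical names. The alert-fire arithmetic follows the definition of fired_display: onset + (consecutive_required − 1) × interval.

```python
from datetime import datetime, timedelta


def resolve_streak(flags: list[bool], lookback: int) -> tuple[int, bool]:
    """Count trailing anomalous points (newest last).

    Returns (streak, capped). capped is True when no break was found
    inside the lookback window, so the real run may be longer.
    lookback must be a positive number of points.
    """
    streak = 0
    for is_anomalous in reversed(flags[-lookback:]):
        if not is_anomalous:
            return streak, False
        streak += 1
    return streak, True


def format_duration(delta: timedelta) -> str:
    """Render a duration as e.g. "2h 30m" or "30m"."""
    hours, minutes = divmod(int(delta.total_seconds()) // 60, 60)
    parts = ([f"{hours}h"] if hours else []) + ([f"{minutes}m"] if minutes else [])
    return " ".join(parts) or "0m"


interval = timedelta(minutes=10)
latest = datetime(2026, 6, 12, 14, 30)
history = [False, True, False, True, True, True]  # newest reading last
consecutive_required = 3

streak, capped = resolve_streak(history, lookback=100)
onset = latest - (streak - 1) * interval
fired = onset + (consecutive_required - 1) * interval
duration = format_duration(streak * interval)
```

In this example the run is three points long, so the onset is 14:10 and the duration is 30m. When `capped` is True, the sketch has only a lower bound on the streak, which corresponds to the "over …" rendering mentioned above.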

  1. Create a template file in the templates/ directory:

templates/custom_alert.j2

Alert: {{ metric_name }}
Current value: {{ value|round(2) }}
Expected range: [{{ confidence_lower|round(2) }}, {{ confidence_upper|round(2) }}]
Severity: {{ severity|round(2) }} ({{ direction }})
Detected by: {{ detector_name }}
Time: {{ timestamp }} {{ timezone }}
{% if consecutive_count > 1 %}
Persisting for {{ consecutive_count }} consecutive points!
{% endif %}

  2. Reference it in the metric config:

alerting:
  template_consecutive: templates/custom_alert.j2

| Variable | Description | Available in |
|---|---|---|
| metric_name | Metric name | all |
| project_name | detectkit_project.yml name, empty if not set | all; populated for every alert by the pipeline (v0.15.0) |
| project_name_prefix | "[<project_name>] " if set, else empty; leads every default title/headline/subject so multiple projects on one channel stay distinct | all (v0.15.0) |
| timestamp | Timestamp (formatted in {timezone}) | all |
| timezone | Timezone display name | all |
| value | Current metric value (numeric, or string "no data" for no-data) | all |
| value_display | NaN-safe string version; always renders, falls back to "no data" | all (v0.5.0) |
| confidence_lower / confidence_upper | Bounds of confidence interval | anomaly, recovery |
| confidence_interval | Formatted as [lower, upper] or "N/A" | all |
| expected_range | One-sided aware expected band: >= lo, <= hi, [lo, hi], or "N/A". Renders one-sided detector bounds cleanly instead of [7.00, nan] | all |
| detector_name | Detector that triggered (e.g., "MADDetector:threshold=3.0"); "N detectors" when several detectors formed the quorum | anomaly, recovery |
| detector_params | Detector parameters as a JSON string (empty for no-data/error) | anomaly, recovery |
| detector_count | Observed number of detectors that agreed (the quorum size that fired) | anomaly |
| min_detectors | Configured quorum threshold the alert fired on (the rule) | anomaly, recovery |
| severity | Severity score; max across the quorum for multi-detector alerts | anomaly |
| direction | Observed/locked anomaly direction: "up" or "down"; also "mixed" for an any-policy quorum spanning both up and down, and "none" for no-data/recovery | all |
| direction_policy | Configured direction rule: "same", "any", "up", "down" | anomaly, recovery |
| consecutive_count | True consecutive streak length, resolved at fire time by looking back over the detection history and not capped at the rule threshold (recovery: the just-ended incident length) | anomaly, recovery |
| consecutive_required | Configured consecutive threshold the alert fired on (the rule) | anomaly, recovery |
| interval_display | Metric interval as a string (e.g. "10min") (v0.17.0) | all |
| duration_display | How long the streak/incident has run (e.g. "2h 30m"; "over …" when older than the lookback window) (v0.17.0) | anomaly, recovery |
| onset_display / started_display | First anomalous timestamp of the run: the onset, not the alert-fire time (formatted in {timezone}); started_display adds "or earlier" when the run is capped (v0.17.0) | anomaly, recovery |
| fired_display | On-grid moment the alert first fired: onset + (consecutive_required − 1) × interval (formatted in {timezone}); empty when the run is capped or no interval is wired in (v0.35.0) | recovery |
| anomaly_lead / recovery_lead | Ready-made plain-language lead: "Anomalous for …" / "… Incident lasted …" (falls back to "Latest X/Y consecutive points met the quorum." when no interval is wired in) (v0.17.0) | anomaly, recovery |
| window_line | "Anomaly began: … \| Latest reading: …\n" (anomaly) / "Anomaly began: … \| Alert fired: … \| Recovered: …\n" (recovery), or a single "Detected at: …" line when the onset is unknown (v0.17.0; relabeled v0.35.0) | all |
| status | "ANOMALY", "RECOVERED", "NO_DATA", or "ERROR" | all (v0.5.0 added NO_DATA / ERROR) |
| error_type / error_message | Exception details | error only (v0.5.0) |
| description / description_line | Metric description | all |
| mentions / mentions_line | Formatted mentions | all |
| dashboard_url | Raw alerting.dashboard_url (empty string when unset); also surfaced natively as a clickable title on Slack/Mattermost, an inline link on Telegram, and an “Open dashboard” button in email (v0.13.0) | all |
| dashboard_line | "Dashboard: <url>\n" when set, else empty; appended to the default plain-text templates (v0.13.0) | all |
| help_url | Raw “How to read this alert” URL (empty when hidden via alert_help_url: false); also surfaced natively as a clickable label in the webhook Links field, a links-line entry on Telegram, and a footer link in email (v0.16.0) | all |
| help_line | "How to read this alert: <url>\n" when set, else empty; appended to the default plain-text templates (v0.16.0) | all |
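As an illustration of the expected_range behaviour listed in the table, the following Python sketch formats a band with one or both bounds missing. The function name and the two-decimal formatting are assumptions based on the examples in this page (>= 7.00, [0.45, 0.62]), not detectkit's actual code.

```python
import math
from typing import Optional


def _missing(bound: Optional[float]) -> bool:
    """A bound counts as missing when it is None or NaN."""
    return bound is None or math.isnan(bound)


def format_expected_range(lower: Optional[float], upper: Optional[float]) -> str:
    """Render the expected band, dropping whichever side is missing."""
    if _missing(lower) and _missing(upper):
        return "N/A"
    if _missing(upper):
        return f">= {lower:.2f}"
    if _missing(lower):
        return f"<= {upper:.2f}"
    return f"[{lower:.2f}, {upper:.2f}]"
```

A lower-only bound such as `format_expected_range(7.0, float("nan"))` then renders as `>= 7.00` rather than `[7.00, nan]`.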

All variables are always substitutable in every alert kind — the “Available in” column marks where a value is meaningful, not where the placeholder is valid. Using a variable outside its listed kinds renders a neutral fallback rather than raising a KeyError.

Format-spec safety: if a template uses {value:.2f} (or any numeric format spec) on a no-data or error alert where there’s no real value, detectkit falls back to the kind-appropriate default template instead of crashing. It is still cleaner to write kind-appropriate templates from the start.
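Both safety nets can be approximated with plain Python string formatting. The sketch below is hypothetical: `NeutralDefaults` and `render` are illustrative names, an empty string stands in for the "neutral fallback", and the fallback template text is made up for the example.

```python
class NeutralDefaults(dict):
    """Mapping that returns an empty string for unknown placeholders."""

    def __missing__(self, key: str) -> str:
        return ""


def render(template: str, default_template: str, variables: dict) -> str:
    """Render a custom template, falling back to the default on a bad format spec."""
    context = NeutralDefaults(variables)
    try:
        return template.format_map(context)
    except (ValueError, TypeError):
        # e.g. "{value:.2f}" applied to the string "no data"
        return default_template.format_map(context)


no_data = {"metric_name": "api_response_time", "value": "no data"}
custom = "Alert: {metric_name} = {value:.2f}"
fallback = "No data: {metric_name}"

numeric_text = render(custom, fallback, {"metric_name": "m", "value": 0.8532})
no_data_text = render(custom, fallback, no_data)
unknown_text = render("{metric_name} {severity}", fallback, no_data)
```

Here the numeric alert renders through the custom template, the no-data alert silently switches to the fallback because `.2f` cannot format a string, and the unset `{severity}` placeholder renders as empty text instead of raising a KeyError.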

  • template_single - Used when the alert has consecutive_count ≤ 1 (i.e. consecutive_anomalies: 1 configs)
  • template_consecutive - Used for streaks (consecutive_count > 1)
  • template_single and template_consecutive fall back to each other when only one is set
  • template_recovery - Used for recovery notifications
  • template_no_data - Used for no-data alerts
  • error_alerting.template - Used for project-level pipeline errors (in detectkit_project.yml)
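The selection rules in this list can be summarised in a short Python sketch. It is an assumption-laden illustration: `pick_template`, the `kind` strings, and the use of None to mean "use the built-in default" are hypothetical, and the project-level error_alerting.template is left out because it lives in a different config file.

```python
from typing import Optional


def pick_template(kind: str, consecutive_count: int, config: dict) -> Optional[str]:
    """Return the configured template for an alert, or None for the built-in default."""
    if kind == "recovery":
        return config.get("template_recovery")
    if kind == "no_data":
        return config.get("template_no_data")
    single = config.get("template_single")
    streak = config.get("template_consecutive")
    # The two anomaly templates fall back to each other when only one is set.
    if consecutive_count <= 1:
        return single or streak
    return streak or single
```

For example, a metric that only sets template_consecutive still gets that template for a single-point alert, because the lookup falls through to the other anomaly template.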

Test alert configuration without waiting for real anomalies.

cd my_project
dtk test-alert api_response_time

This sends a mock alert through configured channels with fake data. The mock uses the alert config’s own rule (min_detectors / direction / consecutive_anomalies), so the preview matches what a real firing would look like — here with the defaults (min_detectors: 1, direction: same, consecutive_anomalies: 3):

🔴 [my_monitoring] Alert: api_response_time
Anomalous for 30m — 3 consecutive 10min intervals.
Rule: min_detectors=1 · direction=same · consecutive=3
Value: 0.8532 | Expected: [0.45, 0.62]
Quorum: 1/1 · up
Severity: 4.52
Anomaly began: 2026-06-12 14:10:00 (UTC) | Latest reading: 2026-06-12 14:30:00 (UTC)
Detectors: MADDetector:threshold=3.0
Parameters: {"threshold": 3.0, "window_size": 8640}

Use cases:

  • Verify webhook URLs work
  • Check alert formatting
  • Test custom templates
  • Validate channel permissions