Reading a detectkit alert

You probably landed here by clicking “How to read this alert” at the bottom of a notification in Slack, Mattermost, Telegram, or email. This page explains — in plain language — what that alert is telling you and what to do about it. No detectkit setup knowledge required.

  • A detectkit alert means one of your team’s metrics (an order count, an error rate, a signup number, …) just moved outside its normal range for long enough that it’s worth a human look.
  • The colored circle at the start of the title tells you the kind at a glance: 🔴 something looks wrong, 🟢 it’s back to normal, 🟡 the data stopped arriving, 🔵 the monitoring itself failed.
  • The alert shows the current value and the expected range. If the value is far outside the range, that’s the thing to look into.
  • You don’t need to fix detectkit — it’s just the messenger. Look at the metric it names and decide whether the business/system needs attention.

That’s enough to triage. The rest of this page explains each piece if you want to understand why it fired.

detectkit watches metrics over time and learns what “normal” looks like for each one — including daily and weekly rhythms (mornings are busier, weekends are quieter, and so on). When recent points fall outside that learned normal range, it sends one of these notifications. It is not a guess from a single weird data point: by default an alert only fires after several points in a row look abnormal and more than one independent check agrees (see Why did it fire? below).

The notification leads with the alert and the rule it fired on; the actual anomalous number is supporting evidence underneath.

Every alert title starts with a colored circle so you can read the status from color alone:

Circle | Status | What it means
--- | --- | ---
🔴 | Anomaly | A metric moved outside its expected range and stayed there. This is the “please look” signal.
🟢 | Recovered | A previously-alerting metric is back inside its expected range. The incident is over; no action needed.
🟡 | No data | The metric’s data stopped arriving for the latest period. Often a broken pipeline/job upstream, not the business metric itself.
🔵 | Pipeline error | detectkit’s own monitoring run failed (e.g. the database was unreachable). The metric might be fine; the monitor couldn’t check it.

The same colors are used on dashboards and accent bars, so 🔴 in chat, a red bar in email, and a red marker on a chart all mean the same thing.

A typical anomaly alert carries these fields. You don’t need all of them to triage — Value and Expected are usually enough — but here’s what each one means.

Field | Plain-language meaning
--- | ---
Metric (the title) | Which metric fired. This is the thing to investigate. Often a clickable link to a dashboard.
Lead line | A one-sentence summary of how long this has been going on, e.g. “Anomalous for 2h 30m — 15 consecutive 10min intervals.” It tells you the metric’s measurement interval, how many points in a row have been abnormal, and the total wall-clock duration, so you instantly know whether it just started or has been running for hours. (over … means it’s been going on at least that long.)
Value | The actual measured value at the flagged time.
Expected | The range detectkit considered normal for that moment. [12.0, 40.0] means “we expected somewhere between 12 and 40”. >= 100 / <= 5 are one-sided limits.
Anomaly began / Latest reading | The problematic stretch: when the anomaly first appeared and its most recent point. On a recovery the timeline is fuller (Anomaly began / Alert fired / Recovered): when the metric first went bad, when the rule first tripped and detectkit notified, and when it came back to normal. “Anomaly began” is the real onset, not when the alert fired; the two differ when the rule waits for several consecutive intervals. Together with the lead line, this is the “how long / since when” story.
Severity | Roughly how far outside normal the value was; bigger means more extreme. Use it to prioritize between several alerts, not as an exact unit.
Quorum | How many independent checks agreed it was abnormal (e.g. 2/2), and in which direction (up/down). More agreement = more confidence.
Rule | The condition that fired, shown as a chip: min_detectors=… · direction=… · consecutive=… (how many checks had to agree, in which direction, for how many points in a row). It sits right above Value/Expected and appears on both 🔴 anomalies and 🟢 recoveries.
Detectors / Parameters | The technical checks that flagged it and their settings. Safe to ignore unless you’re tuning the monitoring.
[name] prefix | If the title starts with [something], that’s the project the alert came from, useful when several projects post to the same channel.
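
The arithmetic behind the lead line can be checked by hand: the number of consecutive abnormal points times the measurement interval gives the wall-clock duration. The sketch below only illustrates that arithmetic; the function name and the output format are assumptions made for this example, not detectkit's own code.

```python
def lead_line_duration(consecutive_points: int, interval_minutes: int) -> str:
    """Turn 'N consecutive intervals of M minutes' into an 'Xh Ym' duration.

    Hypothetical helper for illustration only; not part of detectkit.
    """
    total_minutes = consecutive_points * interval_minutes
    hours, minutes = divmod(total_minutes, 60)
    if hours and minutes:
        return f"{hours}h {minutes}m"
    if hours:
        return f"{hours}h"
    return f"{minutes}m"


# The example from the Lead line row: 15 consecutive 10-minute intervals.
print(lead_line_duration(15, 10))  # 2h 30m
```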

Value vs Expected — the one comparison that matters

The fastest read is Value against Expected:

  • Value above the expected range → the metric spiked (e.g. errors jumped, latency rose).
  • Value below the expected range → the metric dropped (e.g. orders fell, signups stalled).
  • The further outside the range, and the higher the Severity, the more likely it’s real and worth acting on.
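
For readers who think in code, the comparison above can be written as a small function. This is a minimal sketch: the function name, its parameters, and the "above"/"below"/"within" labels are made up for illustration and do not reflect how detectkit itself is implemented. A missing bound models the one-sided limits such as >= 100 or <= 5.

```python
from typing import Optional


def compare_to_expected(
    value: float,
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> str:
    """Say where a value sits relative to an expected range.

    Hypothetical helper for illustration only. Pass lower=None or
    upper=None to model a one-sided limit.
    """
    if upper is not None and value > upper:
        return "above"   # the metric spiked
    if lower is not None and value < lower:
        return "below"   # the metric dropped
    return "within"      # nothing unusual


# Expected [12.0, 40.0]
print(compare_to_expected(55.0, lower=12.0, upper=40.0))  # above
print(compare_to_expected(3.0, lower=12.0, upper=40.0))   # below
# One-sided limit ">= 100"
print(compare_to_expected(120.0, lower=100.0))            # within
```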

Why did it fire?

detectkit is deliberately conservative to avoid crying wolf. By default an anomaly alert requires several consecutive points to each look abnormal, and a quorum of independent checks to agree, so a single noisy reading won’t trigger an alert. The alert spells out the rule it fired on (for example: min_detectors=2, direction=same, consecutive=3, meaning “at least two checks agreed on the same direction, three points in a row”). If you see a 🔴, it cleared that bar.
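
To make the rule concrete, here is a rough sketch of how a condition like min_detectors=2, direction=same, consecutive=3 could be evaluated. It is an illustration under stated assumptions, not detectkit's actual logic: the data shape (one list of per-check verdicts per point), the function names, and the choice to check direction agreement within each point are all assumptions made for this example.

```python
from collections import Counter
from typing import List, Optional

# One verdict per check for a single point: "up", "down", or None (normal).
Verdict = Optional[str]


def point_is_anomalous(verdicts: List[Verdict], min_detectors: int) -> bool:
    """True if at least min_detectors checks flagged the same direction."""
    counts = Counter(v for v in verdicts if v is not None)
    return any(n >= min_detectors for n in counts.values())


def rule_fires(
    history: List[List[Verdict]],
    min_detectors: int = 2,
    consecutive: int = 3,
) -> bool:
    """True if the latest `consecutive` points are all anomalous."""
    if len(history) < consecutive:
        return False
    recent = history[-consecutive:]
    return all(point_is_anomalous(p, min_detectors) for p in recent)


# Three points in a row where both checks say "up": the rule fires.
sustained = [[None, None], ["up", "up"], ["up", "up"], ["up", "up"]]
print(rule_fires(sustained))  # True

# A single abnormal point is not enough.
blip = [[None, None], [None, None], ["up", "up"]]
print(rule_fires(blip))  # False
```

In this sketch a point where one check says "up" and the other says "down" does not count, which is one way to read direction=same.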

A 🟢 recovery is sent once the metric comes back inside the expected range, so you know the incident closed without having to check yourself.

  1. Read the color. 🟢 means it’s already over. 🔵/🟡 point at the data pipeline, not (necessarily) the business.
  2. Look at Value vs Expected for a 🔴. How far out is it? Is the direction (up/down) good or bad for this metric?
  3. Open the dashboard if the alert links one (via the title or an “Open dashboard” button/link) to see the trend around the flagged point.
  4. Decide and route. If it’s a real problem, loop in whoever owns that metric or system. If it’s expected (a launch, a known spike, a planned outage), you can ignore it — and the metric’s owner can tune the thresholds.
  5. You can’t break anything by ignoring it. detectkit keeps watching and will send a 🟢 when things normalize.

Is this an outage / does it page someone? Not by itself — it’s a heads-up that a watched number looks unusual. Whether it’s urgent depends on the metric and your team’s process.

The value looks fine to me — why did it alert? “Normal” is learned per metric and per time-of-day/week. A value that looks ordinary can still be unusual for that moment (e.g. very low traffic at peak hour).

Can I make these stop / change them? The person who set up detectkit for your team controls which metrics alert, how sensitive they are, and where they post. Share the alert with them.


For the people who configure these alerts: see the Alerting guide and Alert channels. The “How to read this alert” link points here by default and can be redirected to your own runbook (or hidden) with alert_help_url in detectkit_project.yml — see Configuration.