open source · python

Catch the spike
before it pages you.

Time-series anomaly detection and alerting with a dbt-like project layout. A metric is a SQL query plus a detector in YAML — run it with one command.

Get started: pip install detectkit

3 stars · 5.9k downloads/mo · MIT licensed

[Chart: api_error_rate, last 24h at 5min resolution. Legend: metric, expected range, anomaly, recovery.]

Anomaly · api_error_rate: value 4.2 · expected ≤ 1.1 → Slack

Recovered · api_error_rate: value 1.0 · back within range → Slack

Works with

ClickHouse · PostgreSQL · MySQL

dtk init-claude

AI-native: build metrics with an assistant, out of the box.

One command sets up Claude Code for your project folder — a CLAUDE.md, a .claude/rules/detectkit/ reference, and four skills: dtk-setup-project (configure your database), dtk-new-metric (scaffold a metric), dtk-autotune (auto-tune a detector against labeled incidents), and dtk-feedback (file a redacted bug report or feature request upstream). Now an assistant writes metrics, tunes detectors, wires up alerts, and reports issues with full knowledge of detectkit. Re-run it after an upgrade to refresh the context.

dtk init-claude

Target: ~/monitoring

┌─ CLAUDE.md
└─ detectkit section created

┌─ .claude/rules/detectkit/
│ alerting.md (created)
│ autotune.md (created)
│ cli.md (created)
│ detectors.md (created)
│ metrics.md (created)
│ overview.md (created)
└─ project.md (created)

┌─ .claude/skills/
│ dtk-autotune/SKILL.md (created)
│ dtk-feedback/SKILL.md (created)
│ dtk-new-metric/SKILL.md (created)
└─ dtk-setup-project/SKILL.md (created)

Done. Claude context ready (12 created).

configuration

From a SQL query to a caught anomaly.

A metric is just a query plus a detector in YAML. dtk run handles the corridor, the quorum and the alert — nothing else to wire up.

# metrics/api_errors.yml
name: api_error_rate
interval: "5min"

query: |
  SELECT
    toStartOfInterval(timestamp, INTERVAL 5 MINUTE) AS timestamp,
    countIf(status_code >= 500) / count() * 100 AS value
  FROM http_requests
  WHERE timestamp >= '{{ dtk_start_time }}'
    AND timestamp < '{{ dtk_end_time }}'
  GROUP BY timestamp ORDER BY timestamp

detectors:
  - type: mad
    params:
      threshold: 3.0
      window_size: 2016  # 7d of 5-min points
      window_weights: exponential
      half_life: "1d"

alerting:
  enabled: true
  channels: [mattermost_ops]
  consecutive_anomalies: 3
  direction: "up"
  mentions: [oncall_engineer, here]
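To make the two placeholders in the query concrete, here is a minimal Python sketch of how a half-open time range could be substituted into the template, plus the arithmetic behind `window_size: 2016`. The `render_query` helper and the plain string replacement are assumptions for illustration only; detectkit's own templating is not shown on this page and may work differently.

```python
from datetime import datetime, timedelta

# Shortened version of the query above; only the WHERE clause matters here.
QUERY_TEMPLATE = (
    "SELECT ... FROM http_requests\n"
    "WHERE timestamp >= '{{ dtk_start_time }}'\n"
    "  AND timestamp < '{{ dtk_end_time }}'"
)


def render_query(template: str, start: datetime, end: datetime) -> str:
    """Hypothetical helper: fill both placeholders with a [start, end) range."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        template
        .replace("{{ dtk_start_time }}", start.strftime(fmt))
        .replace("{{ dtk_end_time }}", end.strftime(fmt))
    )


# window_size counts points, not time: 7 days of 5-minute points.
points_per_window = timedelta(days=7) // timedelta(minutes=5)

rendered = render_query(
    QUERY_TEMPLATE,
    datetime(2026, 6, 19, 12, 0),
    datetime(2026, 6, 19, 12, 5),
)
print(rendered)
print(points_per_window)  # 2016
```

Because the range is closed on the left and open on the right, consecutive runs can share a boundary timestamp without loading the same point twice.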

$ dtk run --select api_error_rate
LOAD    12 points · resumed
DETECT  1 anomaly · detector mad
ALERT   ✓ sent to mattermost_ops
        api_error_rate anomaly
        value 4.2 · expected ≤ 1.1 · severity 3.40
✓ pipeline completed · 1 metric · 1 detector · 0.4s · idempotent · resumable

SQL on your warehouse · detector in YAML · one command · version-controlled

dtk run

A load → detect → alert run, in one tree.

The real output of dtk run — a load → detect → alert tree with cyan step headers and colored status lines. Idempotent: it resumes from the last saved point.

LOAD

Run the SQL on your warehouse, in batches, from the last checkpoint.

DETECT

Each detector scores points against its learned corridor of normal.

ALERT

Quorum met → post to chat with the rule up top, recovery on the way back.
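The resume behaviour of the LOAD step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not detectkit's implementation: `pending_window` is a hypothetical helper, the checkpoint is taken to be the timestamp of the last saved point, and only complete 5-minute intervals are loaded.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

INTERVAL = timedelta(minutes=5)


def pending_window(
    last_saved: datetime, now: datetime
) -> Optional[Tuple[datetime, datetime]]:
    """Return the half-open [start, end) range still to load, or None."""
    # Resume one interval after the last saved point.
    start = last_saved + INTERVAL
    # Floor `now` to the interval grid so a half-finished bucket is skipped.
    end = now - (now - datetime.min) % INTERVAL
    if end <= start:
        return None  # nothing new: re-running is a no-op
    return start, end


first = pending_window(datetime(2026, 6, 19, 11, 55), datetime(2026, 6, 19, 12, 7, 30))
print(first)   # range from 12:00:00 to 12:05:00, i.e. one new point

# After that point is saved, the same wall-clock time yields no work.
second = pending_window(datetime(2026, 6, 19, 12, 0), datetime(2026, 6, 19, 12, 7, 30))
print(second)  # None
```

Deriving the range from the checkpoint rather than from the wall clock alone is what makes a repeated run safe: the second call finds nothing to do.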

dtk run --select api_error_rate

Project root: ~/monitoring
Found 1 metric(s) to process

Processing metric: api_error_rate
Config file: metrics/api_errors.yml
Steps: load, detect, alert

┌─ LOAD
│ Resuming from last saved: 2026-06-19 11:55:00
│ Loading from 2026-06-19 12:00:00 to 2026-06-19 12:05:00
│ Total points: ~1 | Batch size: 10,000
│ Loading in single batch...
└─ Loaded 1 datapoints

┌─ DETECT
│ Running 1 detector(s)...
│
│ [1/1] Detector: mad
│ Detecting from 2026-06-19 12:00:00 to 2026-06-19 12:05:00
│ Total points: ~1 | Batch size: 1,000
│ └─ Detected 1 anomalies
└─ Total anomalies: 1

┌─ ALERT
│ Checking alert conditions...
│ ⚠ Alert triggered! Sending to 1 channel(s)...
│ ✓ mattermost_ops
└─ Sent 1/1 alerts

✓ Pipeline completed successfully

detectors

Robust statistics, not magic.

Every detector learns a corridor of normal from recent history, then flags the moment a metric steps outside it. Switch the detector to see the kind of metric it's built for.

mad · Median absolute deviation

[Chart: metric, expected range and anomaly over the last 7 days; anomaly at 3.40 σ]

Measures the typical distance from the median. A handful of wild spikes barely move it — the most robust default.

Corridor: median ± 3 × MAD

Best for: Spiky, noisy metrics with outliers in their history.
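The corridor formula for `mad` can be written out directly. The sketch below uses only Python's standard library and the unscaled MAD exactly as the card states it (median ± 3 × MAD); the function name and sample data are made up for illustration, and detectkit's real detector also applies windowing and weighting that are left out here.

```python
import statistics


def mad_corridor(history, threshold=3.0):
    """Return (low, high) = median ± threshold × MAD for a list of values."""
    med = statistics.median(history)
    # MAD: the median of absolute distances from the median.
    mad = statistics.median(abs(x - med) for x in history)
    return med - threshold * mad, med + threshold * mad


# Illustrative history with one wild spike (9.0) already in it.
history = [1.0, 0.9, 1.1, 1.0, 0.8, 1.2, 1.0, 9.0]
low, high = mad_corridor(history)
print(low, high)

new_value = 4.2
print(new_value < low or new_value > high)  # True: outside the corridor
```

The 9.0 spike in the history barely widens the band, so a later reading of 4.2 is still flagged.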

zscore · Z-score

[Chart: metric, expected range and anomaly over the last 7 days; anomaly at z = 4.1]

Classic mean ± k standard deviations. Fast and simple, but one big outlier inflates the band — keep it for clean data.

Corridor: mean ± 3 × σ

Best for: Clean, roughly bell-shaped metrics.
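The point about one big outlier inflating the band is easy to see numerically. This sketch assumes the population standard deviation (`statistics.pstdev`); which estimator detectkit uses is not stated on this page, and the data is invented.

```python
import statistics


def zscore_corridor(history, k=3.0):
    """Return (low, high) = mean ± k × standard deviation."""
    mean = statistics.mean(history)
    sigma = statistics.pstdev(history)  # assumption: population std dev
    return mean - k * sigma, mean + k * sigma


clean = [1.0, 0.9, 1.1, 1.0, 0.8, 1.2, 1.0]
noisy = clean + [9.0]  # the same series with a single past spike

clean_low, clean_high = zscore_corridor(clean)
noisy_low, noisy_high = zscore_corridor(noisy)

print(clean_high)  # tight band: a reading of 4.2 is far outside it
print(noisy_high)  # inflated band: even the 9.0 spike now sits inside it
```

With a single historical spike in the window, the upper bound grows enough that a reading of 4.2 would go unnoticed, which is why the card recommends this detector for clean data.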

iqr · Interquartile range

[Chart: metric, expected range and anomaly over the last 7 days; anomaly beyond the fence]

Builds the corridor from the middle 50% of values, then extends fences 1.5×IQR out. Comfortable with skewed, long-tailed data.

Corridor: [ Q1 − 1.5·IQR , Q3 + 1.5·IQR ]

Best for: Skewed distributions and one-sided outliers.
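The fences can be computed with the standard library's quartile function. Quartile conventions differ between tools, so the `method="inclusive"` choice here is an assumption, as are the function name and the sample latencies.

```python
import statistics


def iqr_fences(history, k=1.5):
    """Return (low, high) = (Q1 - k*IQR, Q3 + k*IQR)."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(history, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Illustrative long-tailed series, e.g. request latencies in ms.
latencies = [12, 13, 13, 14, 15, 15, 16, 18, 22, 31]
low, high = iqr_fences(latencies)
print(low, high)
print([x for x in latencies if x < low or x > high])  # values beyond the fences
```

Because the fences are built from the middle half of the data, the long right tail does not drag the corridor upward the way it would with a mean-based band.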

manual_bounds · Manual bounds

[Chart: metric, expected range and anomaly over the last 7 days, with fixed min and max lines; anomaly above max]

No statistics at all — you set hard floor and ceiling values. Alerts the instant a metric crosses a known SLA line.

Corridor: value < min or value > max

Best for: Known SLAs and hard business limits.
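For comparison with the statistical detectors, the manual check reduces to two comparisons. The function and parameter names below are hypothetical; the page does not show how `manual_bounds` is configured.

```python
from typing import Optional


def check_bounds(
    value: float,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> Optional[str]:
    """Return 'above', 'below', or None when the value is within bounds."""
    if max_value is not None and value > max_value:
        return "above"
    if min_value is not None and value < min_value:
        return "below"
    return None


print(check_bounds(4.2, max_value=1.1))                 # above
print(check_bounds(1.0, min_value=0.0, max_value=1.1))  # None
```

Leaving either bound unset makes the check one-sided, which suits limits such as "error rate must stay under 1.1".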


// the corridor is recomputed per window with seasonality grouping & recency weighting — newer points count more
// each detector is shown on the metric shape it handles best — robust, bell-shaped, skewed or hard-bounded
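Recency weighting can be illustrated with exponential weights and a weighted median. This sketch assumes that `window_weights: exponential` with `half_life: "1d"` means a point's weight halves for every day of age (288 five-minute points); that reading, and both helper functions, are assumptions rather than detectkit's documented behaviour. Seasonality grouping is not modelled.

```python
def exponential_weights(n_points, half_life_points):
    """Weights oldest-first; the newest point gets 1.0, halving per half-life."""
    return [
        0.5 ** ((n_points - 1 - i) / half_life_points)
        for i in range(n_points)
    ]


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("weighted_median() needs at least one value")


# Tiny example: two old readings of 10, one fresh reading of 1.
values = [10, 10, 1]
weights = exponential_weights(len(values), half_life_points=1)
print(weights)                           # [0.25, 0.5, 1.0]
print(weighted_median(values, weights))  # 1

# For the metric above: a 1-day half-life on 5-minute points.
half_life_points = (24 * 60) // 5        # 288
```

An unweighted median of the same three values would be 10; with recency weights the newest reading dominates, so the corridor follows recent behaviour more closely.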

try it · live

Shape a metric, then watch detection happen.

An interactive sandbox running the actual detectkit detector in your browser. Dial in a series that looks like one of yours, turn the detector's real knobs, and watch the corridor of normal, what gets flagged, and whether an alert would fire — nothing is sent anywhere.

Open the interactive playground → mad / zscore / iqr · live corridor · alert preview

autotune · labeling

Teach it your incidents — point, drag, done.

Autotune already tunes well with zero labels. To optimise against your real incidents, run dtk autotune --select <metric> --label — it opens a chart where you drag across each incident, add a note, and Export one self-contained file: offline, nothing leaves your browser, every round versioned.

Open the labeler demo → drag to label · scroll to zoom · export & re-tune

alerting

Alerts that lead with the rule that fired.

Direction-aware multi-detector quorum, cooldown, recovery and no-data alerts — posted to chat with the alert and its rule up top, anomaly evidence below.
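How a quorum, a direction filter and a consecutive-anomaly count might combine is sketched below. The semantics are assumptions: `direction="up"` is taken to mean that only readings above the corridor count, and the `direction=same` value that appears in the alert text is not modelled. Function names are hypothetical.

```python
def quorum_met(verdicts, min_detectors=1, direction="up"):
    """verdicts: one entry per detector, each 'above', 'below' or None."""
    wanted = {"up": "above", "down": "below"}[direction]
    return sum(1 for v in verdicts if v == wanted) >= min_detectors


def should_alert(history, min_detectors=1, direction="up", consecutive=3):
    """history: per-interval verdict lists, oldest first."""
    if len(history) < consecutive:
        return False
    recent = history[-consecutive:]
    return all(quorum_met(v, min_detectors, direction) for v in recent)


# One detector (mad), three intervals in a row above the corridor.
print(should_alert([["above"], ["above"], ["above"]]))  # True
# A normal interval in the middle resets the streak.
print(should_alert([["above"], [None], ["above"]]))     # False
# Dips below the corridor are ignored when direction is "up".
print(should_alert([["below"], ["below"], ["below"]]))  # False
```

Requiring several consecutive intervals trades a little detection latency for fewer alerts on single-point blips.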

Slack · Mattermost · Telegram · Email

Anomaly · Recovery

The same alert, posted by detectkit to each channel — rendered as that channel formats it: a fields attachment on Slack/Mattermost, escaped HTML on Telegram, a branded card in email. Each leads with the project name ([payments]) so several projects can share one channel while keeping the brand bot identity. The dashboard_url below becomes a first-class link on every channel.
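The "escaped HTML on Telegram" detail matters because alert fields such as `<= 1.1` contain characters that would otherwise be parsed as markup. Below is a minimal sketch using the standard library's `html.escape`; the function name and message layout are assumptions and not detectkit's actual formatter.

```python
import html


def telegram_alert_html(project, metric, value, expected, dashboard_url):
    """Build an HTML-formatted message with every dynamic field escaped."""
    title = f"🔴 <b>[{html.escape(project)}] Anomaly · {html.escape(metric)}</b>"
    body = f"Value: {html.escape(str(value))} · Expected: {html.escape(expected)}"
    # quote=True also escapes double quotes so the URL stays inside href="...".
    link = f'<a href="{html.escape(dashboard_url, quote=True)}">Open dashboard</a>'
    return "\n".join([title, body, link])


message = telegram_alert_html(
    project="payments",
    metric="api_error_rate",
    value=4.2,
    expected="<= 1.1",
    dashboard_url="https://grafana.ops/d/api-errors",
)
print(message)
```

Only the tags the formatter writes itself (`<b>`, `<a>`) reach the chat as markup; everything that comes from configuration or data is escaped first.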

Slack

detectkit APP 12:04
@oncall_engineer @here
🔴 [payments] Alert: api_error_rate ↗
Anomalous for 1h — 6 consecutive 10min intervals.
Rule min_detectors=1 · direction=same · consecutive=3

Value: 4.2
Expected: <= 1.1
Quorum: 1/1 · above
Severity: 3.40
Started: 2026-06-19 11:14:00 (Europe/Moscow)
Latest: 2026-06-19 12:04:00 (Europe/Moscow)
Detectors: mad
Parameters: {"threshold": 3.0, "window_size": 2016, "half_life": "1d"}
Links: Dashboard · How to read this alert

detectkit APP 12:04
@oncall_engineer @here
🟢 [payments] Alert cleared: api_error_rate ↗
The alert condition no longer holds — the metric is back within expected bounds. Incident lasted 1h (6 consecutive 10min intervals).
Rule min_detectors=1 · direction=same · consecutive=3

Value: 1.0
Expected: <= 1.1
Started: 2026-06-19 11:36:00 (Europe/Moscow)
Cleared: 2026-06-19 12:36:00 (Europe/Moscow)
Detectors: mad
Links: Dashboard · How to read this alert

Mattermost

detectkit BOT 12:04
@oncall_engineer @here
🔴 [payments] Alert: api_error_rate ↗
Anomalous for 1h — 6 consecutive 10min intervals.
Rule min_detectors=1 · direction=same · consecutive=3

Value: 4.2
Expected: <= 1.1
Quorum: 1/1 · above
Severity: 3.40
Started: 2026-06-19 11:14:00 (Europe/Moscow)
Latest: 2026-06-19 12:04:00 (Europe/Moscow)
Detectors: mad
Parameters: {"threshold": 3.0, "window_size": 2016, "half_life": "1d"}
Links: Dashboard · How to read this alert

detectkit BOT 12:04
@oncall_engineer @here
🟢 [payments] Alert cleared: api_error_rate ↗
The alert condition no longer holds — the metric is back within expected bounds. Incident lasted 1h (6 consecutive 10min intervals).
Rule min_detectors=1 · direction=same · consecutive=3

Value: 1.0
Expected: <= 1.1
Started: 2026-06-19 11:36:00 (Europe/Moscow)
Cleared: 2026-06-19 12:36:00 (Europe/Moscow)
Detectors: mad
Links: Dashboard · How to read this alert

Telegram

detectkit
🔴 [payments] Anomaly · api_error_rate
Anomalous for 1h — 6 consecutive 10min intervals.
Rule min_detectors=1 · direction=same · consecutive=3
• Value: 4.2 · Expected: <= 1.1
• Quorum: 1/1 · above
• Severity: 3.40
• Anomaly began: 2026-06-19 11:14:00 (Europe/Moscow) · Latest reading: 2026-06-19 12:04:00 (Europe/Moscow)
• Detector: mad
• Parameters: {"threshold": 3.0, "window_size": 2016, "half_life": "1d"}
Open dashboard · How to read this alert
@oncall_engineer
12:04

detectkit
🟢 [payments] Recovered · api_error_rate
The alert condition no longer holds — the metric is back within expected bounds. Incident lasted 1h (6 consecutive 10min intervals).
Rule min_detectors=1 · direction=same · consecutive=3
• Value: 1.0 · Expected: <= 1.1
• Anomaly began: 2026-06-19 11:36:00 (Europe/Moscow) · Alert fired: 2026-06-19 11:56:00 (Europe/Moscow) · Recovered: 2026-06-19 12:36:00 (Europe/Moscow)
• Detector: mad
Open dashboard · How to read this alert
@oncall_engineer
12:04

Email

detectkit alerts <alerts@detectkit.dev>
to team@example.com · 12:04

detectkit · ANOMALY
payments · api_error_rate
Anomalous for 1h — 6 consecutive 10min intervals.
Rule min_detectors=1 · direction=same · consecutive=3

Value: 4.2
Expected: <= 1.1
Severity: 3.40
Quorum: 1/1 · above
Started: 2026-06-19 11:14:00 (Europe/Moscow)
Latest: 2026-06-19 12:04:00 (Europe/Moscow)
Detector · mad
{"threshold": 3.0, "window_size": 2016, "half_life": "1d"}

Open dashboard →
Sent by detectkit · payments · CC: oncall_engineer · How to read this alert →

detectkit alerts <alerts@detectkit.dev>
to team@example.com · 12:04

detectkit · RECOVERED
payments · api_error_rate
The alert condition no longer holds — the metric is back within expected bounds. Incident lasted 1h (6 consecutive 10min intervals).
Rule min_detectors=1 · direction=same · consecutive=3

Value: 1.0
Expected: <= 1.1
Started: 2026-06-19 11:36:00 (Europe/Moscow)
Cleared: 2026-06-19 12:36:00 (Europe/Moscow)
Detector: mad

Open dashboard →
Sent by detectkit · payments · CC: oncall_engineer · How to read this alert →

alerting:
  channels: [mattermost_ops]
  dashboard_url: https://grafana.ops/d/api-errors   # one line → a link on every channel

Ship your first detector in five minutes.

SQL + YAML, one command. No agents, no dashboards to babysit.

Get started: pip install detectkit
