
Overview

detectkit - Metric monitoring with automatic anomaly detection.

A Python library and CLI tool for data analysts and engineers to monitor time-series metrics with automatic anomaly detection and multi-channel alerting.

pip install detectkit[clickhouse]

Fastest path: let Claude Code set it up for you — run dtk init-claude, then the dtk-setup-project skill configures profiles.yml interactively for your database. See Quickstart → Fastest start.

# Initialize project
dtk init my_monitoring
cd my_monitoring
# (Optional) Set up Claude Code context for working with detectkit
dtk init-claude
# Edit profiles.yml (add database connection)
# Create metric config
cat > metrics/cpu_usage.yml <<EOF
name: cpu_usage
interval: 1min
query: "SELECT timestamp, cpu_percent AS value FROM system_metrics WHERE timestamp >= '{{ dtk_start_time }}' AND timestamp < '{{ dtk_end_time }}' ORDER BY timestamp"
detectors:
  - type: mad
    params:
      threshold: 3.0
      window_size: 1440
alerting:
  enabled: true
  channels:
    - mattermost_ops
EOF
# Run
dtk run --select cpu_usage
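The {{ dtk_start_time }} and {{ dtk_end_time }} placeholders limit each run's query to one load window. The sketch below shows only that idea, using plain string substitution. The helper render_query, the timestamp format and the one-hour window are assumptions for illustration. This is not detectkit's actual templating code.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed rendering format for timestamps

def render_query(template: str, start: datetime, end: datetime) -> str:
    """Hypothetical helper: fill the window placeholders with concrete timestamps."""
    return (
        template.replace("{{ dtk_start_time }}", start.strftime(TS_FORMAT))
        .replace("{{ dtk_end_time }}", end.strftime(TS_FORMAT))
    )

query = (
    "SELECT timestamp, cpu_percent AS value FROM system_metrics "
    "WHERE timestamp >= '{{ dtk_start_time }}' AND timestamp < '{{ dtk_end_time }}' "
    "ORDER BY timestamp"
)

start = datetime(2024, 1, 1, 0, 0)
end = start + timedelta(hours=1)  # illustrative window: one hour of 1min points
rendered = render_query(query, start, end)
print(rendered)
```

The query filters with >= start and < end, so each window is half-open. Adjacent windows therefore never share a boundary timestamp.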

Not sure which detector or threshold to use? After loading some history, let dtk autotune pick the detector type, hyperparameters and seasonality for you — see Auto-tuning.

  • Configuration - Complete configuration reference
  • Detectors - Choosing and configuring detectors
  • Auto-tuning - Let dtk autotune pick the detector, params and seasonality for you
  • Tuning by hand - Interactively tune a detector on its real data and write it back (dtk tune, the manual sibling of autotune)
  • Alerting - Setting up alerts and notifications
  • Reading an alert - For stakeholders who receive alerts: what they mean and what to do
  • Visualizing results - Build dashboards/charts on the _dtk_* tables in any BI tool
  • Examples - Real-world monitoring scenarios
    • Infrastructure monitoring (CPU, memory, disk)
    • Application monitoring (latency, errors, throughput)
    • Business metrics (users, revenue, conversions)
    • Advanced patterns (seasonality, multi-detector)

Multiple detector types for different data patterns:

  • MAD - Robust, general-purpose, supports seasonality
  • Z-Score - Fast, sensitive for normally distributed data
  • IQR - Excellent for skewed distributions
  • Manual Bounds - Simple threshold-based detection
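The robustness gap between Z-Score and MAD shows up most clearly in a window that already contains an outlier. Below is a minimal numpy sketch, assuming the textbook definitions: mean ± k·std for Z-Score, and median ± k·1.4826·MAD for MAD. detectkit's own implementation may scale or compute these differently.

```python
import numpy as np

def zscore_bounds(window: np.ndarray, threshold: float) -> tuple[float, float]:
    # mean ± threshold * std: both statistics are pulled by outliers in the window
    mu, sigma = window.mean(), window.std()
    return mu - threshold * sigma, mu + threshold * sigma

def mad_bounds(window: np.ndarray, threshold: float) -> tuple[float, float]:
    # median ± threshold * scaled MAD: a single extreme point barely moves either statistic
    med = np.median(window)
    mad = 1.4826 * np.median(np.abs(window - med))  # 1.4826 makes MAD comparable to std for normal data
    return med - threshold * mad, med + threshold * mad

# 20 ordinary readings around 50, plus one earlier spike still inside the window
window = np.array([48, 49, 50, 51, 52] * 4 + [500], dtype=float)
z_lo, z_hi = zscore_bounds(window, 3.0)
m_lo, m_hi = mad_bounds(window, 3.0)
print(f"z-score bounds: {z_lo:.1f} .. {z_hi:.1f}")
print(f"MAD bounds:     {m_lo:.1f} .. {m_hi:.1f}")
```

Here a new reading of 300 falls inside the z-score bounds, because the earlier spike inflated the standard deviation. The same reading falls far outside the MAD bounds.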

All windowed detectors (MAD, Z-Score, IQR) also support recency weighting (window_weights + half_life) and robust linear detrending (detrend) for metrics with a gradual trend.

Learn more →

Handle time-based patterns automatically:

seasonality_columns:
  - hour
  - day_of_week
detectors:
  - type: mad
    params:
      seasonality_components:
        - ["hour", "day_of_week"]
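Conceptually, a seasonality component restricts the comparison history to points from the same season. Below is a minimal sketch of that idea. It assumes the history is a list of timestamps and values, and that day_of_week means Python's weekday() (Monday = 0). The helper name is hypothetical, and detectkit's actual grouping logic and column semantics may differ.

```python
from datetime import datetime, timedelta
import numpy as np

def seasonal_history(timestamps, values, target: datetime) -> np.ndarray:
    """Hypothetical helper: keep only history in the same (hour, day_of_week) slot as target."""
    key = (target.hour, target.weekday())  # assumes day_of_week = weekday(), Monday == 0
    return np.array(
        [v for ts, v in zip(timestamps, values) if (ts.hour, ts.weekday()) == key],
        dtype=float,
    )

# Four weeks of hourly history: 100 at 09:00 on weekdays, 10 at all other hours
start = datetime(2024, 1, 1)  # a Monday
timestamps = [start + timedelta(hours=h) for h in range(24 * 28)]
values = [100.0 if ts.hour == 9 and ts.weekday() < 5 else 10.0 for ts in timestamps]

target = datetime(2024, 1, 29, 9)  # the following Monday at 09:00
history = seasonal_history(timestamps, values, target)
print(len(history), np.median(history), np.median(values))
```

Compared with the full history (median 10), a Monday 09:00 reading of 100 looks like a spike. Compared with its own season (median 100), the same reading is normal.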

Learn more →

Send alerts to multiple platforms:

  • Mattermost - Team collaboration
  • Slack - Team notifications
  • Telegram - Mobile alerts
  • Email - Traditional notifications
  • Webhook - Generic HTTP endpoint
alerting:
  channels:
    - mattermost_ops
    - slack_critical
    - email_oncall
  min_detectors: 1 # Quorum: detectors that must agree per point
  consecutive_anomalies: 3 # Require confirmation
  direction: "up" # Only alert on increases
  alert_cooldown: "2h" # Recommended: without it a persisting anomaly re-alerts every run
  notify_on_recovery: true # Send a follow-up when the metric recovers
  no_data_alert: false # Alert when expected data is missing
  # suppress_until: "2026-07-01" # Mute this config until a date
  mentions: # Users/groups to @-mention
    - "@oncall"
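The per-point quorum, the confirmation count and the direction filter can be read as one combined check. Below is a minimal sketch. It assumes each point's detector output can be summarized as counts of upward and downward flags. PointResult and should_alert are hypothetical names. The sketch leaves out cooldown, recovery, no-data and suppression handling, because their exact interplay in detectkit is not described here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PointResult:
    """Hypothetical per-point summary of detector output."""
    up: int    # detectors that flagged the point as unusually high
    down: int  # detectors that flagged the point as unusually low

def should_alert(points: list[PointResult], min_detectors: int = 1,
                 consecutive_anomalies: int = 3, direction: Optional[str] = "up") -> bool:
    """Sketch: alert only if each of the last `consecutive_anomalies` points reaches quorum.

    direction: "up", "down", or None (None = either direction; this sketch's convention).
    """
    if len(points) < consecutive_anomalies:
        return False
    for p in points[-consecutive_anomalies:]:
        if direction == "up":
            votes = p.up
        elif direction == "down":
            votes = p.down
        else:
            votes = p.up + p.down
        if votes < min_detectors:
            return False
    return True

history = [PointResult(0, 0), PointResult(1, 0), PointResult(2, 0), PointResult(1, 0)]
print(should_alert(history))                    # each of the last 3 points has >= 1 "up" flag
print(should_alert(history, min_detectors=2))   # a quorum of 2 fails on the last point
print(should_alert(history, direction="down"))  # no downward flags at all
```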

To define multiple independent alerting blocks, set alerting to a list instead of a mapping. Each block has its own channels and conditions, and keeps its own cooldown/recovery state:

alerting:
  - enabled: true
    channels:
      - mattermost_ops
    consecutive_anomalies: 3
  - enabled: true
    channels:
      - slack_critical
    consecutive_anomalies: 1 # More sensitive for this channel
    direction: "up" # Only upward anomalies
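Independent state means that one block's cooldown never silences another block. Below is a minimal sketch, assuming cooldown is tracked as the time of each block's last sent alert. AlertBlockState is a hypothetical name. The cooldown values (2h and 30min) are illustrative and are not part of the configuration above.

```python
from datetime import datetime, timedelta
from typing import Optional

class AlertBlockState:
    """Hypothetical per-block state: each alerting block tracks its own last alert time."""

    def __init__(self, name: str, cooldown: timedelta):
        self.name = name
        self.cooldown = cooldown
        self.last_alert: Optional[datetime] = None

    def try_alert(self, now: datetime) -> bool:
        # Suppress while still inside this block's cooldown; otherwise send and remember the time.
        if self.last_alert is not None and now - self.last_alert < self.cooldown:
            return False
        self.last_alert = now
        return True

# Illustrative cooldowns: 2h for the ops block, 30min for the critical block
ops = AlertBlockState("mattermost_ops", timedelta(hours=2))
critical = AlertBlockState("slack_critical", timedelta(minutes=30))

t0 = datetime(2024, 1, 1, 12, 0)
results = []
for minutes in (0, 45, 130):  # the same anomaly persists across three runs
    now = t0 + timedelta(minutes=minutes)
    results.append((minutes, ops.try_alert(now), critical.try_alert(now)))
print(results)
```

At minute 45 the ops block is still cooling down, but the critical block has its own shorter cooldown and alerts again.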

Learn more →

  • Batch processing - Handle large datasets efficiently
  • Incremental loading - Only load new data
  • Idempotent operations - Safe to re-run
  • numpy-based detectors - numpy core, no pandas (windowed detectors use a per-point loop — fine incrementally, slower for large backfills)
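Batching, incremental loading and idempotent writes fit together roughly as follows. Below is a minimal in-memory sketch. It assumes points are keyed by timestamp and that an incremental run resumes one interval after the newest stored point. The helper names are hypothetical, and detectkit stores loaded points in the _dtk_datapoints table, not in a dict.

```python
from datetime import datetime, timedelta

def batch_windows(start: datetime, end: datetime, step: timedelta):
    """Split [start, end) into consecutive half-open windows no longer than `step`."""
    cursor = start
    while cursor < end:
        nxt = min(cursor + step, end)
        yield cursor, nxt
        cursor = nxt

def upsert(store: dict, rows) -> None:
    """Idempotent write: rows are keyed by timestamp, so a re-run overwrites, never duplicates."""
    for ts, value in rows:
        store[ts] = value

interval = timedelta(hours=1)
store: dict = {}
start, end = datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 5)
windows = list(batch_windows(start, end, timedelta(hours=2)))  # batches of up to 2 intervals
for lo, hi in windows:
    n = int((hi - lo) / interval)
    rows = [(lo + i * interval, 42.0) for i in range(n)]
    upsert(store, rows)

upsert(store, rows)  # retrying the last batch leaves the store unchanged
next_start = max(store) + interval  # incremental: the next run starts after the newest point
print(len(windows), len(store), next_start)
```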

Works with your existing data warehouse — all three backends are first-class:

  • ClickHouse - native protocol, detectkit[clickhouse]
  • PostgreSQL - 12+, detectkit[postgres]
  • MySQL - 8.0+, detectkit[mysql]

Only the connection and the SQL dialect of your metric queries differ; detectors, alerting and the CLI are identical. See the Databases guide for the per-backend breakdown.

dtk run
┌──────────────────────────────────────────────────┐
│ Pipeline orchestration                           │
│ load → detect → alert                            │
└──────────────────────────────────────────────────┘
├─▶ Data source   SQL query (ClickHouse / PostgreSQL / MySQL), gap-filled to the grid
├─▶ Detectors     MAD · Z-Score · IQR · manual_bounds
└─▶ Channels      Mattermost · Slack · Telegram · Email · Webhook
┌──────────────────────────────────────────────────┐
│ Internal _dtk_* tables                           │
│ _dtk_datapoints  loaded points                   │
│ _dtk_detections  detection results               │
│ _dtk_tasks       run / lock state                │
└──────────────────────────────────────────────────┘
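Gap-filling to the grid means that every expected interval gets a row even when the query returned nothing for it. This keeps the series regular for the detectors. Below is a minimal sketch, assuming missing slots are represented as NaN. How detectkit actually marks missing points in _dtk_datapoints is not described here.

```python
import math
from datetime import datetime, timedelta

def gap_fill(rows, start: datetime, end: datetime, interval: timedelta):
    """Place query rows on a regular [start, end) grid; empty slots become NaN."""
    by_ts = dict(rows)
    grid = []
    ts = start
    while ts < end:
        grid.append((ts, by_ts.get(ts, math.nan)))
        ts += interval
    return grid

t0 = datetime(2024, 1, 1, 0, 0)
minute = timedelta(minutes=1)
rows = [(t0, 41.0), (t0 + minute, 43.5), (t0 + 3 * minute, 40.2)]  # 00:02 and 00:04 missing
filled = gap_fill(rows, t0, t0 + 5 * minute, minute)
for ts, value in filled:
    print(ts.strftime("%H:%M"), value)
```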

Monitor system resources:

# CPU, memory, disk, network
detectors:
  - type: manual_bounds
    params:
      upper_bound: 90.0
  - type: zscore
    params:
      threshold: 3.0

Example →

Track application health:

# Response time, error rate, throughput
detectors:
  - type: iqr
    params:
      threshold: 1.5
      window_size: 1440
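If the IQR detector follows the textbook rule, threshold: 1.5 corresponds to the classic Tukey fences. Below is a minimal numpy sketch under that assumption. It computes quartiles with numpy's default linear interpolation and uses a 10-point window instead of 1440 for readability. detectkit's quantile method and window alignment may differ.

```python
import numpy as np

def iqr_bounds(window: np.ndarray, threshold: float = 1.5) -> tuple[float, float]:
    # Tukey fences: [Q1 - k * IQR, Q3 + k * IQR] with k = threshold
    q1, q3 = np.percentile(window, [25, 75])  # numpy's default linear interpolation
    iqr = q3 - q1
    return q1 - threshold * iqr, q3 + threshold * iqr

# Right-skewed response times in ms: most requests are fast, one is very slow
latency = np.array([100, 110, 120, 130, 140, 150, 160, 170, 180, 400], dtype=float)
lo, hi = iqr_bounds(latency, threshold=1.5)
print(lo, hi, latency[latency > hi])
```

The fences come from quartiles rather than the mean, so the long right tail does not widen them the way it would widen mean ± k·std.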

Example →

Monitor KPIs:

# Users, revenue, conversions
detectors:
  - type: mad
    params:
      threshold: 3.0
      seasonality_components:
        - "day_of_week"

Example →

# Run all metrics (typically in cron)
dtk run --select "*"
# Load data only
dtk run --select cpu_usage --steps load
# Detect without loading new data
dtk run --select cpu_usage --steps detect
# Backfill: load data starting from a specific date
dtk run --select cpu_usage --from "2024-01-01"
# Test alert channels
dtk test-alert cpu_usage
# Clear a stuck lock left by a crashed run (e.g. DB restarted mid-run)
dtk unlock --select cpu_usage
# Prune detector/alert data orphaned by a config change (dry-run by default)
dtk clean --select cpu_usage
dtk clean --select cpu_usage --execute
# Purge data for metrics no longer defined in the project
dtk clean --orphaned-metrics --execute
# Scaffold AI-assistant context (CLAUDE.md, rules, skills) so Claude Code
# can help create metrics, tune detectors and run the pipeline. The content
# ships with detectkit and is idempotent — re-run it after upgrading.
dtk init-claude

Full CLI Reference →

detectkit uses three main configuration files:

Project-level settings:

name: my_monitoring
version: '1.0'
default_profile: prod
tables:
  datapoints: _dtk_datapoints
  detections: _dtk_detections
  tasks: _dtk_tasks
timeouts:
  load: 1800 # 30 minutes
  detect: 3600 # 1 hour
  alert: 300 # 5 minutes

Database connections and alert channels (profiles.yml):

profiles:
  prod:
    type: clickhouse
    host: localhost
    port: 9000
    internal_database: analytics
    data_database: default
alert_channels:
  mattermost_ops:
    type: mattermost
    webhook_url: "https://mattermost.example.com/hooks/xxx"

Individual metric definitions (one file per metric under metrics/, e.g. metrics/cpu_usage.yml):

name: cpu_usage
interval: 1min
query: "..."
detectors:
  - type: mad
    params:
      threshold: 3.0
alerting:
  enabled: true
  channels:
    - mattermost_ops

Full Configuration Guide →

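| Detector      | Best for                       | Robustness | Seasonality | Speed     |
|---------------|--------------------------------|------------|-------------|-----------|
| MAD           | General-purpose, seasonal data | High       | Yes         | Fast      |
| Z-Score       | Normal distributions           | Low        | Yes         | Very fast |
| IQR           | Skewed distributions           | High       | Yes         | Fast      |
| Manual Bounds | Known thresholds               | N/A        | No          | Fastest   |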

Choosing a Detector →

Detectors are fast enough for routine incremental runs, so choose primarily on accuracy. Note that the windowed detectors (MAD, Z-Score, IQR) use a per-point loop, which is fine incrementally but can be slow for large historical backfills over big windows.

MAD is a safe default for most metrics:

detectors:
  - type: mad
    params:
      threshold: 3.0
      window_size: 100
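The per-point loop mentioned above looks roughly like this for type: mad with window_size: 100. Below is a minimal sketch, assuming each point is compared with the median and 1.4826-scaled MAD of the preceding window_size points. detectkit's exact window alignment, scaling constant and edge-case handling are not specified in this guide.

```python
import numpy as np

def rolling_mad_flags(values: np.ndarray, window_size: int, threshold: float) -> np.ndarray:
    """Flag each point against the MAD bounds of the `window_size` points before it.

    A per-point loop: cheap for a handful of new points per run, but roughly
    O(n * window_size) work over a long backfill.
    """
    flags = np.zeros(len(values), dtype=bool)  # the first window_size points are warm-up
    for i in range(window_size, len(values)):
        window = values[i - window_size:i]  # history only; the current point is excluded
        med = np.median(window)
        mad = 1.4826 * np.median(np.abs(window - med))
        # With a perfectly flat window (mad == 0) any deviation is flagged;
        # this sketch does not guard against that.
        flags[i] = abs(values[i] - med) > threshold * mad
    return flags

rng = np.random.default_rng(0)
values = rng.normal(50.0, 2.0, 200)
values[150] = 80.0  # injected spike
flags = rolling_mad_flags(values, window_size=100, threshold=3.0)
print(np.flatnonzero(flags))
```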

2. Add Seasonality for Time-Based Patterns


If your metric varies by hour/day/week:

seasonality_columns:
  - hour
detectors:
  - type: mad
    params:
      seasonality_components:
        - "hour"

If your metric has a gradual trend (slow growth or decline), use recency weighting and/or detrending so the drift itself is not flagged:

detectors:
  - type: mad
    params:
      window_weights: exponential
      half_life: "3d" # weight halves every 3 days of age
      detrend: linear # optional: remove in-window linear trend
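Recency weighting and detrending can be pictured as two small transformations of each window. Below is a minimal numpy sketch with three parts:

- The exponential weights follow the half-life rule from the comment above.
- A weighted median is one plausible way such weights could feed a MAD-style center. This is an assumption.
- Theil-Sen stands in as one example of a robust linear fit, because this guide does not say which robust method detectkit uses.

All function names are hypothetical.

```python
import numpy as np

def exponential_weights(age: np.ndarray, half_life: float) -> np.ndarray:
    # Weight halves every `half_life` units of age: w = 0.5 ** (age / half_life)
    return 0.5 ** (age / half_life)

def weighted_median(values: np.ndarray, weights: np.ndarray) -> float:
    # Smallest value whose cumulative weight reaches half of the total weight
    order = np.argsort(values)
    v, cum = values[order], np.cumsum(weights[order])
    return float(v[np.searchsorted(cum, 0.5 * cum[-1])])

def theil_sen_detrend(t: np.ndarray, values: np.ndarray) -> np.ndarray:
    # Robust line fit: slope = median of all pairwise slopes (timestamps must be distinct)
    i, j = np.triu_indices(len(t), k=1)
    slope = np.median((values[j] - values[i]) / (t[j] - t[i]))
    intercept = np.median(values - slope * t)
    return values - (slope * t + intercept)  # residuals around the trend line

t = np.arange(10, dtype=float)  # days
values = 100.0 + 5.0 * t        # steady growth of 5 per day
values[4] = 200.0               # one genuine spike

w = exponential_weights(t.max() - t, half_life=3.0)  # age in days, "3d" half-life
residuals = theil_sen_detrend(t, values)
print(np.round(w, 3))
print(residuals)
print(weighted_median(values, w), np.median(values))
```

The weighted median is pulled toward recent values compared with the plain median. After detrending, only the spike leaves a large residual, so steady growth by itself would not be flagged.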

Reduce false positives:

alerting:
  consecutive_anomalies: 3 # Wait for confirmation

Only alert on meaningful changes:

alerting:
  direction: "up" # Only alert on increases (e.g., errors, latency)
  # direction: "down" # Or only on decreases (e.g., users, revenue)
# Test query
dtk run --select my_metric --steps load
# Test detection
dtk run --select my_metric --steps detect
# Test alert
dtk test-alert my_metric

More Best Practices →

If alerts are not being sent, check:

  1. alerting.enabled: true
  2. Recent anomalies detected (query _dtk_detections)
  3. The consecutive_anomalies threshold is met
  4. Webhook URLs are correct

Then send a test alert:
dtk test-alert my_metric

If alerts fire too often (false positives), try:

  1. Increase the threshold parameter
  2. Increase consecutive_anomalies
  3. Add seasonality_components (if the metric is seasonal)
  4. Use the direction filter

Full Troubleshooting →

MIT License - see LICENSE file for details.

Contributions welcome! Please:

  1. Open an issue to discuss changes
  2. Fork the repository and create a pull request
  3. Ensure tests pass
  4. Follow existing code style

See CHANGELOG.md for complete version history.


Get Started →