Overview

detectkit - Metric monitoring with automatic anomaly detection.

A Python library and CLI tool for data analysts and engineers to monitor time-series metrics with automatic anomaly detection and multi-channel alerting.

Quick Links

Installation - Install detectkit
Quickstart - Create your first metric in 5 minutes
Examples - Common monitoring scenarios
CLI Reference - Complete CLI documentation

Getting Started

Installation

pip install detectkit[clickhouse]

First Metric

Fastest path: let Claude Code set it up for you — run dtk init-claude, then the dtk-setup-project skill configures profiles.yml interactively for your database. See Quickstart → Fastest start.

# Initialize project
dtk init my_monitoring
cd my_monitoring

# (Optional) Set up Claude Code context for working with detectkit
dtk init-claude

# Edit profiles.yml (add database connection)

# Create metric config
cat > metrics/cpu_usage.yml <<EOF
name: cpu_usage
interval: 1min
query: "SELECT timestamp, cpu_percent AS value FROM system_metrics WHERE timestamp >= '{{ dtk_start_time }}' AND timestamp < '{{ dtk_end_time }}' ORDER BY timestamp"

detectors:
  - type: mad
    params:
      threshold: 3.0
      window_size: 1440

alerting:
  enabled: true
  channels:
    - mattermost_ops
EOF

# Run
dtk run --select cpu_usage
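
The `{{ dtk_start_time }}` and `{{ dtk_end_time }}` placeholders in the query bound each load to a half-open window (`>=` start, `<` end), so two adjacent windows never select the same boundary row. The sketch below only illustrates that substitution. The `render_query` helper, the plain regex replacement and the timestamp format are assumptions made for this example, not detectkit's actual templating code.

```python
import re
from datetime import datetime, timedelta

QUERY = (
    "SELECT timestamp, cpu_percent AS value FROM system_metrics "
    "WHERE timestamp >= '{{ dtk_start_time }}' "
    "AND timestamp < '{{ dtk_end_time }}' ORDER BY timestamp"
)

def render_query(template: str, start: datetime, end: datetime) -> str:
    """Hypothetical helper: fill both placeholders with formatted timestamps."""
    values = {
        "dtk_start_time": start.strftime("%Y-%m-%d %H:%M:%S"),
        "dtk_end_time": end.strftime("%Y-%m-%d %H:%M:%S"),
    }
    # Replace every {{ name }} occurrence, tolerating optional inner spaces.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)

start = datetime(2024, 1, 1, 0, 0)
end = start + timedelta(minutes=5)
print(render_query(QUERY, start, end))
```

Because the upper bound is exclusive, the next window can start exactly at the previous `end` without overlapping it.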

Not sure which detector or threshold to use? After loading some history, let dtk autotune pick the detector type, hyperparameters and seasonality for you — see Auto-tuning.

Documentation Structure

Getting Started

Installation - Install detectkit and dependencies
Quickstart - Create your first metric

Guides

Configuration - Complete configuration reference
Detectors - Choosing and configuring detectors
Auto-tuning - Let dtk autotune pick the detector, params and seasonality for you
Tuning by hand - Interactively tune a detector on its real data and write it back (dtk tune, the manual sibling of autotune)
Alerting - Setting up alerts and notifications
Reading an alert - For stakeholders who receive alerts: what they mean and what to do
Visualizing results - Build dashboards/charts on the _dtk_* tables in any BI tool

Reference

CLI Reference - Command-line interface documentation
Auto-tune Reference - dtk autotune flags, labels schema, autotune: block, scoring metrics
Internal Tables - Schema of the _dtk_* tables (columns, primary keys, engines)
Detectors - Detector-specific documentation
- Shared Parameters — preprocessing, weighting, detrending, identity (MAD/Z-Score/IQR)
- MAD Detector
- Z-Score Detector
- IQR Detector
- Manual Bounds Detector

Examples

Examples - Real-world monitoring scenarios
- Infrastructure monitoring (CPU, memory, disk)
- Application monitoring (latency, errors, throughput)
- Business metrics (users, revenue, conversions)
- Advanced patterns (seasonality, multi-detector)

Key Features

Statistical Detectors

Multiple detector types for different data patterns:

MAD - Robust, general-purpose, supports seasonality
Z-Score - Fast, sensitive on normal distributions
IQR - Excellent for skewed distributions
Manual Bounds - Simple threshold-based detection
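
To make the difference between the three statistical detectors concrete, here is a minimal sketch of the textbook scores they are named after, using only the Python standard library. The function names, the 1.4826 scaling constant and the zero-spread fallbacks are assumptions of this example. detectkit's own implementation may differ in these details.

```python
import statistics

def mad_score(window, value):
    """Distance from the window median in units of scaled MAD."""
    med = statistics.median(window)
    mad = statistics.median(abs(x - med) for x in window)
    if mad == 0:
        return 0.0  # degenerate window; a real detector needs a fallback here
    return abs(value - med) / (1.4826 * mad)

def z_score(window, value):
    """Distance from the window mean in units of standard deviation."""
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    if std == 0:
        return 0.0
    return abs(value - mean) / std

def iqr_outlier(window, value, threshold=1.5):
    """True if value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(window, n=4)
    iqr = q3 - q1
    return value < q1 - threshold * iqr or value > q3 + threshold * iqr

# A window that already contains one past spike (50).
window = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(mad_score(window, 30) > 3.0)   # median and MAD ignore the old spike
print(z_score(window, 30) > 3.0)     # mean and std are inflated by it
print(iqr_outlier(window, 30))
```

The old spike inflates the mean and standard deviation enough that the z-score misses the new value, while the median-based and quartile-based checks still flag it. That is what "robust" means in the list above.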

All windowed detectors (MAD, Z-Score, IQR) also support recency weighting (window_weights + half_life) and robust linear detrending (detrend) for metrics with a gradual trend.

Learn more →

Seasonality Support

Handle time-based patterns automatically:

seasonality_columns:
  - hour
  - day_of_week

detectors:
  - type: mad
    params:
      seasonality_components:
        - ["hour", "day_of_week"]
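
One way to read a grouped component such as `["hour", "day_of_week"]` is that a point is compared only with history sharing the same hour and weekday. The sketch below illustrates that idea with a per-group median baseline. The helper names and the joint-key interpretation are assumptions of this example, not a description of detectkit's internals.

```python
import statistics
from collections import defaultdict
from datetime import datetime, timedelta

def season_key(ts):
    """Joint (hour, day_of_week) key; Monday is 0."""
    return (ts.hour, ts.weekday())

def seasonal_baselines(history):
    """Median of past values per seasonal key. history: list of (timestamp, value)."""
    groups = defaultdict(list)
    for ts, value in history:
        groups[season_key(ts)].append(value)
    return {key: statistics.median(vals) for key, vals in groups.items()}

# Four weeks of hourly data: busy every Monday at 09:00, quiet otherwise.
start = datetime(2024, 1, 1)  # a Monday
history = []
for h in range(24 * 28):
    ts = start + timedelta(hours=h)
    busy = ts.weekday() == 0 and ts.hour == 9
    history.append((ts, 100.0 if busy else 10.0))

baselines = seasonal_baselines(history)
print(baselines[season_key(datetime(2024, 1, 29, 9))])  # Monday 09:00 baseline
print(baselines[season_key(datetime(2024, 1, 28, 9))])  # Sunday 09:00 baseline
```

With a single global baseline the Monday peak would look anomalous every week. With per-group baselines it is the expected value for that slot.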

Learn more →

Multi-Channel Alerting

Send alerts to multiple platforms:

Mattermost - Team collaboration
Slack - Team notifications
Telegram - Mobile alerts
Email - Traditional notifications
Webhook - Generic HTTP endpoint

alerting:
  channels:
    - mattermost_ops
    - slack_critical
    - email_oncall
  min_detectors: 1            # Quorum: detectors that must agree per point
  consecutive_anomalies: 3    # Require confirmation
  direction: "up"             # Only alert on increases
  alert_cooldown: "2h"        # Recommended: without it a persisting anomaly re-alerts every run
  notify_on_recovery: true    # Send a follow-up when the metric recovers
  no_data_alert: false        # Alert when expected data is missing
  # suppress_until: "2026-07-01"  # Mute this config until a date
  mentions:                   # Users/groups to @-mention
    - "@oncall"
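
The interaction of `min_detectors`, `consecutive_anomalies` and `direction` can be sketched as a small decision function. This is an illustration of the semantics described in the comments above, under the assumption that the direction filter is applied before the quorum is counted. The `should_alert` function and its input shape are hypothetical.

```python
def should_alert(points, min_detectors=1, consecutive_anomalies=3, direction=None):
    """Decide whether the newest points justify an alert.

    points: oldest-to-newest list; each item lists the direction ("up" or
    "down") reported by every detector that flagged that timestamp.
    direction=None means no direction filter (an assumption of this sketch).
    """
    if len(points) < consecutive_anomalies:
        return False
    for votes in points[-consecutive_anomalies:]:
        if direction is not None:
            votes = [v for v in votes if v == direction]
        if len(votes) < min_detectors:
            return False  # quorum not met at this point, so the streak is broken
    return True

history = [[], ["up"], ["up", "up"], ["up"]]
print(should_alert(history, direction="up"))   # True
print(should_alert(history, min_detectors=2))  # False
```

Raising either `min_detectors` or `consecutive_anomalies` makes alerts rarer: the first demands agreement between detectors at each point, the second demands persistence over time.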

Define multiple independent blocks by giving alerting: a list — each block has its own channels and conditions, and keeps independent cooldown/recovery state:

alerting:
  - enabled: true
    channels:
      - mattermost_ops
    consecutive_anomalies: 3

  - enabled: true
    channels:
      - slack_critical
    consecutive_anomalies: 1   # More sensitive for this channel
    direction: "up"            # Only upward anomalies

Learn more →

Efficient Processing

Batch processing - Handle large datasets efficiently
Incremental loading - Only load new data
Idempotent operations - Safe to re-run
numpy-based detectors - numpy core, no pandas (windowed detectors use a per-point loop — fine incrementally, slower for large backfills)

Database Support

Works with your existing data warehouse — all three backends are first-class:

ClickHouse - native protocol, detectkit[clickhouse]
PostgreSQL - 12+, detectkit[postgres]
MySQL - 8.0+, detectkit[mysql]

Only the connection and the SQL dialect of your metric queries differ; detectors, alerting and the CLI are identical. See the Databases guide for the per-backend breakdown.

Architecture

   dtk run
      │
      ▼
   ┌──────────────────────────────────────────────────┐
   │  Pipeline orchestration                          │
   │  load  →  detect  →  alert                       │
   └──────────────────────────────────────────────────┘
      │
      ├─▶  Data source   SQL query (ClickHouse · PostgreSQL · MySQL), gap-filled to the grid
      ├─▶  Detectors     MAD · Z-Score · IQR · manual_bounds
      └─▶  Channels      Mattermost · Slack · Telegram · Email · Webhook
      │
      ▼
   ┌──────────────────────────────────────────────────┐
   │  Internal _dtk_* tables                          │
   │    _dtk_datapoints   loaded points               │
   │    _dtk_detections   detection results           │
   │    _dtk_tasks        run / lock state            │
   └──────────────────────────────────────────────────┘
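
"Gap-filled to the grid" in the diagram refers to aligning loaded points to the metric's regular interval so that missing timestamps become explicit. A minimal sketch of that step, assuming missing slots are represented as `None`. The `fill_to_grid` helper and the fill value are illustrative, not detectkit's actual behavior.

```python
from datetime import datetime, timedelta

def fill_to_grid(points, start, end, step):
    """Return one (timestamp, value) per grid slot in [start, end).

    points: iterable of (timestamp, value) already aligned to the step.
    Slots with no loaded point get None.
    """
    by_ts = dict(points)
    grid = []
    ts = start
    while ts < end:
        grid.append((ts, by_ts.get(ts)))
        ts += step
    return grid

t0 = datetime(2024, 1, 1, 0, 0)
minute = timedelta(minutes=1)
loaded = [(t0, 41.0), (t0 + 2 * minute, 43.5)]  # 00:01 is missing

for ts, value in fill_to_grid(loaded, t0, t0 + 3 * minute, minute):
    print(ts.strftime("%H:%M"), value)
```

A regular grid keeps `window_size` meaningful as a count of intervals rather than a count of rows that happened to arrive.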

Use Cases

Infrastructure Monitoring

Monitor system resources:

# CPU, memory, disk, network
detectors:
  - type: manual_bounds
    params:
      upper_bound: 90.0
  - type: zscore
    params:
      threshold: 3.0

Example →

Application Monitoring

Track application health:

# Response time, error rate, throughput
detectors:
  - type: iqr
    params:
      threshold: 1.5
      window_size: 1440

Example →

Business Metrics

Monitor KPIs:

# Users, revenue, conversions
detectors:
  - type: mad
    params:
      threshold: 3.0
      seasonality_components:
        - "day_of_week"

Example →

Common Workflows

Daily Monitoring

# Run all metrics (typically in cron)
dtk run --select "*"

Partial Pipeline

# Load data only
dtk run --select cpu_usage --steps load

# Detect without loading new data
dtk run --select cpu_usage --steps detect

Historical Backfill

# Backfill history starting from a given date
dtk run --select cpu_usage --from "2024-01-01"

Testing

# Test alert channels
dtk test-alert cpu_usage

Recovery

# Clear a stuck lock left by a crashed run (e.g. DB restarted mid-run)
dtk unlock --select cpu_usage

Cleanup After Editing Configs

# Prune detector/alert data orphaned by a config change (dry-run by default)
dtk clean --select cpu_usage
dtk clean --select cpu_usage --execute

# Purge data for metrics no longer defined in the project
dtk clean --orphaned-metrics --execute

Claude Code Context

# Scaffold AI-assistant context (CLAUDE.md, rules, skills) so Claude Code
# can help create metrics, tune detectors and run the pipeline. The content
# ships with detectkit and is idempotent — re-run it after upgrading.
dtk init-claude

Full CLI Reference →

Configuration Files

detectkit uses three main configuration files:

1. `detectkit_project.yml`

Project-level settings:

name: my_monitoring
version: '1.0'

default_profile: prod

tables:
  datapoints: _dtk_datapoints
  detections: _dtk_detections
  tasks: _dtk_tasks

timeouts:
  load: 1800      # 30 minutes
  detect: 3600    # 1 hour
  alert: 300      # 5 minutes

2. `profiles.yml`

Database connections and alert channels:

profiles:
  prod:
    type: clickhouse
    host: localhost
    port: 9000
    internal_database: analytics
    data_database: default

alert_channels:
  mattermost_ops:
    type: mattermost
    webhook_url: "https://mattermost.example.com/hooks/xxx"

3. `metrics/*.yml`

Individual metric definitions:

name: cpu_usage
interval: 1min
query: "..."

detectors:
  - type: mad
    params:
      threshold: 3.0

alerting:
  enabled: true
  channels:
    - mattermost_ops

Full Configuration Guide →

Detector Comparison

Detector       Best For                        Robustness  Seasonality  Speed
MAD            General-purpose, seasonal data  High        Yes          Fast
Z-Score        Normal distributions            Low         Yes          Very Fast
IQR            Skewed distributions            High        Yes          Fast
Manual Bounds  Known thresholds                N/A         No           Fastest

Choosing a Detector →

Performance

Detectors are fast enough for routine incremental runs, so choose primarily on accuracy. Note that the windowed detectors (MAD, Z-Score, IQR) use a per-point loop, which is fine incrementally but can be slow for large historical backfills over big windows.
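
A rough count shows why backfills are the expensive case. The sketch assumes that each point rescans its full window, which is the simplest reading of "per-point loop". The `backfill_cost` helper is hypothetical and counts window elements touched, not measured runtime.

```python
def backfill_cost(days, interval_minutes, window_size):
    """Return (points to score, window elements touched) for a backfill."""
    points = days * 24 * 60 // interval_minutes
    return points, points * window_size

# 30 days of 1-minute data with a one-day window (window_size: 1440).
points, touched = backfill_cost(days=30, interval_minutes=1, window_size=1440)
print(points, touched)   # 43200 62208000

# An incremental run that scores 5 new points with the same window.
print(5 * 1440)          # 7200
```

Under this assumption the backfill touches about 62 million window elements, against a few thousand for a routine incremental run.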

Best Practices

1. Start with MAD

MAD is a safe default for most metrics:

detectors:
  - type: mad
    params:
      threshold: 3.0
      window_size: 100

2. Add Seasonality for Time-Based Patterns

If your metric varies by hour/day/week:

seasonality_columns:
  - hour

detectors:
  - type: mad
    params:
      seasonality_components:
        - "hour"

3. Handle Trending Metrics

If your metric has a gradual trend (slow growth or decline), use recency weighting and/or detrending so the drift itself is not flagged:

detectors:
  - type: mad
    params:
      window_weights: exponential
      half_life: "3d"     # weight halves every 3 days of age
      detrend: linear     # optional: remove in-window linear trend
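
The two options can be sketched in a few lines. The weighting follows the comment above (a point's weight halves for every `half_life` of age). For the detrending step this example uses a Theil-Sen line (median of pairwise slopes) as one common robust estimator. That choice and both helper names are assumptions, since the estimator detectkit uses is not stated here.

```python
import statistics
from itertools import combinations

def recency_weights(ages_days, half_life_days=3.0):
    """Exponential weights: a point half_life_days old counts half as much as a fresh one."""
    return [0.5 ** (age / half_life_days) for age in ages_days]

def theil_sen_detrend(values):
    """Subtract a robust line (median pairwise slope) fitted over the window."""
    n = len(values)
    slope = statistics.median(
        (values[j] - values[i]) / (j - i) for i, j in combinations(range(n), 2)
    )
    intercept = statistics.median(v - slope * i for i, v in enumerate(values))
    return [v - (intercept + slope * i) for i, v in enumerate(values)]

print(recency_weights([0, 3, 6]))                # [1.0, 0.5, 0.25]
print(theil_sen_detrend([10, 12, 14, 100, 18]))  # steady growth removed, spike kept
```

After detrending, the steady growth of 2 per step leaves residuals of zero, while the spike remains as a large residual for the detector to score.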

4. Use Consecutive Anomalies

Reduce false positives:

alerting:
  consecutive_anomalies: 3  # Wait for confirmation

5. Filter by Direction

Only alert on meaningful changes:

alerting:
  direction: "up"    # Only alert on increases (e.g., errors, latency)
  # direction: "down"  # Or only on decreases (e.g., users, revenue)

6. Test Before Production

# Test query
dtk run --select my_metric --steps load

# Test detection
dtk run --select my_metric --steps detect

# Test alert
dtk test-alert my_metric

More Best Practices →

Troubleshooting

No Alerts Received

Check:

alerting.enabled is set to true
Anomalies were detected recently (query _dtk_detections)
The consecutive_anomalies threshold was met
Webhook URLs are correct

dtk test-alert my_metric

Too Many False Positives

Solutions:

Increase threshold parameter
Increase consecutive_anomalies
Add seasonality_components (if metric is seasonal)
Use direction filter

Full Troubleshooting →

Getting Help

Documentation: You’re reading it!
Issues: https://github.com/alexeiveselov92/detectkit/issues
PyPI: https://pypi.org/project/detectkit/

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! Please:

Open an issue to discuss changes
Fork the repository and create a pull request
Ensure tests pass
Follow existing code style

Changelog

See CHANGELOG.md for complete version history.

Get Started →
