Overview
detectkit - Metric monitoring with automatic anomaly detection.
A Python library and CLI tool for data analysts and engineers to monitor time-series metrics with automatic anomaly detection and multi-channel alerting.
Quick Links
- Installation - Install detectkit
- Quickstart - Create your first metric in 5 minutes
- Examples - Common monitoring scenarios
- CLI Reference - Complete CLI documentation
Getting Started
Installation

    pip install detectkit[clickhouse]

First Metric
Fastest path: let Claude Code set it up for you. Run `dtk init-claude`, then the `dtk-setup-project` skill configures `profiles.yml` interactively for your database. See Quickstart → Fastest start.

    # Initialize project
    dtk init my_monitoring
    cd my_monitoring

    # (Optional) Set up Claude Code context for working with detectkit
    dtk init-claude

    # Edit profiles.yml (add database connection)

    # Create metric config
    cat > metrics/cpu_usage.yml <<EOF
    name: cpu_usage
    interval: 1min
    query: "SELECT timestamp, cpu_percent AS value FROM system_metrics WHERE timestamp >= '{{ dtk_start_time }}' AND timestamp < '{{ dtk_end_time }}' ORDER BY timestamp"

    detectors:
      - type: mad
        params:
          threshold: 3.0
          window_size: 1440

    alerting:
      enabled: true
      channels:
        - mattermost_ops
    EOF

    # Run
    dtk run --select cpu_usage

Not sure which detector or threshold to use? After loading some history, let `dtk autotune` pick the detector type, hyperparameters and seasonality for you; see Auto-tuning.
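To make the `{{ dtk_start_time }}` and `{{ dtk_end_time }}` placeholders in the query above concrete, here is a minimal sketch of how such a template might be filled for one run window. It assumes plain placeholder substitution; the helper `render_query` and the timestamp format are hypothetical and are not detectkit's actual rendering code.

```python
import re
from datetime import datetime, timedelta

QUERY = (
    "SELECT timestamp, cpu_percent AS value FROM system_metrics "
    "WHERE timestamp >= '{{ dtk_start_time }}' "
    "AND timestamp < '{{ dtk_end_time }}' ORDER BY timestamp"
)

def render_query(template: str, start: datetime, end: datetime) -> str:
    """Hypothetical helper: substitute the two window placeholders."""
    values = {
        # Assumed timestamp format; the real one may differ per backend.
        "dtk_start_time": start.strftime("%Y-%m-%d %H:%M:%S"),
        "dtk_end_time": end.strftime("%Y-%m-%d %H:%M:%S"),
    }
    # Replace each {{ name }} token, tolerating optional inner spaces.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)

start = datetime(2024, 1, 1, 0, 0)
rendered = render_query(QUERY, start, start + timedelta(hours=1))
print(rendered)
```

The query filters on a half-open range (`>=` start, `<` end), so two adjacent run windows never load the same row twice.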
Documentation Structure
Getting Started
- Installation - Install detectkit and dependencies
- Quickstart - Create your first metric
Guides
- Configuration - Complete configuration reference
- Detectors - Choosing and configuring detectors
- Auto-tuning - Let `dtk autotune` pick the detector, params and seasonality for you
- Tuning by hand - Interactively tune a detector on its real data and write it back (`dtk tune`, the manual sibling of autotune)
- Alerting - Setting up alerts and notifications
- Reading an alert - For stakeholders who receive alerts: what they mean and what to do
- Visualizing results - Build dashboards/charts on the `_dtk_*` tables in any BI tool
Reference
- CLI Reference - Command-line interface documentation
- Auto-tune Reference - `dtk autotune` flags, labels schema, `autotune:` block, scoring metrics
- Internal Tables - Schema of the `_dtk_*` tables (columns, primary keys, engines)
- Detectors - Detector-specific documentation
  - Shared Parameters - preprocessing, weighting, detrending, identity (MAD/Z-Score/IQR)
  - MAD Detector
  - Z-Score Detector
  - IQR Detector
  - Manual Bounds Detector
Examples
- Examples - Real-world monitoring scenarios
  - Infrastructure monitoring (CPU, memory, disk)
  - Application monitoring (latency, errors, throughput)
  - Business metrics (users, revenue, conversions)
  - Advanced patterns (seasonality, multi-detector)
Key Features
Statistical Detectors
Multiple detector types for different data patterns:
- MAD - Robust, general-purpose, supports seasonality
- Z-Score - Fast, sensitive on normal distributions
- IQR - Excellent for skewed distributions
- Manual Bounds - Simple threshold-based detection
All windowed detectors (MAD, Z-Score, IQR) also support recency weighting
(window_weights + half_life) and robust linear detrending (detrend)
for metrics with a gradual trend.
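As an illustration of the recency weighting idea, the sketch below assumes each point in the window is weighted by `0.5 ** (age / half_life)` and that the window's center is a weighted median. Both function names are hypothetical and the code is plain Python for readability; detectkit's internal implementation may differ.

```python
def half_life_weights(ages, half_life):
    """Weight 1.0 at age 0, halving every `half_life` units of age."""
    return [0.5 ** (age / half_life) for age in ages]

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]

# Six points that stepped from 10 to 20; the last one is the newest (age 0, in days).
values = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
ages = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]

weights = half_life_weights(ages, half_life=3.0)  # mirrors half_life: "3d"
center = weighted_median(values, weights)
print(center)
```

With these inputs the weighted center lands on the recent level (20.0) instead of halfway between the old and new levels, which is why weighting helps when a metric drifts.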
Seasonality Support
Handle time-based patterns automatically:

    seasonality_columns:
      - hour
      - day_of_week

    detectors:
      - type: mad
        params:
          seasonality_components:
            - ["hour", "day_of_week"]

Multi-Channel Alerting
Send alerts to multiple platforms:
- Mattermost - Team collaboration
- Slack - Team notifications
- Telegram - Mobile alerts
- Email - Traditional notifications
- Webhook - Generic HTTP endpoint

    alerting:
      channels:
        - mattermost_ops
        - slack_critical
        - email_oncall
      min_detectors: 1                # Quorum: detectors that must agree per point
      consecutive_anomalies: 3        # Require confirmation
      direction: "up"                 # Only alert on increases
      alert_cooldown: "2h"            # Recommended: without it a persisting anomaly re-alerts every run
      notify_on_recovery: true        # Send a follow-up when the metric recovers
      no_data_alert: false            # Alert when expected data is missing
      # suppress_until: "2026-07-01"  # Mute this config until a date
      mentions:                       # Users/groups to @-mention
        - "@oncall"

Define multiple independent blocks by giving `alerting:` a list. Each block has its own channels and conditions, and keeps independent cooldown/recovery state:

    alerting:
      - enabled: true
        channels:
          - mattermost_ops
        consecutive_anomalies: 3

      - enabled: true
        channels:
          - slack_critical
        consecutive_anomalies: 1  # More sensitive for this channel
        direction: "up"           # Only upward anomalies

Efficient Processing
- Batch processing - Handle large datasets efficiently
- Incremental loading - Only load new data
- Idempotent operations - Safe to re-run
- numpy-based detectors - numpy core, no pandas (windowed detectors use a per-point loop — fine incrementally, slower for large backfills)
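Incremental loading and safe re-runs can be pictured as computing, on every run, the half-open window of complete intervals that has not been stored yet. The sketch below is an illustration only, not detectkit's loader: `next_load_window`, the epoch-anchored grid and the rule of skipping the still-open interval are all assumptions.

```python
from datetime import datetime, timedelta

def next_load_window(last_loaded, now, interval):
    """Return the half-open [start, end) range still to load, or None if up to date.

    last_loaded: timestamp of the newest stored point (assumed to sit on the grid).
    """
    start = last_loaded + interval
    # Align `now` down to the interval grid (assumed to be anchored at the Unix epoch)
    # so the interval that is still filling up is not loaded yet.
    epoch = datetime(1970, 1, 1)
    end = now - ((now - epoch) % interval)
    return (start, end) if start < end else None

interval = timedelta(minutes=1)
now = datetime(2024, 1, 1, 0, 9, 30)

window = next_load_window(datetime(2024, 1, 1, 0, 5), now, interval)
print(window)

# After that window is stored, the newest point is 00:08; running again is a no-op.
rerun = next_load_window(datetime(2024, 1, 1, 0, 8), now, interval)
print(rerun)
```

Because the window is derived from what is already stored, running the same step twice without new data loads nothing the second time.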
Database Support
Works with your existing data warehouse; all three backends are first-class:
- ClickHouse - native protocol, `detectkit[clickhouse]`
- PostgreSQL - 12+, `detectkit[postgres]`
- MySQL - 8.0+, `detectkit[mysql]`
Only the connection and the SQL dialect of your metric queries differ; detectors, alerting and the CLI are identical. See the Databases guide for the per-backend breakdown.
Architecture

      dtk run
         │
         ▼
    ┌──────────────────────────────────────────────────┐
    │              Pipeline orchestration              │
    │              load → detect → alert               │
    └──────────────────────────────────────────────────┘
         │
         ├─▶ Data source   ClickHouse query, gap-filled to the grid
         ├─▶ Detectors     MAD · Z-Score · IQR · manual_bounds
         └─▶ Channels      Mattermost · Slack · Telegram · Email · Webhook
         │
         ▼
    ┌──────────────────────────────────────────────────┐
    │              Internal _dtk_* tables              │
    │   _dtk_datapoints    loaded points               │
    │   _dtk_detections    detection results           │
    │   _dtk_tasks         run / lock state            │
    └──────────────────────────────────────────────────┘

Use Cases
Infrastructure Monitoring
Monitor system resources:

    # CPU, memory, disk, network
    detectors:
      - type: manual_bounds
        params:
          upper_bound: 90.0
      - type: zscore
        params:
          threshold: 3.0

Application Monitoring
Track application health:

    # Response time, error rate, throughput
    detectors:
      - type: iqr
        params:
          threshold: 1.5
          window_size: 1440

Business Metrics
Monitor KPIs:

    # Users, revenue, conversions
    detectors:
      - type: mad
        params:
          threshold: 3.0
          seasonality_components:
            - "day_of_week"

Common Workflows
Daily Monitoring

    # Run all metrics (typically in cron)
    dtk run --select "*"

Partial Pipeline

    # Load data only
    dtk run --select cpu_usage --steps load

    # Detect without loading new data
    dtk run --select cpu_usage --steps detect

Historical Backfill

    # Load history starting from a given date (for example, 30 days back)
    dtk run --select cpu_usage --from "2024-01-01"

Testing

    # Test alert channels
    dtk test-alert cpu_usage

Recovery

    # Clear a stuck lock left by a crashed run (e.g. DB restarted mid-run)
    dtk unlock --select cpu_usage

Cleanup After Editing Configs

    # Prune detector/alert data orphaned by a config change (dry-run by default)
    dtk clean --select cpu_usage
    dtk clean --select cpu_usage --execute

    # Purge data for metrics no longer defined in the project
    dtk clean --orphaned-metrics --execute

Claude Code Context

    # Scaffold AI-assistant context (CLAUDE.md, rules, skills) so Claude Code
    # can help create metrics, tune detectors and run the pipeline. The content
    # ships with detectkit and is idempotent, so re-run it after upgrading.
    dtk init-claude

Configuration Files
detectkit uses three main configuration files:
1. detectkit_project.yml
Project-level settings:

    name: my_monitoring
    version: '1.0'

    default_profile: prod

    tables:
      datapoints: _dtk_datapoints
      detections: _dtk_detections
      tasks: _dtk_tasks

    timeouts:
      load: 1800     # 30 minutes
      detect: 3600   # 1 hour
      alert: 300     # 5 minutes

2. profiles.yml
Database connections and alert channels:

    profiles:
      prod:
        type: clickhouse
        host: localhost
        port: 9000
        internal_database: analytics
        data_database: default

    alert_channels:
      mattermost_ops:
        type: mattermost
        webhook_url: "https://mattermost.example.com/hooks/xxx"

3. metrics/*.yml
Individual metric definitions:

    name: cpu_usage
    interval: 1min
    query: "..."

    detectors:
      - type: mad
        params:
          threshold: 3.0

    alerting:
      enabled: true
      channels:
        - mattermost_ops

Detector Comparison
| Detector | Best For | Robustness | Seasonality | Speed |
|---|---|---|---|---|
| MAD | General-purpose, seasonal data | High | Yes | Fast |
| Z-Score | Normal distributions | Low | Yes | Very Fast |
| IQR | Skewed distributions | High | Yes | Fast |
| Manual Bounds | Known thresholds | N/A | No | Fastest |
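To make the Robustness column concrete, the sketch below applies textbook forms of the three windowed rules to one window that still contains an old spike. The 1.4826 MAD scale factor, the population standard deviation and the default quartile method of `statistics.quantiles` are assumptions chosen for illustration; detectkit's exact formulas may differ.

```python
import statistics

def mad_bounds(window, threshold=3.0):
    """Median +/- threshold * scaled MAD (1.4826 makes MAD comparable to a std dev)."""
    med = statistics.median(window)
    mad = statistics.median([abs(x - med) for x in window])
    spread = 1.4826 * mad
    return med - threshold * spread, med + threshold * spread

def zscore_bounds(window, threshold=3.0):
    """Mean +/- threshold * standard deviation."""
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    return mean - threshold * std, mean + threshold * std

def iqr_bounds(window, threshold=1.5):
    """Quartiles widened by threshold * interquartile range."""
    q1, _, q3 = statistics.quantiles(window, n=4)
    iqr = q3 - q1
    return q1 - threshold * iqr, q3 + threshold * iqr

# History around 50 with one old spike (95) still inside the window.
window = [50, 51, 49, 50, 52, 48, 50, 51, 49, 95]
new_value = 70

flags = {}
for name, bounds in [("mad", mad_bounds), ("zscore", zscore_bounds), ("iqr", iqr_bounds)]:
    low, high = bounds(window)
    flags[name] = not (low <= new_value <= high)
print(flags)
```

Here the old spike inflates the mean and standard deviation so much that the Z-Score band swallows the new value of 70, while the median-based MAD band and the quartile-based IQR band still flag it.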
Performance
Detectors are fast enough for routine incremental runs, so choose primarily on accuracy. Note that the windowed detectors (MAD, Z-Score, IQR) use a per-point loop, which is fine incrementally but can be slow for large historical backfills over big windows.
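The cost of a per-point loop is easiest to see in a small sketch. The function below is a hypothetical plain-Python rolling MAD check (not detectkit's code, which is numpy-based) in which every point re-reads the full window that precedes it.

```python
import statistics

def rolling_mad_flags(values, window_size, threshold=3.0):
    """Flag each point against the `window_size` points that precede it."""
    flags = []
    for i, value in enumerate(values):
        window = values[max(0, i - window_size):i]
        if len(window) < window_size:
            flags.append(False)  # not enough history yet
            continue
        # The whole window is re-processed for every single point.
        med = statistics.median(window)
        mad = statistics.median([abs(x - med) for x in window])
        spread = 1.4826 * mad  # assumed scale factor
        flags.append(abs(value - med) > threshold * spread)
    return flags

values = [10, 11, 9, 10, 12, 10, 9, 11, 30, 10]
flags = rolling_mad_flags(values, window_size=5)
print(flags)
```

The work grows with points × window size. An incremental run that scores a handful of new points touches only a few windows, whereas a 30-day backfill at a 1-minute interval with `window_size: 1440` means 43,200 points, each visiting up to 1,440 values, or roughly 62 million element visits before any sorting.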
Best Practices
1. Start with MAD
MAD is a safe default for most metrics:

    detectors:
      - type: mad
        params:
          threshold: 3.0
          window_size: 100

2. Add Seasonality for Time-Based Patterns
If your metric varies by hour/day/week:

    seasonality_columns:
      - hour

    detectors:
      - type: mad
        params:
          seasonality_components:
            - "hour"

3. Handle Trending Metrics
If your metric has a gradual trend (slow growth or decline), use recency weighting and/or detrending so the drift itself is not flagged:

    detectors:
      - type: mad
        params:
          window_weights: exponential
          half_life: "3d"     # weight halves every 3 days of age
          detrend: linear     # optional: remove in-window linear trend

4. Use Consecutive Anomalies
Reduce false positives:

    alerting:
      consecutive_anomalies: 3  # Wait for confirmation

5. Filter by Direction
Only alert on meaningful changes:

    alerting:
      direction: "up"      # Only alert on increases (e.g., errors, latency)
      # direction: "down"  # Or only on decreases (e.g., users, revenue)

6. Test Before Production

    # Test query
    dtk run --select my_metric --steps load

    # Test detection
    dtk run --select my_metric --steps detect

    # Test alert
    dtk test-alert my_metric

Troubleshooting
No Alerts Received
Check:
- `alerting.enabled: true`
- Recent anomalies detected (query `_dtk_detections`)
- Consecutive threshold met
- Webhook URLs correct (verify with `dtk test-alert my_metric`)

Too Many False Positives
Solutions:
- Increase `threshold` parameter
- Increase `consecutive_anomalies`
- Add `seasonality_components` (if metric is seasonal)
- Use `direction` filter
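Two of the settings above, `consecutive_anomalies` and `direction`, act as gates between a detection and an alert. The following sketch shows one plausible reading of how they combine; `should_alert`, the `(is_anomaly, side)` tuple shape and the `"any"` default are hypothetical and not taken from detectkit.

```python
def should_alert(detections, consecutive_anomalies=3, direction="any"):
    """Decide whether the newest points justify an alert.

    detections: list of (is_anomaly, side) tuples, oldest first,
                where side is "up", "down" or None.
    """
    if len(detections) < consecutive_anomalies:
        return False
    recent = detections[-consecutive_anomalies:]
    for is_anomaly, side in recent:
        if not is_anomaly:
            return False  # the run of anomalies is broken
        if direction != "any" and side != direction:
            return False  # anomaly is on the side we chose to ignore
    return True

history = [(False, None), (True, "up"), (True, "up"), (True, "up")]

alert_up = should_alert(history, consecutive_anomalies=3, direction="up")
alert_down = should_alert(history, consecutive_anomalies=3, direction="down")
too_early = should_alert(history[:3], consecutive_anomalies=3, direction="up")
print(alert_up, alert_down, too_early)
```

Raising `consecutive_anomalies` trades alert latency for fewer one-off false positives, while `direction` removes a whole class of anomalies that are not actionable for the metric.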
Getting Help
- Documentation: You’re reading it!
- Issues: https://github.com/alexeiveselov92/detectkit/issues
- PyPI: https://pypi.org/project/detectkit/
License
MIT License - see LICENSE file for details.
Contributing
Contributions welcome! Please:
- Open an issue to discuss changes
- Fork the repository and create a pull request
- Ensure tests pass
- Follow existing code style
Changelog
See CHANGELOG.md for complete version history.