Quickstart

This guide will walk you through creating your first detectkit project and monitoring a metric.

Prerequisites

detectkit installed (Installation Guide)
Database connection (ClickHouse, PostgreSQL, or MySQL)
Basic SQL knowledge

Fastest start: set up with an AI assistant

If you use Claude Code, it can do the setup for you interactively — no hand-editing YAML:

dtk init my_monitoring && cd my_monitoring
dtk init-claude          # adds CLAUDE.md + .claude/rules + skills to this folder

Then, in Claude Code, ask it to run the dtk-setup-project skill. It walks you through profiles.yml for your database (ClickHouse today): connection details, where the internal _dtk_* tables live versus where your source data lives, and an optional first alert channel. It then verifies the result with a non-destructive run. Next, ask it to run dtk-new-metric to scaffold your first metric. That’s the whole setup.

Prefer to do it by hand? The manual steps below do exactly the same thing.

Step 1: Initialize Project

Create a new detectkit project:

dtk init my_monitoring
cd my_monitoring

This creates the following structure:

my_monitoring/
├── detectkit_project.yml      # Project configuration
├── profiles.yml               # Database connections
├── README.md                  # Project readme with quick commands
├── metrics/                   # Metric definitions
│   └── example_cpu_usage.yml  # Working starter metric (mad + zscore, alerting)
├── incidents/                 # Labeled incidents for supervised `dtk autotune`
│   └── example_cpu_usage.yml  # Example labels file
└── sql/                       # SQL queries
    └── .gitkeep

metrics/example_cpu_usage.yml is a complete, runnable example — use it as a template for your own metrics.

Tip — set up an AI assistant. If you use Claude Code, run dtk init-claude in this folder. It writes a CLAUDE.md and .claude/rules/detectkit/ reference plus the setup skills — dtk-setup-project (walk through profiles.yml interactively) and dtk-new-metric (scaffold a metric) — so the assistant can do the setup below for you and help you write metrics, tune detectors and configure alerts. Re-run it after upgrading detectkit to refresh the context. See dtk init-claude.

Step 2: Configure Database Connection

Shortcut — let the assistant do it. If you ran dtk init-claude (see the tip above), just ask Claude Code to run the dtk-setup-project skill. It asks for your connection details, branches on the database type, fills in the profile fields below for you, and verifies the result. The manual steps below are the same thing by hand.

Edit profiles.yml to add your database connection:

ClickHouse Example

default_profile: prod

profiles:
  prod:
    type: clickhouse
    host: localhost
    port: 9000
    user: default
    password: ""

    # Internal tables location (for _dtk_* tables)
    internal_database: analytics

    # Data tables location
    data_database: default

    settings:
      max_execution_time: 600

Edit before running. The auto-generated profiles.yml ships a dev profile with example values — host: localhost and the two required ClickHouse locations internal_database: detectkit (for the _dtk_* tables) and data_database: default (where your source tables live). Change the host, port, credentials and both database names to match your environment. (There is no database: field — ClickHouse needs both internal_database and data_database, or the run fails with internal_database must be set for ClickHouse.)

Tip: dtk init --db-type postgres (or mysql) scaffolds profiles.yml with the right fields for that backend from the start.

PostgreSQL Example

PostgreSQL connects to a database (must already exist) and uses schemas:

profiles:
  prod:
    type: postgres
    host: localhost
    port: 5432
    user: postgres
    password: "your_password"
    database: detectkit         # must already exist
    internal_schema: detectkit  # auto-created
    data_schema: public

MySQL Example

MySQL (8.0+) uses databases (auto-created):

profiles:
  prod:
    type: mysql
    host: localhost
    port: 3306
    user: root
    password: "your_password"
    internal_database: detectkit
    data_database: analytics

See the Databases guide for the full per-backend breakdown (install extras, connection fields, SQL dialect).

Step 3: Create Your First Metric

Create a metric configuration file:

touch metrics/api_response_time.yml

Edit metrics/api_response_time.yml:

# Basic metric info
name: api_response_time
interval: 5min

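# SQL query to load data.
# Built-in template variables: {{ dtk_start_time }}, {{ dtk_end_time }}
# (rendered as 'YYYY-MM-DD HH:MM:SS' strings) and {{ interval_seconds }}.
# This example assumes api_logs.timestamp is already aligned to 5-minute
# buckets; if it is not, bucket it in the SELECT first (for example with
# toStartOfFiveMinutes(timestamp) in ClickHouse).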
query: |
  SELECT
    timestamp,
    AVG(response_time_ms) AS value
  FROM api_logs
  WHERE timestamp >= '{{ dtk_start_time }}'
    AND timestamp < '{{ dtk_end_time }}'
  GROUP BY timestamp
  ORDER BY timestamp

# Column mapping (optional if columns match defaults)
query_columns:
  timestamp: timestamp
  metric: value

# Detector configuration
detectors:
  - type: mad
    params:
      threshold: 3.0
      window_size: 288      # 1 day of 5-min intervals
      min_samples: 50

# Alerting configuration
alerting:
  enabled: true
  channels:
    - mattermost_ops
  consecutive_anomalies: 3  # Require 3 anomalies in a row
  alert_cooldown: "30min"   # Recommended: without it a persisting
                            # anomaly re-alerts on every run
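
To make the detector parameters above concrete, here is a minimal Python sketch of a generic rolling MAD check. It is not detectkit's implementation: the function name mad_flags, the trailing-window choice, the None marker for insufficient data, and the 1.4826 scaling constant are all assumptions made for illustration. It only shows how threshold, window_size and min_samples could interact.

```python
from statistics import median


def mad_flags(values, threshold=3.0, window_size=288, min_samples=50):
    """Flag each point against the median/MAD of the points before it.

    Returns a list of True / False / None, where None stands for
    "insufficient data" (fewer than min_samples earlier points).
    Hypothetical helper, not part of detectkit.
    """
    flags = []
    for i, x in enumerate(values):
        # Trailing window of at most window_size points, excluding x itself.
        window = values[max(0, i - window_size):i]
        if len(window) < min_samples:
            flags.append(None)
            continue
        med = median(window)
        mad = median([abs(v - med) for v in window])
        # 1.4826 makes the MAD comparable to a standard deviation
        # for normally distributed data (an assumption of this sketch).
        scale = 1.4826 * mad
        if scale == 0:
            flags.append(x != med)
        else:
            flags.append(abs(x - med) > threshold * scale)
    return flags


# 60 quiet points (100..104 repeating), one normal point, one spike.
series = [100.0 + (i % 5) for i in range(60)] + [102.0, 250.0]
flags = mad_flags(series)
print(flags[:50].count(None), flags[-2], flags[-1])  # 50 False True
```

The first 50 points come back as None because fewer than min_samples earlier points exist, which is the situation the "insufficient_data" troubleshooting entry describes.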

Step 4: Configure Alert Channel

Edit profiles.yml to add an alert channel:

# At the end of profiles.yml
alert_channels:
  mattermost_ops:
    type: mattermost
    webhook_url: "https://mattermost.example.com/hooks/your_webhook_id"
    channel: "alerts"
    # Bot name + avatar default to the detectkit brand. Override per channel
    # with username / icon_url / icon_emoji (see the Alert Channels guide).
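
Before wiring a real channel, it can help to see how the consecutive_anomalies and alert_cooldown settings from Step 3 might combine. The Python sketch below is an assumption about the general logic, not detectkit's code: should_alert and its arguments are hypothetical names, and the real pipeline may track state differently.

```python
from datetime import datetime, timedelta


def should_alert(flags, now, last_alert_at,
                 consecutive_anomalies=3,
                 alert_cooldown=timedelta(minutes=30)):
    """Decide whether to send an alert (hypothetical helper).

    flags: most recent detection results, oldest first (True = anomaly).
    last_alert_at: datetime of the previous alert, or None if none was sent.
    """
    recent = flags[-consecutive_anomalies:]
    streak = len(recent) == consecutive_anomalies and all(recent)
    if not streak:
        return False  # not enough anomalies in a row
    if last_alert_at is not None and now - last_alert_at < alert_cooldown:
        return False  # still inside the cooldown window
    return True


now = datetime(2024, 3, 1, 12, 0)
print(should_alert([False, True, True, True], now, None))              # True
print(should_alert([True, True, False], now, None))                    # False
print(should_alert([True] * 3, now, now - timedelta(minutes=10)))      # False
```

The last call shows why the cooldown is recommended: the anomaly streak persists, but the alert sent 10 minutes earlier suppresses a repeat.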

Step 5: Run Your Metric

Run the metric for the first time:

dtk run --select api_response_time

Output looks like this — a header with the project root and the metric count, then a per-metric block (config file + steps) and the load → detect → alert pipeline rendered as a tree, ending in a success line:

Project root: /path/to/my_monitoring
Found 1 metric(s) to process

Processing metric: api_response_time
  Config file: metrics/api_response_time.yml
  Steps: load, detect, alert

  ┌─ LOAD
  │   ... (load progress)
  └─ ... (detect / alert progress)

✓ Pipeline completed successfully

The per-step detail lines are emitted by the pipeline itself, so the exact middle of the tree depends on how much data was loaded and how many anomalies were found.

Step 6: Explore Results

View Loaded Data

Loaded data is stored in the _dtk_datapoints table, in the internal database configured in Step 2 (analytics in the ClickHouse example):

SELECT *
FROM analytics._dtk_datapoints
WHERE metric_name = 'api_response_time'
ORDER BY timestamp DESC
LIMIT 10;

View Detections

Detection results are stored in the _dtk_detections table; filter on is_anomaly to see only the anomalies:

SELECT
  timestamp,
  value,
  confidence_lower,
  confidence_upper,
  detection_metadata
FROM analytics._dtk_detections
WHERE metric_name = 'api_response_time'
  AND is_anomaly = true
ORDER BY timestamp DESC;

Common Use Cases

1. Error Rate Monitoring

name: error_rate
interval: 1min

query: |
  SELECT
    toStartOfMinute(timestamp) AS timestamp,
    countIf(status >= 500) / count() AS value
  FROM http_requests
  WHERE timestamp >= '{{ dtk_start_time }}'
    AND timestamp < '{{ dtk_end_time }}'
  GROUP BY timestamp
  ORDER BY timestamp

detectors:
  - type: manual_bounds
    params:
      upper_bound: 0.01  # Alert if error rate > 1%
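
As a rough illustration of a fixed-bound check, the Python sketch below compares an error rate against the 1% upper bound used above. The function out_of_bounds is hypothetical, the lower_bound parameter is an assumption (only upper_bound appears in this guide), and treating the bound itself as "not anomalous" is also an assumption.

```python
def out_of_bounds(value, lower_bound=None, upper_bound=None):
    """Return True if value falls strictly outside the given bounds.

    Hypothetical helper; either bound may be omitted.
    """
    if lower_bound is not None and value < lower_bound:
        return True
    if upper_bound is not None and value > upper_bound:
        return True
    return False


errors, total = 7, 500
rate = errors / total                          # 0.014, i.e. 1.4%
print(out_of_bounds(rate, upper_bound=0.01))   # True
print(out_of_bounds(0.005, upper_bound=0.01))  # False
```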

2. CPU Usage Monitoring

name: cpu_usage
interval: 30s

query: |
  SELECT
    timestamp,
    avg_cpu_percent AS value
  FROM system_metrics
  WHERE timestamp >= '{{ dtk_start_time }}'
    AND timestamp < '{{ dtk_end_time }}'
  ORDER BY timestamp

detectors:
  - type: zscore
    params:
      threshold: 3.0
      window_size: 120  # 1 hour
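
For intuition about the zscore detector, here is a minimal Python sketch of a plain z-score test over a 120-point window. It is a generic textbook version, not detectkit's implementation: the name zscore_flag and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def zscore_flag(window, x, threshold=3.0):
    """Return True if x is more than `threshold` standard deviations
    away from the mean of `window` (hypothetical helper)."""
    mu = mean(window)
    sigma = pstdev(window)  # population std; an assumption of this sketch
    if sigma == 0:
        return x != mu
    return abs(x - mu) / sigma > threshold


# 120 points = 1 hour of 30-second samples, hovering around 44% CPU.
window = [40.0, 42.0, 44.0, 46.0, 48.0] * 24
print(zscore_flag(window, 95.0), zscore_flag(window, 47.0))  # True False
```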

3. Daily Active Users

name: daily_active_users
interval: 1day

query: |
  SELECT
    toDate(timestamp) AS timestamp,
    uniqExact(user_id) AS value
  FROM user_events
  WHERE timestamp >= '{{ dtk_start_time }}'
    AND timestamp < '{{ dtk_end_time }}'
  GROUP BY timestamp
  ORDER BY timestamp

detectors:
  - type: mad
    params:
      threshold: 3.0
      window_size: 60  # 2 months
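
In all three examples window_size is a number of points, so the time span it covers depends on the metric's interval. The Python sketch below reproduces the arithmetic behind the comments in this guide. The helper names are hypothetical, and the parser only handles the interval spellings that appear here (s, min, day); detectkit may accept others.

```python
import re

# Seconds per unit, limited to the units used in this guide.
_UNIT_SECONDS = {"s": 1, "min": 60, "day": 86400}


def interval_seconds(interval):
    """Convert an interval string such as '5min' to seconds."""
    match = re.fullmatch(r"(\d+)(s|min|day)", interval)
    if match is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def window_points(span_seconds, interval):
    """Number of points needed to cover span_seconds at this interval."""
    return span_seconds // interval_seconds(interval)


print(window_points(86400, "5min"))       # 288  (1 day of 5-min points)
print(window_points(3600, "30s"))         # 120  (1 hour of 30-s points)
print(window_points(60 * 86400, "1day"))  # 60   (about 2 months of daily points)
```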

CLI Commands

Run Specific Metrics

# Run single metric
dtk run --select api_response_time

# Run multiple metrics matching a wildcard pattern
dtk run --select "api_*"

# Run all metrics
dtk run --select "*"
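
If the wildcard patterns above behave like shell-style globs, which is an assumption and not something this guide confirms, the selection can be pictured with Python's standard fnmatch module. The select function and the metric names are hypothetical.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names that match a shell-style wildcard pattern.

    Illustrative only; detectkit's own selector may differ
    (for example, it may also match on file paths).
    """
    return [name for name in names if fnmatchcase(name, pattern)]


metrics = ["api_response_time", "api_error_rate", "cpu_usage"]
print(select(metrics, "api_*"))  # ['api_response_time', 'api_error_rate']
print(select(metrics, "*"))      # all three names
```

Quoting the pattern on the command line, as the commands above do, stops the shell from expanding * against local file names before dtk sees it.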

Partial Pipeline

# Only load data (skip detection)
dtk run --select api_response_time --steps load

# Load and detect, but skip alerting
dtk run --select api_response_time --steps load,detect

Full Refresh

# Delete the selected metric's stored data and reload from scratch
dtk run --select api_response_time --full-refresh

Historical Backfill

# Load data from a specific date
dtk run --select api_response_time --from "2024-01-01 00:00:00"

# Bounded backfill: pair --from with --to to load a closed window
dtk run --select api_response_time --from "2024-01-01" --to "2024-02-01"
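
To estimate how much data a bounded backfill will load, the Python sketch below counts the points in a window. It treats the end as exclusive, mirroring the timestamp < '{{ dtk_end_time }}' filter used in the queries; whether --to itself is inclusive is not stated here, so take the result as an approximation. expected_points is a hypothetical helper.

```python
from datetime import datetime, timedelta


def expected_points(start, end, interval):
    """Number of whole intervals in the half-open window [start, end)."""
    return (end - start) // interval


start = datetime(2024, 1, 1)
end = datetime(2024, 2, 1)
print(expected_points(start, end, timedelta(minutes=5)))  # 8928
```

January has 31 days and each day holds 288 five-minute points, so the backfill above covers 31 * 288 = 8928 points.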

Exclude and Force

# Run everything except a subset
dtk run --select "*" --exclude "metrics/staging/*"

# Ignore a stuck lock left by a crashed run
dtk run --select api_response_time --force

See the CLI Reference for the full flag list.

Test Alert

# Preview alert message without real anomalies
dtk test-alert api_response_time

Clear a Stuck Lock

# If a run was killed without releasing its lock (e.g. the database
# restarted mid-run), later runs fail with "Failed to acquire lock".
# Clear it immediately:
dtk unlock --select api_response_time

Stuck locks also auto-expire after 1 hour, so the next normal run recovers on its own — dtk unlock just does it right away.

Prune Stale Data After Editing Configs

# Editing a metric's detectors/alerting leaves the old results behind.
# Preview what no longer matches the config (dry-run), then delete it:
dtk clean --select api_response_time
dtk clean --select api_response_time --execute

# Renamed or deleted a metric? Purge everything left under the old name:
dtk clean --orphaned-metrics --execute

See the CLI Reference for both modes.

Next Steps

Now that you have a working metric:

Add seasonality - MAD Detector with Seasonality
Handle trending metrics - window_weights: exponential + half_life, or detrend: linear (Detectors Guide)
Configure multiple detectors - Detectors Guide
Set up multiple channels - Alerting Guide
Fan out to independent alert rules - alerting: can be a list of alert blocks, each with its own channels, conditions and template (Multiple alert blocks)
Explore examples - Examples

Troubleshooting

“Table _dtk_datapoints does not exist”

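Solution: detectkit creates its internal tables automatically on the first run. If the table is still missing, check that the configured user has permission to create tables in the internal database (or schema) set in profiles.yml.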

“Connection refused”

Solution: Verify the host, port and credentials in profiles.yml, then test the connection directly with the database's own client:

# Test ClickHouse connection
clickhouse-client --host=localhost --port=9000

# Test PostgreSQL connection (database name from the PostgreSQL example above)
psql -h localhost -U postgres -d detectkit

“No data loaded”

Solution: Check that your SQL query returns rows for the time range being loaded:

-- Run query manually with sample dates
SELECT
  timestamp,
  AVG(response_time_ms) AS value
FROM api_logs
WHERE timestamp >= '2024-03-01 00:00:00'
  AND timestamp < '2024-03-02 00:00:00'
GROUP BY timestamp
ORDER BY timestamp;
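
To produce such a manual query from a metric's templated query, the placeholders can be filled in by hand or with a few lines of Python. The sketch below is a simple stand-in for template rendering, not detectkit's renderer: the render function is hypothetical and only handles plain {{ name }} placeholders.

```python
import re


def render(query, variables):
    """Replace each {{ name }} placeholder with variables[name].

    Raises KeyError if the query uses a name that was not supplied.
    """
    def substitute(match):
        return str(variables[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, query)


template = (
    "SELECT timestamp, AVG(response_time_ms) AS value FROM api_logs "
    "WHERE timestamp >= '{{ dtk_start_time }}' "
    "AND timestamp < '{{ dtk_end_time }}' "
    "GROUP BY timestamp ORDER BY timestamp"
)
print(render(template, {
    "dtk_start_time": "2024-03-01 00:00:00",
    "dtk_end_time": "2024-03-02 00:00:00",
}))
```

Paste the printed SQL into your database client; if it returns no rows, the problem is in the query or the source table rather than in detectkit.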

“All points marked as insufficient_data”

Solution: Load more history (for example with --from) or lower min_samples so the detector has enough points to work with:

detectors:
  - type: mad
    params:
      min_samples: 10  # Reduce from default 30

Getting Help

Documentation: Full guides available in docs/
Examples: See examples/ for more configurations
If something looks like a bug: when a dtk command errors or behaves unexpectedly and it isn’t your config, use the dtk-feedback skill (from dtk init-claude) to file a redacted bug report or feature request upstream — it collects diagnostics, strips every secret, and asks you to confirm first.
Issues: Report bugs at https://github.com/alexeiveselov92/detectkit/issues