CLI Reference
Complete reference for the dtk command-line tool.
Overview
The dtk CLI provides dbt-like commands for managing metric monitoring:

    dtk init <project>               # Initialize new project
    dtk init-claude                  # Set up Claude Code context for this folder
    dtk run --select <selector>      # Run metric pipeline
    dtk autotune --select <sel>      # Auto-configure a metric's detector from data
    dtk tune --select <sel>          # Interactively tune a detector, write it back
    dtk test-alert <metric>          # Test alert channels
    dtk unlock --select <selector>   # Clear a stuck pipeline lock
    dtk clean --select <selector>    # Prune data that no longer matches configs
    dtk --version                    # Show version
    dtk --help                       # Show help

Global Options
--version
Show the installed detectkit package version:

    dtk --version

Output:

    detectkit, version x.y.z

--help
Show help for any command:

    dtk --help
    dtk run --help
    dtk init --help

Commands
dtk init
Initialize a new detectkit project.
Syntax

    dtk init <project_name> [OPTIONS]

Arguments
project_name (required)
Name of the project to create.
Options
--target-dir, -d (default: .)
Directory to create project in.
Examples
Create project in current directory:

    dtk init my_monitoring

Create project in specific directory:

    dtk init analytics --target-dir /opt/projects

Created Structure

    my_monitoring/
    ├── detectkit_project.yml       # Project configuration
    ├── profiles.yml                # Database connections & alert channels
    ├── README.md                   # Getting-started notes for the project
    ├── metrics/                    # Metric definitions
    │   ├── .gitkeep
    │   └── example_cpu_usage.yml   # Example metric to copy/edit
    ├── incidents/                  # Labeled incidents for supervised `dtk autotune`
    │   └── example_cpu_usage.yml   # Example labels file to copy/edit
    └── sql/                        # SQL query files
        └── .gitkeep

dtk init-claude
Set up Claude Code context for working with detectkit. Run it in the folder that holds your detectkit project(s); it gives an AI assistant the context and tools to help you create metrics, tune detectors, configure alerts and run the pipeline natively.
Syntax

    dtk init-claude [OPTIONS]

Options
--target-dir, -d (default: .)
Folder holding your detectkit project(s) to set up.
Created / updated files

    <target>/
    ├── CLAUDE.md                     # created, or a managed detectkit block is
    │                                 # injected/refreshed (your content is kept)
    └── .claude/
        ├── rules/detectkit/          # reference docs the assistant reads on demand
        │   ├── overview.md
        │   ├── cli.md
        │   ├── project.md
        │   ├── metrics.md
        │   ├── detectors.md
        │   └── alerting.md
        └── skills/
            ├── dtk-setup-project/    # skill: configure profiles.yml (DB + channels)
            │   └── SKILL.md
            ├── dtk-new-metric/       # skill: scaffold a validated metric YAML
            │   └── SKILL.md
            └── dtk-feedback/         # skill: file a redacted bug/feature/feedback
                └── SKILL.md          # issue upstream (with your confirmation)

Behavior
- Idempotent. The detectkit block in CLAUDE.md lives between <!-- BEGIN detectkit … --> / <!-- END detectkit --> markers; re-running refreshes only that block and the managed files. Anything you write outside the markers is preserved. A re-run with no upstream change reports everything unchanged.
- Versioned. The content ships with detectkit and tracks the installed version, so re-run dtk init-claude after upgrading to refresh the guidance to match the new release.
- Works whether the folder holds one project or several side by side.
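The idempotent refresh described above can be pictured with a small sketch. This is not detectkit's code: the marker strings are simplified stand-ins (the real BEGIN marker carries extra text) and `refresh_managed_block` is a hypothetical helper name. It only illustrates the idea of replacing one marked block while leaving everything else untouched.

```python
import re

# Hypothetical, simplified markers; the real BEGIN marker carries extra text.
BEGIN = "<!-- BEGIN detectkit -->"
END = "<!-- END detectkit -->"
BLOCK_RE = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)


def refresh_managed_block(document: str, managed: str) -> tuple[str, str]:
    """Return (new_document, status), replacing only the managed block."""
    block = f"{BEGIN}\n{managed}\n{END}"
    if BLOCK_RE.search(document):
        # A function replacement avoids backslash interpretation in `block`.
        updated = BLOCK_RE.sub(lambda _match: block, document, count=1)
    else:
        # No block yet: append one, keeping the user's content as-is.
        sep = "" if not document or document.endswith("\n") else "\n"
        updated = f"{document}{sep}{block}\n"
    status = "unchanged" if updated == document else "updated"
    return updated, status
```

Running the helper twice with the same managed text yields "unchanged" on the second call, which is the property that makes a re-run safe.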
Examples

    # Set up the current folder
    dtk init-claude

    # Set up a specific monitoring root
    dtk init-claude --target-dir /opt/monitoring

After running, open the folder in Claude Code and ask it about your metrics,
alerts or configs. Three skills come with it: dtk-setup-project (configure
profiles.yml — the database connection and a first alert channel — so runs
work end to end), dtk-new-metric (scaffold a validated metric YAML), and
dtk-feedback (file a bug report, feature request, or feedback as a GitHub
issue on the upstream repo — it collects the diagnostic context, redacts every
secret, and asks you to confirm before submitting).
dtk run
Run the metric processing pipeline.
Syntax

    dtk run --select <selector> [OPTIONS]

Options
--select, -s (required)
Selector for metrics to run. Three selector types are supported:
1. Metric name (searches only the root metrics/ directory):

    dtk run --select cpu_usage      # Finds metrics/cpu_usage.yml
    dtk run --select api_latency    # Finds metrics/api_latency.yml

Note: When using a metric name (without path separators), do not include the .yml extension. The extension is added automatically.
2. Path pattern (glob, supports subdirectories):

    # Select specific file with full path
    dtk run --select "metrics/critical/cpu.yml"

    # Select all metrics in a folder
    dtk run --select "metrics/critical/*"

    # Select all metrics recursively
    dtk run --select "metrics/**/*.yml"

    # Pattern matching
    dtk run --select "api_*"    # All metrics starting with "api_"

3. Tag selector (searches recursively):

    # Select all metrics with "critical" tag
    dtk run --select tag:critical

    # Select metrics tagged as "api"
    dtk run --select tag:api

    # Select metrics tagged as "10min"
    dtk run --select tag:10min

Tags must be configured in metric YAML files:

    name: api_latency
    tags: ["critical", "api", "10min"]
    # ... rest of config

Uniqueness validation: All selected metrics are validated to ensure no duplicate metric names exist. If duplicates are found, an error is raised listing the conflicting files.
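As a rough illustration of the uniqueness validation, the sketch below groups already-parsed metric names by file and raises when a name repeats. The function names and the input shape (a mapping of file path to metric name) are assumptions made for the example, not detectkit's internals.

```python
from collections import defaultdict


def find_duplicate_names(name_by_file: dict[str, str]) -> dict[str, list[str]]:
    """Map each duplicated metric name to the sorted files that declare it."""
    files_by_name: dict[str, list[str]] = defaultdict(list)
    for path, name in name_by_file.items():
        files_by_name[name].append(path)
    return {n: sorted(fs) for n, fs in files_by_name.items() if len(fs) > 1}


def validate_unique(name_by_file: dict[str, str]) -> None:
    """Raise ValueError naming every conflicting file if any name repeats."""
    duplicates = find_duplicate_names(name_by_file)
    if duplicates:
        lines = []
        for name, files in sorted(duplicates.items()):
            lines.append(f"Duplicate metric name '{name}' found:")
            lines.extend(f"  - {f}" for f in files)
        raise ValueError("\n".join(lines))
```

Listing every conflicting file, rather than stopping at the first, lets the user fix all clashes in one pass.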
--exclude, -e (optional)
Selector for metrics to exclude.

    dtk run --select "*" --exclude "metrics/staging/*"

--steps (default: load,detect,alert)
Pipeline steps to execute.
Available steps:
- load: Load data from database
- detect: Run anomaly detection
- alert: Send alerts
Examples:

    # All steps (default)
    dtk run --select cpu_usage

    # Load only
    dtk run --select cpu_usage --steps load

    # Detect and alert (skip load)
    dtk run --select cpu_usage --steps detect,alert

    # Detect only (no load, no alert)
    dtk run --select cpu_usage --steps detect

--from (optional)
Start date for data loading.
Format: YYYY-MM-DD or YYYY-MM-DD HH:MM:SS

    # Load from January 1, 2024
    dtk run --select cpu_usage --from "2024-01-01"

    # Load from specific timestamp
    dtk run --select cpu_usage --from "2024-01-01 12:00:00"

Behavior:
- Overrides the metric’s loading_start_time config
- Only affects the load step
- Timestamps are in UTC
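A minimal sketch of how the two accepted formats could be parsed into UTC timestamps. `parse_utc` is a hypothetical helper, and treating a date-only value as midnight UTC is an assumption of this example rather than documented behavior.

```python
from datetime import datetime, timezone

_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_utc(value: str) -> datetime:
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM:SS' as a UTC-aware datetime."""
    for fmt in _FORMATS:
        try:
            # Assumption: a date-only value means 00:00:00 UTC on that day.
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"expected YYYY-MM-DD or YYYY-MM-DD HH:MM:SS, got {value!r}")
```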
--to (optional)
End date for data loading.
Format: YYYY-MM-DD or YYYY-MM-DD HH:MM:SS

    # Load up to February 1, 2024
    dtk run --select cpu_usage --from "2024-01-01" --to "2024-02-01"

Behavior:
- Defaults to current time if not specified
- Only affects the load step
- Timestamps are in UTC
--full-refresh (flag)
Delete existing data and reload from scratch.

    dtk run --select cpu_usage --full-refresh

Behavior (delete/reload is range-scoped to --from/--to):
- Deletes _dtk_datapoints and _dtk_detections rows in the [--from, --to) window, and all history only when neither --from nor --to is given (detect uses --to or now as the upper bound when --to is omitted)
- Reloads data from --from (or loading_start_time when no --from) up to --to (or now)
Use cases:
- Fixing corrupted data
- Changing data loading logic
- Reprocessing with new detector configuration
Warning: This is a destructive operation. Use with caution.
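To make the half-open [--from, --to) window concrete, here is a small sketch of the membership test: the lower bound is included, the upper bound is excluded. `in_refresh_window` is a hypothetical name, and leaving the lower bound open when only --to is given is an assumption of this example.

```python
from datetime import datetime, timezone
from typing import Optional


def in_refresh_window(
    ts: datetime,
    start: Optional[datetime],
    end: Optional[datetime],
    now: datetime,
) -> bool:
    """True if a row at `ts` falls inside the delete window."""
    if start is None and end is None:
        return True  # neither bound given: all history is in scope
    upper = end if end is not None else now  # "now" stands in for a missing --to
    if start is not None and ts < start:
        return False  # before the inclusive lower bound
    return ts < upper  # the upper bound is exclusive
```

With --from "2024-01-01" and --to "2024-02-01", a row stamped exactly 2024-02-01 00:00:00 stays in place, while one stamped 2024-01-01 00:00:00 is deleted.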
--force (flag)
Ignore an existing task lock and run anyway.

    dtk run --select cpu_usage --force

Behavior:
- Skips the held-lock check (runs even if another lock is marked running)
- Still takes ownership of the lock for the duration of the run and releases it on exit, so a --force run also clears a previously stuck lock
- Allows concurrent runs (not recommended)
Warning: Can cause data corruption if multiple processes run simultaneously.
Note: You usually don’t need --force to recover from a crash. A running lock left behind by a dead process (e.g. the database restarted mid-run) auto-expires after its timeout (1 hour) and is overridden by the next normal run. To clear a stuck lock immediately, use dtk unlock instead of --force.
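The lock rules above reduce to a short decision function. The sketch below is illustrative only: `can_acquire` is a hypothetical name, the arguments stand in for whatever the task table stores, and treating a lock as stale at exactly one hour (rather than just after) is an assumption.

```python
from datetime import datetime, timedelta, timezone

LOCK_TIMEOUT = timedelta(hours=1)  # the 1-hour timeout stated above


def can_acquire(
    status: str, started_at: datetime, now: datetime, force: bool = False
) -> bool:
    """Decide whether a new run may take the lock for a metric."""
    if force:
        return True  # --force skips the held-lock check entirely
    if status != "running":
        return True  # no live lock is held
    # A "running" row older than the timeout is treated as stale.
    return now - started_at >= LOCK_TIMEOUT
```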
--profile (optional)
Override the default profile from project config.

    dtk run --select cpu_usage --profile staging

Use cases:
- Testing with different database
- Running against multiple environments
--report (optional, dual-mode)
After the run, write a self-contained HTML report per selected metric: values, each detector’s confidence band, the flagged anomalies, the alerts that fired (anomaly / recovery / no-data) and a summary, with a client-side period selector (24h / 7d / 30d / All + zoom/pan). The report is offline: the chart and data are inlined into one file, so nothing is fetched and nothing leaves the page.

    # Default path: reports/<metric>.html
    dtk run --select cpu_usage --report

    # Into a directory: <dir>/<metric>.html
    dtk run --select cpu_usage --report reports/

    # Into a specific file
    dtk run --select cpu_usage --report cpu.html

Behavior:
- Bare --report → reports/<metric>.html; a directory → <dir>/<metric>.html; a .html path → that exact file.
- Reads the persisted _dtk_datapoints / _dtk_detections, so it works even on a --steps load (or any partial) run, charting whatever is already stored.
- Best-effort: a report failure is reported and does not fail the run.
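The three path rules can be sketched as one small function. `resolve_report_path` is a hypothetical helper, and classifying any value that does not end in .html as a directory is an assumption made for the example.

```python
from pathlib import Path
from typing import Optional


def resolve_report_path(metric: str, target: Optional[str] = None) -> Path:
    """Map a --report value (None for the bare flag) to an output file."""
    if target is None:
        return Path("reports") / f"{metric}.html"  # bare --report
    if target.lower().endswith(".html"):
        return Path(target)  # explicit file: use it exactly
    # Assumption: anything else is treated as a directory.
    return Path(target) / f"{metric}.html"
```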
Advanced: alerts are reconstructed, not read from state.
_dtk_alert_states stores last-writer-wins cooldown/recovery bookkeeping, not an event log, so the report cannot read past alerts from it. Instead it replays the real decision logic (quorum, consecutive_anomalies, cooldown, recovery, no-data) over the stored detections to reconstruct the timeline. This is faithful to the rules, but because cooldown suppression depends on when the live pipeline ran (run cadence), the set of suppressed repeat alerts a live run dispatched can differ slightly from the replay, which evaluates every grid point causally. The anomalies, bands, and which incidents fired are unaffected.
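A deliberately simplified sketch of what "replaying the decision logic" means: walk the stored per-point anomaly flags in order and decide, causally, where an alert would fire. It models only the consecutive-anomalies rule and a cooldown counted in grid points; quorum, recovery and no-data handling are left out, and `replay_alerts` and its parameters are hypothetical, not detectkit's API.

```python
def replay_alerts(flags, consecutive=3, cooldown=0):
    """Return the indices at which an alert fires.

    flags:       per-grid-point booleans, True meaning "anomalous"
    consecutive: anomalies in a row required before firing
    cooldown:    grid points after a fire during which repeats are suppressed
    """
    fired = []
    streak = 0
    last_fire = None
    for i, is_anomaly in enumerate(flags):
        streak = streak + 1 if is_anomaly else 0
        if streak < consecutive:
            continue  # not enough anomalies in a row yet
        if last_fire is not None and i - last_fire <= cooldown:
            continue  # still cooling down from the previous alert
        fired.append(i)
        last_fire = i
    return fired
```

Because this replay looks at every grid point, while a live pipeline only evaluates when it actually runs, the suppressed repeats can differ between the two, which is the caveat the note above describes.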
Metric Selection Rules
Understanding how metric selection works is important to avoid confusion:
File Name vs Metric Name
Two different identifiers:
- File name (e.g., metrics/cpu.yml): where config is stored
- Metric name (e.g., name: cpu_usage in YAML): identifier used in database
Important: detectkit uses metric name (from config) for all operations:
- Database table rows are keyed by metric_name
- Task locking uses metric_name
- Display shows metric_name (not file name)
Best practice: Keep file names and metric names consistent:

    name: cpu_usage           # Matches file name (recommended)

    name: server_cpu_usage    # Confusing - file name doesn't match

Uniqueness Requirements
Metric names MUST be unique across the entire project.
Why uniqueness matters:
- Database tables use metric_name as a PRIMARY KEY component
- Duplicate names cause data to mix from different sources
- Task locking conflicts prevent metrics from running
- Anomaly detection becomes invalid (mixed data)
Example of invalid configuration:

    # metrics/api/cpu.yml
    name: cpu_usage    # Duplicate name!
    query: "SELECT * FROM api_metrics"

    # metrics/system/cpu.yml
    name: cpu_usage    # Same name causes data corruption!
    query: "SELECT * FROM system_metrics"

Validation: detectkit automatically validates uniqueness when selecting metrics. If duplicates are found:

    Error: Duplicate metric name 'cpu_usage' found:
      - metrics/api/cpu.yml
      - metrics/system/cpu.yml

    Metric names must be unique across the project.
    Please rename one of the metrics to avoid data corruption.

Solution - use unique names:

    # metrics/api/cpu.yml
    name: api_cpu_usage       # Unique

    # metrics/system/cpu.yml
    name: system_cpu_usage    # Unique

Selector Behavior Summary
| Selector Type | Example | Searches | Extension |
|---|---|---|---|
| Metric name | cpu_usage | Root metrics/ only | Auto-added |
| Path with / | metrics/api/cpu.yml | Glob pattern | Keep as-is |
| Pattern with * | api_* | Glob pattern | Keep as-is |
| Tag | tag:critical | Recursive search | N/A |
Common mistakes:
- dtk run --select cpu_usage.yml → Won’t work (searches for metrics/cpu_usage.yml.yml)
- dtk run --select cpu_usage → Correct (searches for metrics/cpu_usage.yml)
- dtk run --select "metrics/cpu_usage.yml" → Also works (explicit path)
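The table and the common mistakes can be summarized as a classification step. The sketch below is an approximation written for illustration: `describe_selector` is a hypothetical name, and it only recognizes "/" and "*" as glob indicators (other glob characters are ignored here).

```python
def describe_selector(selector: str) -> tuple[str, str]:
    """Return (kind, lookup) for a --select value, following the table above."""
    if selector.startswith("tag:"):
        return "tag", selector[len("tag:"):]
    if "/" in selector or "*" in selector:
        return "glob", selector  # used as-is, no extension added
    return "name", f"metrics/{selector}.yml"  # extension auto-added
```

Passing "cpu_usage.yml" lands in the last branch and produces metrics/cpu_usage.yml.yml, which is exactly the first mistake listed above.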
Examples
Basic Usage
Run single metric:

    dtk run --select cpu_usage

Run all metrics:

    dtk run --select "*"

Run metrics matching pattern:

    dtk run --select "api_*"

Partial Pipeline
Load data only (skip detection):

    dtk run --select cpu_usage --steps load

Run detection only (skip load and alert):

    dtk run --select cpu_usage --steps detect

Run detection and alert (skip load):

    dtk run --select cpu_usage --steps detect,alert

Historical Backfill
Load data from specific date:

    dtk run --select cpu_usage --from "2024-01-01"

Load specific date range:

    dtk run --select cpu_usage \
      --from "2024-01-01" \
      --to "2024-02-01"

Full Refresh
Delete and reload all data:

    dtk run --select cpu_usage --full-refresh

Full refresh with custom start date:

    dtk run --select cpu_usage \
      --full-refresh \
      --from "2024-01-01"

Multiple Metrics
Run multiple metrics by pattern:

    dtk run --select "metrics/critical/*.yml"

Run all except staging:

    dtk run --select "*" --exclude "metrics/staging/*"

Different Environment
Run against staging database:

    dtk run --select cpu_usage --profile staging

Force Run (Emergency)
Force run if previous run crashed:

    dtk run --select cpu_usage --force

Output
Each run renders as a load → detect → alert tree per metric:

    Project root: /path/to/project
    Found 1 metric(s) to process

    Processing metric: cpu_usage
      Config file: metrics/cpu_usage.yml
      Steps: load, detect, alert

      ┌─ LOAD
      │ Resuming from last saved: 2024-03-15 09:50:00
      │ Loading from 2024-03-15 10:00:00 to 2024-03-15 10:00:00
      │ Total points: ~1,440 | Batch size: 2,160
      │ Loading in single batch...
      └─ Loaded 1,440 datapoints

    ✓ Pipeline completed successfully

On failure the tree ends with a red ✗ Failed: … line instead of
✓ Pipeline completed successfully.
dtk autotune
Automatically configure a metric’s detector from its data and, if you supply
them, from labeled incidents. Searches detector type × hyperparameters ×
seasonality grouping × history window (× alert window, when supervised),
cross-validates each candidate with walk-forward folds, and writes a new,
annotated metric YAML. It is a separate pipeline from load → detect → alert:
it never edits the original config and never sends alerts.
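For readers unfamiliar with walk-forward cross-validation, the sketch below shows one common way to build such folds: each fold trains on everything before its test block, so no candidate is scored on data from its own past. The expanding-window scheme, the function name and its parameters are assumptions for illustration; the fold layout detectkit actually uses is not specified here.

```python
def walk_forward_folds(n_points: int, n_folds: int, min_train: int):
    """Yield (train_range, test_range) index pairs in time order."""
    if n_folds < 1 or min_train < 1 or n_points - min_train < n_folds:
        raise ValueError("not enough points for the requested folds")
    test_size = (n_points - min_train) // n_folds
    for k in range(n_folds):
        test_start = min_train + k * test_size
        # The last fold absorbs any remainder so every point is used.
        test_end = test_start + test_size if k < n_folds - 1 else n_points
        yield range(0, test_start), range(test_start, test_end)
```

Unlike shuffled k-fold splits, the training indices here always precede the test indices, which matters for time series where later points leak information about earlier ones.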
Syntax

    dtk autotune --select <selector> [OPTIONS]

Options
--select, -s (required)
Metric selector, with the same semantics as dtk run (metric name, path
pattern, or tag:<name>). Tuning reads the metric’s already-loaded
_dtk_datapoints; if it has none yet, load it first (optionally backfill more
history, which tunes better):

    dtk run --select api_error_rate --steps load --from "2026-01-01"

--incidents (optional)
Path to a labels file of known incidents → supervised tuning. Without it
(and without an autotune.labels_file in the metric config), an interactive
terminal first prompts to enter incidents inline; declining — or running
non-interactively (cron/CI/piped input) — falls back to an unsupervised
objective (low false-positive rate + stable cross-fold separation). Supervised
mode engages only if labeled timestamps land on loaded grid points. The file
is YAML or JSON, all times UTC, each incident an interval ({start, end}) or a
point ({at}):

    metric: api_error_rate   # optional; must match the metric being tuned
    timezone: UTC            # optional; interprets the naive times below
    incidents:
      - {start: "2026-05-02 14:00:00", end: "2026-05-02 16:30:00"}
      - {at: "2026-05-11 09:05:00"}

    dtk autotune --select api_error_rate --incidents incidents/api_error_rate.yml

--label (flag)
Open the interactive labeler to mark incidents visually, then tune on them in the
same command. By default it starts a local 127.0.0.1 browser labeler; Save &
tune writes a versioned file into incidents/<metric>/ and the run continues
into tuning. Mark incidents by click-drag, use Threshold capture to grab
every span above/below a horizontal line at once, or Lasso capture to loop
around a cloud of outliers (each grid-adjacent run, gaps bridged, becomes one
incident span); remove one with its chart-side ✕ or the Delete key. It seeds from the metric’s newest saved set (or
--incidents <file-or-dir>), so re-running --label keeps editing in place.
--no-serve instead writes a static metrics/<metric>__labeler.html (Export
downloads a labels file; Import file… loads one back); --no-open prints the
URL instead of launching a browser. See the
--label reference for the full walkthrough.

    dtk autotune --select api_error_rate --label

--scoring (default: mcc)
The metric the search maximizes across folds: mcc (default), f1, f_beta,
balanced_accuracy, roc_auc, pr_auc. MCC uses the whole confusion matrix and
suits rare anomalies.
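To show why MCC suits rare anomalies, here is the standard Matthews correlation coefficient computed from confusion-matrix counts. The function is a plain textbook implementation written for this page, not detectkit's scorer; returning 0.0 when the denominator is zero is a common convention adopted here.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denom
```

A detector that never fires on a series with 5 anomalies in 100 points is 95% accurate, yet `mcc(0, 0, 5, 95)` is 0.0, so a search that maximizes MCC does not reward staying silent.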

    dtk autotune --select api_error_rate \
      --incidents incidents/api_error_rate.yml \
      --scoring f_beta

--from (optional)
Lower bound of the training window (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS, UTC).
--to (optional)
Upper bound of the training window (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS, UTC).
--profile (optional)
Override the default profile from the project config.
--force (flag)
Ignore an existing task lock and run anyway (same lock semantics as
dtk run --force).
--dry-run (flag)
Run the search but persist nothing: no config, no detections, no
_dtk_autotune_runs row. Previews what autotune would choose.
--report (optional, dual-mode)
Write the same self-contained HTML report as
dtk run --report for the tuned winner —
values, the chosen detector’s confidence band, the
flagged anomalies, the alerts that would have fired, and a summary, with the
client-side period selector. It charts the winner’s detections (persisted during
the run), so run without --dry-run.

    # Default path: reports/<metric>__tuned_<id>.html
    dtk autotune --select cpu_usage --report

    # A directory, or a specific file
    dtk autotune --select cpu_usage --report reports/
    dtk autotune --select cpu_usage --report cpu_tuned.html

Bare --report → reports/<metric>__tuned_<id>.html; a directory →
<dir>/<metric>.html; a .html path → that file. The same Advanced note as
dtk run --report applies: alerts in the report are reconstructed by replaying
the decision logic over the stored detections.
Behavior
On success (without --dry-run), one run:
- writes metrics/<name>__tuned_<id>.yml, a normal, ready-to-run config led by a # comment header explaining every decision (training period, labels, seasonality rationale, detector votes, grid-search winner + CV score + per-fold scores, window choice). The <id> is a deterministic hash of the run;
- records one row in the _dtk_autotune_runs audit table;
- persists the winning detector’s detections to _dtk_detections;
- prunes the superseded winners from prior autotune runs of the same metric.
The tuned config is an ordinary metric. Hand-editing its detector changes the
detector_id, orphaning the old detections, so recompute and prune:

    dtk run --select <name>__tuned_<id> --steps detect --full-refresh
    dtk clean --select <name>__tuned_<id> --execute

See the Auto-tuning guide and the
Auto-tune reference for the labels schema, the autotune: config
block, the scoring-metrics catalog, and the _dtk_autotune_runs columns.
dtk tune
Interactively tune a metric’s detector on its real data, then write the
chosen config back into the metric YAML. The manual, human-in-the-loop sibling of
dtk autotune: it opens a browser view of the metric’s persisted
series, lets you turn the detector’s knobs and watch the confidence band + flagged
anomalies + would-fire alerts recompute live, and — on a click — applies the
config. Where autotune searches automatically and writes a new
__tuned_<id>.yml, tune is manual and edits the metric in place.
Safe by construction: the new config is validated before anything is written, the
previous metric YAML is archived under metrics/.history/<metric>/, and only then
is the metric overwritten. It takes no pipeline lock (it only edits a config
file); re-run dtk run afterwards to recompute detections under the new config.
Syntax

    dtk tune --select <selector> [OPTIONS]

Options
--select, -s (required)
Metric selector, with the same semantics as dtk run, but it must resolve to
a single metric (tuning is interactive and per-metric). Tuning reads the
metric’s already-loaded _dtk_datapoints; if it has none yet, load it first:

    dtk run --select api_error_rate --steps load --from "2026-01-01"

--from, --to (optional)
Restrict the window the tuner shows and recomputes over (YYYY-MM-DD or
YYYY-MM-DD HH:MM:SS, UTC). Defaults to the recent persisted window.
--no-serve (flag)
Write a static, read-only tuner HTML file (metrics/<metric>__tuner.html) and
exit instead of starting the local server. The sliders still recompute the band
live and you can still mark incidents, but there is no Apply / write-back:
Save incidents downloads the labels file instead of writing it.
--no-open (flag)
Don’t auto-open the browser; just print the local 127.0.0.1 URL.
--profile (optional)
Profile override (default: from the project config).
What you can tune
Detector type (MAD / Z-Score / IQR / Manual bounds), threshold, window
size, recency weighting + half-life, detrend, smoothing,
seasonality conditioning (per available seasonality column, optionally conjoined
into one group), direction (both/up/down) and the alert
consecutive_anomalies window. The “effective config” readout shows exactly
what will be written. A y = 0 line toggle shows the metric relative to zero.
Chart-first cockpit: modes, alert review & metrics
The whole screen is one chart (the windshield) with the live metrics pinned in a HUD over it (the speedometer). Every control sits in an always-visible, mode-aware side rail: it shows only the current mode’s panel (detector knobs + effective-config readout + Apply in Tune, verdict actions in Review, capture tools + Save in Label) and collapses to give the chart the whole width. The controls that aren’t detector-specific, namely the Points shown data window, the alert rule (direction + consecutive anomalies) and the y = 0 toggle, stay visible in every mode. A mode switch picks the job and dims the layers that don’t matter to it:
- Tune: steer the band (corridor leads; incidents are read-only context; hover a point for its window).
- Review: confirm the fired alerts. Click an alert marker to cycle its verdict un-reviewed (red) → valid (green) → false alarm (slate); Confirm all unreviewed valid does the lot. Confirming an alert valid IS marking an incident: the confirmed streak becomes a first-class incident that shows in the Marked incidents list (a ”✓ confirmed alert” row; remove it to un-confirm), counts toward recall + correct (so a clean metric is validated in a few clicks without drawing spans), and is written as an incident on Save. The list, the metrics and Save share one ground-truth set (marked spans + confirmed alerts).
- Label: mark real incidents. Drag a span (edges/middle to adjust, ✕/Delete to remove), use Lasso anomalies (loop a cloud of anomaly dots; each consecutive run, gaps bridged up to consecutive_anomalies, becomes one span sized to the run), or use Threshold capture (grab every span past a horizontal line; set it by click or value, above/below, optional gap-bridge, optional painted time window saved as capture_windows; each span widened to a full interval so the alert lands inside).
As you tune, a metrics bar shows incident catch rate (recall) — the share of
ground-truth incidents (marked + confirmed-valid alerts) caught by an alert (caught
when an alert’s anomaly streak overlaps it, not just the fire instant) —
false-alert rate — the share of fired alerts outside every incident and not
confirmed valid (“≈1 in N false”) — and reviewed N/M; only incidents within the
loaded window are scored. An optional false-alert budget (false_alert_budget, a
fraction in (0, 1] on the metric then project, default 0.5) gently flags
the false-alert chip when the rate exceeds it — tuning-only, labeling stays optional.
Save incidents writes
a versioned incidents/<metric>/<…>.yml, the same store
dtk autotune reads (it seeds incidents and capture windows
from the newest such file on open, anchoring the budget-sized loaded window on
the seeded incidents — ending just past the latest one rather than at the last
datapoint — so they render and count without loading the whole history; older
incidents stay list-only, use --from/--to to tune against them; per-alert
verdicts persist as an alert_reviews metadata block and re-seed on reopen), so a
labeling round here also feeds the next supervised tune. Saving incidents does not
end the session; only Apply does.
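The two headline numbers in the metrics bar can be sketched as interval-overlap counts. This is a simplified illustration, not the tuner's code: intervals are (start, end) pairs on any comparable axis, `incidents` is assumed to already contain both marked spans and confirmed-valid alerts, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def score_alerts(incidents, alert_streaks):
    """Return (recall, false_alert_rate); None where a rate is undefined.

    incidents:     ground-truth spans (marked + confirmed-valid alerts)
    alert_streaks: the anomaly streak behind each fired alert
    """
    caught = sum(any(overlaps(inc, s) for s in alert_streaks) for inc in incidents)
    false_alerts = sum(
        not any(overlaps(s, inc) for inc in incidents) for s in alert_streaks
    )
    recall = caught / len(incidents) if incidents else None
    false_rate = false_alerts / len(alert_streaks) if alert_streaks else None
    return recall, false_rate
```

Comparing the returned false-alert rate against a budget such as the default 0.5 would give the flagging behavior described above.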
How Apply writes back
On Apply to metric the server validates the chosen detector (through the same
DetectorFactory + MetricConfig the pipeline uses) — a broken or untunable
config is rejected and nothing is written — then archives the current YAML
verbatim to metrics/.history/<metric>/<metric>-<timestamp>.yml and re-emits the
metric in place with the tuned detector (the detectors list becomes the single
tuned detector; the first alerting block’s consecutive_anomalies is updated if
present). The archive keeps a trackable history of chosen parameters and the
original is always recoverable.
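The validate, archive, then overwrite order is what makes the write-back safe, and it can be sketched in a few lines. Everything here is illustrative: `apply_with_archive` and the `validate` callback are hypothetical, the file stem stands in for the metric name, and the timestamp format is an assumption.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def apply_with_archive(metric_path: Path, new_text: str, validate) -> Path:
    """Validate, archive the current YAML verbatim, then overwrite in place.

    Returns the path of the archived copy.
    """
    validate(new_text)  # raises before anything is written
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")  # assumed format
    history = metric_path.parent / ".history" / metric_path.stem
    history.mkdir(parents=True, exist_ok=True)
    archive = history / f"{metric_path.stem}-{stamp}.yml"
    shutil.copy2(metric_path, archive)  # byte-for-byte copy of the old config
    metric_path.write_text(new_text, encoding="utf-8")
    return archive
```

Because validation runs first, a rejected config leaves both the metric file and the history folder untouched.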
Examples

    # Tune interactively and apply on click
    dtk tune --select api_error_rate

    # Tune over a specific window
    dtk tune --select api_error_rate --from 2026-05-01 --to 2026-06-01

    # Static, read-only preview file (no write-back)
    dtk tune --select api_error_rate --no-serve

See the Tuning guide for the full walkthrough and how it
relates to dtk autotune.
dtk test-alert
Send a test alert for a metric.
Syntax

    dtk test-alert <metric_name> [OPTIONS]

Arguments
metric_name (required)
Name of the metric to test alerts for.
Options
--profile (optional)
Profile to use (overrides project default).
Examples
Test alert for single metric:

    dtk test-alert cpu_usage

Test with specific profile:

    dtk test-alert cpu_usage --profile production

Behavior
Sends a mock alert through all configured channels with fake data:
- Current timestamp
- Mock anomaly value: 0.8532
- Mock confidence interval: [0.4521, 0.6234]
- Mock severity: 4.52
- Rule preview: the mock mirrors the alert config’s own min_detectors, direction, and consecutive_anomalies (defaults 1 / same / 3), so the message shows the alert-centric layout a real firing would produce
- Project label: the preview carries the project-name [name] prefix (from detectkit_project.yml), exactly as a real dtk run stamps it, so a preview on a shared multi-project channel reads identically to the real alert
Use cases:
- Verify webhook URLs work
- Check alert formatting
- Test custom templates
- Validate channel permissions
Example Output

    📨 Sending test alert for metric: cpu_usage
       Timezone: UTC
       Channels: mattermost_ops

       → Sending to mattermost_ops... ✓ SUCCESS

    ✓ Sent test alert to 1/1 channels

    💡 Check your configured channels to verify message formatting
       Mock data used: value=0.8532, confidence=[0.4521, 0.6234], severity=4.52

When the metric defines multiple enabled alerting blocks (the list form),
each block is tested independently: its Timezone/Channels are printed under
a [config i/N] header, followed by a combined Total: x/y channels across N alert configs line.
dtk unlock
Clear a stuck pipeline lock for the selected metric(s).
Syntax

    dtk unlock --select <selector> [OPTIONS]

Options
--select, -s (required)
Metric selector — same semantics as dtk run (metric name, path pattern, or
tag:<name>).
--profile (optional)
Profile to use (overrides project default).
Examples

    # Unlock a single metric
    dtk unlock --select cpu_usage

    # Unlock everything matching a tag
    dtk unlock --select "tag:critical"

When to use it
Every dtk run records a running lock in _dtk_tasks while it works and
clears it on exit. If a run is killed without releasing its lock — most
commonly when the database restarts mid-run — the running row is left
behind. Until it’s cleared, every subsequent non---force run fails with:

    RuntimeError: Failed to acquire lock for metric '<name>'. Another task is
    running. Use --force to override.

Stuck locks auto-expire after their timeout (1 hour): the next normal run
treats the stale running row as released and overrides it, so the error
clears itself. dtk unlock simply does this immediately instead of waiting
for the timeout. It marks the task completed, so the next scheduled (cron)
run proceeds normally without needing --force.
Behavior
- Reports, per metric, whether a lock was cleared (lock cleared) or none was held (• <name>: no active lock)
- Clears even a not-yet-expired lock (use with the same care as --force)
- Does not run the pipeline; it only releases the lock
Example Output

    Project root: /path/to/project
    Found 1 metric(s) to unlock

    ┌─ cpu_usage
    └─ lock cleared

    Done. Cleared 1 lock(s) of 1 metric(s).

dtk clean
Remove internal data that no longer matches the project’s YAML configs.
Editing metrics over time leaves stale rows behind in the internal tables.
dtk clean finds and removes that drift. Both modes default to a dry-run
that only reports what would be deleted; pass --execute to actually delete.
Syntax

    dtk clean --select <selector> [--execute] [OPTIONS]    # drift mode
    dtk clean --orphaned-metrics [--execute] [OPTIONS]     # GC mode

Options
--select, -s (drift mode)
Metric selector, with the same semantics as dtk run. For each selected
(still-existing) metric, removes:
- _dtk_detections rows whose detector_id is no longer produced by the config, i.e. you changed a detector parameter or seasonality_components (which changes the detector’s hash), or removed a detector;
- _dtk_alert_states rows whose alert_config_id is no longer produced, i.e. you changed an alerting block’s functional params (channels, min_detectors, consecutive_anomalies, cooldown) or removed the block.
Datapoints are not touched: they are keyed only by (metric, timestamp)
and are never orphaned by a parameter edit. Use dtk run --full-refresh to
reload those.
--orphaned-metrics (GC mode)
Deletes all rows, across every internal table, for metric names present in
the database but no longer defined by any YAML in the project (a renamed or
deleted metric). Operates over the whole project (ignores --select).
--execute (flag)
Actually delete. Without it, the command only reports (dry-run).
--yes, -y (flag)
Skip the confirmation prompt for --orphaned-metrics --execute.
--profile (optional)
Profile to use (overrides project default).
Examples
    # See what stale detector/alert data a metric has accumulated (dry-run)
    dtk clean --select cpu_usage

    # ...then actually delete it
    dtk clean --select cpu_usage --execute

    # Clean drift across everything matching a tag
    dtk clean --select "tag:critical" --execute

    # List metrics in the DB that no longer exist in the project
    dtk clean --orphaned-metrics

    # Purge them (asks for confirmation unless -y)
    dtk clean --orphaned-metrics --execute

Safety
- Dry-run by default; nothing is deleted without --execute.
- --orphaned-metrics --execute asks for confirmation (skip with --yes), and refuses to run if the project defines no metrics or its configs fail to parse, so a wrong directory or a duplicate-name error can’t wipe valid data.
- In drift mode, if a metric’s config defines no detectors/alerting at all (so every stored row counts as orphaned), the command prints a loud warning before deleting.
- Deletes are synchronous ClickHouse mutations and idempotent, so they are safe to re-run.
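The preview-then-apply flow can be wrapped in a small helper so that --execute is only reached after the dry-run report has been shown. This is a minimal sketch, not part of detectkit: the function name clean_with_preview and the DTK variable (which lets the binary be swapped for a stub) are hypothetical; only the dtk clean --select and --execute flags come from this reference.

```shell
#!/bin/sh
# Hypothetical helper, not part of detectkit.
# DTK names the binary to call, so the function can be exercised with a stub.
DTK="${DTK:-dtk}"

# clean_with_preview SELECTOR
# Shows the dry-run report, then deletes only after an explicit "y".
clean_with_preview() {
    selector=$1

    # Dry-run: dtk clean reports what would be deleted and removes nothing.
    "$DTK" clean --select "$selector" || return 1

    printf 'Apply these deletions? [y/N] '
    read -r answer
    case "$answer" in
        y|Y) "$DTK" clean --select "$selector" --execute ;;
        *)   echo "Aborted; nothing deleted." ;;
    esac
}
```

Called as clean_with_preview cpu_usage, it prints the dry-run report first and treats anything other than y as a refusal, which mirrors the tool's own dry-run-by-default stance.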
Example Output
    Project root: /path/to/project
    DRY-RUN — nothing will be deleted. Use --execute to apply.

    Found 1 metric(s) to inspect

    ┌─ cpu_usage
    │  detector a1b2c3d4e5f6a7b8: would delete 4,320 detection row(s)
    └─ alert_config 9f8e7d6c5b4a3210: would delete stale alert state

    Done. Would remove 1 detector group(s) and 1 alert-state row(s).
    Re-run with --execute to apply.

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Normal completion — including most user-facing errors (bad project dir, missing profiles.yml, config/DB connection failures), which print an error message and return |
| 2 | Click argument error (e.g. a missing required option or an invalid --steps/--from value) |
Note: detectkit does not currently exit non-zero on configuration or database errors: it reports them and returns 0. Don’t gate a scheduler on the exit code alone; check the logged output.
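One way to check the logged output from a scheduler is a wrapper that fails when either the exit code is non-zero or the captured output looks like an error. The sketch below is illustrative only: run_checked and ERROR_PATTERN are hypothetical names, and the default pattern is an assumption, since this reference does not specify the exact wording of detectkit's error messages.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of detectkit.
# ERROR_PATTERN is an assumption: set it to text that detectkit's error
# messages actually contain in your logs.
ERROR_PATTERN="${ERROR_PATTERN:-error}"

# run_checked COMMAND [ARGS...]
# Runs the command, echoes its combined output, and returns non-zero when
# the command fails OR its output matches ERROR_PATTERN (case-insensitive).
run_checked() {
    output=$("$@" 2>&1)
    status=$?
    printf '%s\n' "$output"

    if [ "$status" -ne 0 ]; then
        return "$status"
    fi
    if printf '%s\n' "$output" | grep -qi -- "$ERROR_PATTERN"; then
        return 1
    fi
    return 0
}
```

A scheduler entry could then call run_checked dtk run --select "tag:critical" and act on its status. A broad pattern such as the default can also match harmless lines (for example a metric whose name contains "error"), so it is worth narrowing once the real messages are known.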
Environment Variables
The CLI itself defines no special environment variables, but configuration
files support environment-variable interpolation so secrets stay out of YAML.
Both ${VAR} and {{ env_var('VAR') }} syntaxes are supported:

    profiles:
      prod:
        type: clickhouse
        host: "{{ env_var('CLICKHOUSE_HOST') }}"
        port: 9000
        password: "${CLICKHOUSE_PASSWORD}"

    alert_channels:
      mattermost_ops:
        type: mattermost
        webhook_url: "{{ env_var('MATTERMOST_WEBHOOK_URL') }}"

Unresolved placeholders (variable not set) are kept as-is, so missing variables surface as configuration errors instead of empty strings.
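Because an unset variable only shows up later as a configuration error, a pre-flight check in the calling script can report it earlier and by name. The following is a minimal sketch: require_env is a hypothetical helper, not a detectkit feature, and the variable names are simply the ones used in the example configuration above.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of detectkit.

# require_env NAME [NAME...]
# Prints every variable that is unset or empty and returns 1 if any is missing.
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (names must be trusted).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, require_env CLICKHOUSE_HOST CLICKHOUSE_PASSWORD MATTERMOST_WEBHOOK_URL && dtk run --select "*" runs the pipeline only when all three are set.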
Common Workflows
Initial Setup
    # 1. Initialize project
    dtk init my_monitoring
    cd my_monitoring

    # 2. Edit profiles.yml (add database connection)
    # 3. Create metric config in metrics/

    # 4. Run metric
    dtk run --select my_metric

Daily Operations
    # Run all metrics (typically in cron/scheduler)
    dtk run --select "*"

    # Run critical metrics only
    dtk run --select "tag:critical"

    # Run specific metric manually
    dtk run --select cpu_usage

Backfilling Historical Data
    # Load from a start date onward (the fixed date here stands in for "30 days ago")
    dtk run --select cpu_usage --from "2024-02-01"

    # Load specific range
    dtk run --select cpu_usage \
      --from "2024-01-01" \
      --to "2024-02-01"

Reprocessing After Configuration Changes
    # Detector config changed → rerun detection
    dtk run --select cpu_usage --steps detect --full-refresh

    # Query changed → reload data
    dtk run --select cpu_usage --full-refresh

    # Detector/alert params changed → prune the now-orphaned old results
    dtk clean --select cpu_usage            # preview
    dtk clean --select cpu_usage --execute

Testing and Debugging
    # Test alert channels
    dtk test-alert cpu_usage

    # Load data only (verify query works)
    dtk run --select cpu_usage --steps load

    # Detect only (verify detector works)
    dtk run --select cpu_usage --steps detect

Emergency Operations
    # Clear a stuck lock left by a crashed run (e.g. DB restarted mid-run)
    dtk unlock --select cpu_usage

    # Force run if previous run crashed (also clears the stuck lock on exit)
    dtk run --select cpu_usage --force

    # Full refresh if data is corrupted
    dtk run --select cpu_usage --full-refresh

Scheduling
Cron (Linux/Mac)
    # Run all metrics every 10 minutes
    */10 * * * * cd /path/to/project && dtk run --select "*" >> /var/log/detectkit.log 2>&1

    # Run critical metrics every 5 minutes
    */5 * * * * cd /path/to/project && dtk run --select "tag:critical" >> /var/log/detectkit.log 2>&1

systemd Timer (Linux)
Create /etc/systemd/system/detectkit.service:

    [Unit]
    Description=detectkit metric monitoring

    [Service]
    Type=oneshot
    WorkingDirectory=/path/to/project
    ExecStart=/usr/local/bin/dtk run --select "*"
    User=detectkit

Create /etc/systemd/system/detectkit.timer:

    [Unit]
    Description=Run detectkit every 10 minutes

    [Timer]
    OnBootSec=1min
    OnUnitActiveSec=10min

    [Install]
    WantedBy=timers.target

Enable:

    systemctl enable detectkit.timer
    systemctl start detectkit.timer

Task Scheduler (Windows)
    # Create scheduled task to run every 10 minutes
    $action = New-ScheduledTaskAction -Execute "dtk" -Argument "run --select *" -WorkingDirectory "C:\projects\my_monitoring"
    $trigger = New-ScheduledTaskTrigger -Once -At (Get-Date) -RepetitionInterval (New-TimeSpan -Minutes 10)
    Register-ScheduledTask -TaskName "detectkit" -Action $action -Trigger $trigger

Docker Cron
    FROM python:3.11-slim

    # Install detectkit
    RUN pip install detectkit[clickhouse]

    # Install cron
    RUN apt-get update && apt-get install -y cron

    # Copy project files
    COPY . /app
    WORKDIR /app

    # Add cron job (cron runs jobs with a minimal PATH, so dtk is called by its full path)
    RUN echo "*/10 * * * * cd /app && /usr/local/bin/dtk run --select '*' >> /var/log/cron.log 2>&1" | crontab -

    # Start cron
    CMD ["cron", "-f"]

Best Practices
1. Use Selectors Effectively
    # Good: Specific selector
    dtk run --select "metrics/critical/*.yml"

    # Avoid: Selecting all when not needed
    dtk run --select "*"

2. Test Before Scheduling
    # Always test manually before adding to cron
    dtk run --select my_metric
    dtk test-alert my_metric

3. Log Output
    # Redirect to log file for troubleshooting
    dtk run --select "*" >> /var/log/detectkit.log 2>&1

4. Use --steps for Development
    # Test query without detection
    dtk run --select my_metric --steps load

    # Test detector without alerting
    dtk run --select my_metric --steps load,detect

5. Be Careful with --force
    # Only use --force if you're sure no other process is running
    # Check processes first:
    ps aux | grep dtk

To recover from a crashed run (no live process), prefer dtk unlock: it
clears the stale lock without running the pipeline concurrently. A stuck lock
also auto-expires after 1 hour, so often no manual action is needed at all.
Troubleshooting
“Metric not found”
Cause: Selector doesn’t match any metrics.
Solution: Check metric name and file path:

    # List metric files
    ls metrics/

    # Try exact match
    dtk run --select cpu_usage  # Not metrics/cpu_usage.yml

“Task is locked” / “Failed to acquire lock”
Cause: Previous run is still in progress, or it crashed/was killed with the
running lock held. The most common crash cause is the database restarting
mid-run, which leaves a stale running row in _dtk_tasks.
Solution:

    # Check if a process is actually still running
    ps aux | grep dtk

    # If no process is running, clear the stuck lock immediately:
    dtk unlock --select cpu_usage

    # (Or just wait: a stale lock auto-expires after 1 hour and the next
    # normal run overrides it. --force also clears it on exit.)

“Connection refused”
Cause: Can’t connect to database.
Solution: Check profiles.yml and database connectivity:

    # Test ClickHouse connection
    clickhouse-client --host=<host> --port=<port>

“No data loaded”
Cause: Query returns empty result.
Solution: Test query manually in database client with sample dates.
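For picking sample dates relative to today, whether for the manual query or for a bounded re-load with the --from option shown under Backfilling Historical Data, a small helper can compute them. This is a sketch under assumptions: days_ago is a hypothetical function, it relies on GNU or BSD/macOS date being available, and it assumes the YYYY-MM-DD format used in the backfill examples.

```shell
#!/bin/sh
# Hypothetical helper, not part of detectkit.

# days_ago N
# Prints the date N days before today as YYYY-MM-DD.
# Tries GNU date syntax first, then falls back to BSD/macOS syntax.
days_ago() {
    date -d "$1 days ago" +%Y-%m-%d 2>/dev/null || date -v-"$1"d +%Y-%m-%d
}
```

With it, a recent window can be reloaded without hard-coding a date, for example dtk run --select cpu_usage --steps load --from "$(days_ago 7)".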
See Also
- Configuration Guide - Configure metrics
- Detectors Guide - Configure detectors
- Auto-tuning Guide - Auto-configure a detector with dtk autotune
- Auto-tune Reference - dtk autotune flags, labels schema, scoring metrics
- Alerting Guide - Configure alerts
- Quickstart Guide - Getting started tutorial