Tuning a Detector by Hand
dtk tune lets you tune a metric’s detector interactively, on its real data,
and then write the config you settled on back into the metric — safely. It is the
manual, human-in-the-loop sibling of dtk autotune: instead of
searching automatically, you turn the detector’s knobs and watch the confidence
band, flagged anomalies and would-fire alerts recompute live in the browser, then
click Apply to commit.
It reads the metric’s already-loaded _dtk_datapoints and recomputes everything
client-side — the same faithful detector port that powers the landing playground,
fed your real series instead of synthetic data. No data leaves the machine.
dtk tune vs dtk autotune
Two complementary ways to optimize a metric:
| | dtk autotune | dtk tune |
|---|---|---|
| Who chooses | the engine (cross-validated search) | you, by eye, on the real series |
| Feedback | a decision log after the fact | the band recomputes live as you drag a slider |
| Output | a new metrics/<name>__tuned_<id>.yml (original untouched) | the metric YAML, edited in place (previous version archived) |
| Best when | you have labels or want a strong starting point | you know the metric and want to dial it in by feel |
A natural workflow is to use both: let dtk autotune propose a config, then
dtk tune to refine it by eye and commit.
Prerequisites
Tuning reads the metric’s persisted datapoints, so load some history first:

    dtk run --select api_error_rate --steps load --from "2026-01-01"

Tune interactively

    dtk tune --select api_error_rate

This starts a local 127.0.0.1 server and opens your browser. The selector must
resolve to a single metric. Restrict the window shown with --from / --to:

    dtk tune --select api_error_rate --from 2026-05-01 --to 2026-06-01

In the browser you can adjust:
- Detector: MAD, Z-Score, IQR (all windowed statistical) or Manual (fixed bounds; see below). Switching to Manual swaps the windowed knobs for the bound sliders.
- Threshold: interval width in σ-equivalent units.
- Window size: the trailing window each point is compared against. The readout shows the equivalent wall-clock span on the metric grid next to the point count (e.g. 2000 · 83d 8h), so “how much history is this window” reads at a glance.
- Recency weighting + half-life: none / exponential / linear, with the half-life (in points) when exponential. Half-life also echoes its wall-clock span next to the point count.
- Detrend: none / linear (robust split-median slope).
- Smoothing: none / EMA / SMA.
- Lower bound / Upper bound (Manual detector only): the fixed thresholds a value is compared against. They are seeded from the metric’s bounds (or the data’s p5/p95 band when switching from a windowed detector) and ranged over the real value domain, so you can drag them in and watch how many points fall outside (and how many alerts that yields). Apply writes a stateless manual_bounds detector.
- Seasonality groups: assign each seasonality column the metric has to a group (Off, G1, G2, …). Columns in the same group are conjoined into one seasonal key (e.g. dow×hour); separate groups each apply their own correction. This is the full seasonality_components grouping: you can mix one conjunctive group with other standalone columns, not just “all-separate” or “all-in-one”.
- Direction: both / up / down, which selects the anomalies that are shown and counted toward alerts. Pick up to focus on spikes above the band, down for drops below it. It is a preview filter mirroring the alert direction policy (seeded from the metric’s alerting, with the multi-detector same reading as any); it never changes the band itself.
- Alert: consecutive anomalies, the alert window (consecutive_anomalies).
Every control carries an ⓘ tooltip explaining what it does. The confidence band, the flagged points and the would-fire alert markers update on every change (a small computing… spinner shows while a recompute is in flight), a legend labels the series / band / center / anomalies / alerts, and the “effective config” readout shows exactly what will be written.
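
To make the Threshold, Window size, Direction and consecutive-anomalies knobs concrete, here is a minimal Python sketch of a trailing-window MAD band. It is an illustration under stated assumptions, not detectkit's implementation: the function names are hypothetical, the window is assumed to exclude the point being scored, the band is assumed to start only once the window is full, and recency weighting, detrending, smoothing and seasonality are left out.

```python
from statistics import median

MAD_TO_SIGMA = 1.4826  # scales MAD to a sigma-equivalent for normally distributed data


def mad_band(values, window, threshold):
    """Return one (lower, upper) band per point, or None until the window is full.

    Assumption: each point is compared against the `window` points before it.
    """
    bands = []
    for i in range(len(values)):
        if i < window:
            bands.append(None)
            continue
        history = values[i - window:i]
        center = median(history)
        mad = median(abs(v - center) for v in history)
        half_width = threshold * MAD_TO_SIGMA * mad
        bands.append((center - half_width, center + half_width))
    return bands


def flag_anomalies(values, bands, direction="both"):
    """Flag points outside the band; `direction` only filters, it never moves the band."""
    flags = []
    for value, band in zip(values, bands):
        if band is None:
            flags.append(False)
            continue
        lower, upper = band
        above, below = value > upper, value < lower
        if direction == "up":
            flags.append(above)
        elif direction == "down":
            flags.append(below)
        else:
            flags.append(above or below)
    return flags


def alert_indices(flags, consecutive):
    """Indices where a run of flagged points first reaches `consecutive` (assumed: once per run)."""
    fired, run = [], 0
    for i, flagged in enumerate(flags):
        run = run + 1 if flagged else 0
        if run == consecutive:
            fired.append(i)
    return fired
```

Raising `threshold` widens the band and flags fewer points; raising `consecutive` asks for a longer streak before an alert would fire. The constant 1.4826 is the standard factor that makes MAD comparable to a standard deviation.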
Navigate a dense series
The chart is zoomable: scroll to zoom where you point, drag to pan, double-click to reset, and drag the navigator strip below the chart to move the view (the strip shows the whole series, the alert firings as red ticks, and a time axis). Zooming in lets you inspect alert quality region-by-region on a long, busy metric.
A Points shown slider above the chart trims the active sample to the most recent N points. Recompute cost grows with points × window, so once you can see a shorter period is enough, trimming it makes every knob-drag noticeably faster (and the period easier to read). Trimming only affects the live view — it never changes what Apply writes.
A y = 0 line toggle draws a horizontal reference line at zero and folds zero into the vertical scale, so a real-valued metric (one best read relative to zero) shows where it sits against zero. It is also available on the HTML report. Off by default.
One chart, three modes
dtk tune is a chart-first cockpit: a single chart fills the screen (the
windshield), the live metrics ride pinned over the chart (your speedometer —
always in view), and every control lives in an always-visible side rail beside
the chart — so the first thing you do is turn a knob and watch the band, with no
scrolling. The rail is mode-aware: it shows only the controls the current mode
needs (the detector knobs + Apply in Tune, the verdict actions in Review, the
capture tools + Save in Label), and collapses to give the chart the whole width.
The controls that aren’t detector-specific — the Points shown data window, the
alert rule (direction + consecutive anomalies) and the y = 0 toggle —
stay visible in every mode, since they shape the band, the alerts you review, and
the recall/FDR you watch while labeling. A
mode switch above the chart picks the job; the layers that don’t matter to it
dim to context instead of competing for pixels:
- Tune — steer the band. The confidence corridor leads, marked incidents recede to read-only context, and hovering a point shows the trailing window that scored it.
- Review — confirm the fired alerts (see below). The band ghosts so the alert markers lead.
- Label — mark the real incidents. The band hides so incidents lead, and the capture tools (Lasso / Threshold) are armed.
Confirm the alerts (Review mode)
Often a config is already good: the alerts that would fire all look real. Rather than hand-draw an incident for each, switch to Review and click an alert marker to cycle its verdict:
- red → not yet reviewed
- green → valid (you confirmed it’s a real alert)
- slate → false alarm
Confirming an alert valid is just a fast way to mark an incident. A valid alert
is you asserting a real incident happened here, so the confirmed streak becomes a
first-class incident: it shows up in the Marked incidents list (in Label mode) as
a read-only ”✓ confirmed alert” row — focus it, or remove it to un-confirm the
alert — and it counts toward recall and as a correct alert. So a clean metric can be
validated in a few clicks without drawing any spans. Confirm all unreviewed
valid does the lot. Confirmed alerts are written as incidents on Save, so they
feed the next supervised dtk autotune too; the verdicts themselves
also persist as alert_reviews metadata and re-seed (re-bound to the moved alerts by
streak overlap) when you reopen. A confirmed incident stays in the ground truth even
if you then tune the detector so it no longer fires there — which correctly shows up
as a recall miss, not a silent disappearance.
Mark incidents (Label mode)
To mark ground truth directly, switch to Label:
- Drag across the chart to mark an incident span; drag its edges to adjust, drag its middle to move, and click its ✕ (or select it and press Delete) to remove it.
- Lasso anomalies: the fastest way to turn what the detector flags into ground truth. Click Lasso anomalies, then draw a freeform loop around a cloud of anomaly dots. Each run of consecutive anomalies (small gaps, up to your consecutive_anomalies setting, are bridged) becomes one proper incident span sized to the run, not a single point; a separate burst inside the loop becomes its own incident.
- Threshold capture: grab every contiguous span past a horizontal line in one shot (the same tool as the autotune labeler). Click to set the line (or type a value), choose above/below, optionally bridge gaps, and drag across the chart to limit the capture to a time window. Add N spans marks them all. Each captured span is widened to a full interval, so a single matching point becomes a real incident the alert lands inside; the painted window is saved as capture_windows and restored on reopen.
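
The gap bridging that both capture tools describe can be sketched in a few lines of Python. This is a hedged illustration rather than the labeler's code: `anomaly_runs` and `max_gap` are hypothetical names, spans are index pairs instead of timestamps, and the widening of each span to a full interval is not shown.

```python
def anomaly_runs(flags, max_gap):
    """Group flagged indices into inclusive (start, end) spans.

    Unflagged gaps of up to `max_gap` points are bridged into the
    surrounding span; a longer gap starts a new span.
    """
    spans = []
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        if spans and i - spans[-1][1] - 1 <= max_gap:
            spans[-1] = (spans[-1][0], i)  # extend the current span across the gap
        else:
            spans.append((i, i))  # a separate burst becomes its own span
    return spans
```

For the lasso, `flags` would be the anomaly flags of the points inside the loop and `max_gap` the consecutive_anomalies setting. For threshold capture, the same grouping applies to `[v > line for v in values]` (or `<` when capturing below the line).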
Already-saved incidents are seeded from the newest file in incidents/<metric>/
when dtk tune opens, and the (budget-sized) loaded window is anchored on your
incidents — it ends just past the latest one rather than at the last datapoint —
so they render and count without loading the whole history. Incidents older than the
loaded window stay in the list but aren’t scored; pass --from/--to to tune
against a specific older window.
Read the alert quality
As you tune, the metrics bar under the chart recomputes:
- Incident catch rate (recall) — what share of the ground-truth incidents (marked + confirmed-valid alerts) your config catches. An incident counts as caught when an alert’s whole anomaly streak overlaps it — not just the instant the alert fires (which lands a few intervals into the streak), so a streak that clearly covers an incident is scored as caught.
- False-alert rate — what share of fired alerts fall outside every incident and aren’t confirmed valid, shown as a percentage and as “≈1 in N false”. The complement is the share of alerts that are correct.
- Reviewed N/M — how many of the fired alerts you’ve looked at (and how many you confirmed valid).
The marked incidents and the confirmed-valid alerts are one ground-truth set, so it never matters whether you draw a span or confirm an alert — both feed recall and the false-alert rate, and both are saved.
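
As a rough sketch of how these two numbers relate to the ground-truth set, the Python below scores alert streaks against incident spans by overlap. It rests on assumptions: spans are inclusive (start, end) pairs on a shared index, confirmed-valid alerts have already been added to `incidents`, a false alert is judged by its whole streak, and `alert_quality` is a hypothetical name, not a detectkit function.

```python
def overlaps(a, b):
    """True when two inclusive (start, end) spans share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def alert_quality(alert_streaks, incidents):
    """Return (recall, false_alert_rate); each is None when its denominator is empty."""
    caught = sum(
        1 for incident in incidents
        if any(overlaps(streak, incident) for streak in alert_streaks)
    )
    false_alerts = sum(
        1 for streak in alert_streaks
        if not any(overlaps(streak, incident) for incident in incidents)
    )
    recall = caught / len(incidents) if incidents else None
    false_rate = false_alerts / len(alert_streaks) if alert_streaks else None
    return recall, false_rate
```

With three alert streaks and two incidents where only one streak touches an incident, recall is 1/2 and the false-alert rate is 2/3.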
A false-alert budget (optional)
You can give a metric a target false-alert rate so the cockpit tells you when you’ve drifted past it:

    # metrics/<name>.yml
    false_alert_budget: 0.3   # at most 30% of fired alerts should be false

or project-wide as a default (a per-metric value wins):

    false_alert_budget: 0.3

When the false-alert rate exceeds the budget, the false alerts chip flags it
(▲ over 30% budget) — gently, never blocking anything. Unset, a lax built-in
default of 0.5 is used. This is purely a tuning aid: it only colours a number you
can already see, it never affects the load/detect/alert pipeline, and labeling stays
entirely optional — mark a short window when you want to put a number on your error,
or ignore it and just work with the alerts.
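
The precedence described above (per-metric value, then project default, then the built-in 0.5) can be written out as a small Python sketch. The function names and the plain-dict config shape are assumptions made for illustration; only the key false_alert_budget and the 0.5 fallback come from this page.

```python
BUILT_IN_BUDGET = 0.5  # the lax default used when nothing is configured


def resolve_budget(metric_cfg, project_cfg):
    """Pick the budget: a per-metric value wins over the project-wide default."""
    for cfg in (metric_cfg, project_cfg):
        value = cfg.get("false_alert_budget")
        if value is not None:
            return value
    return BUILT_IN_BUDGET


def over_budget(false_alert_rate, budget):
    """True only when a measured rate strictly exceeds the budget."""
    return false_alert_rate is not None and false_alert_rate > budget
```

A result of `True` would only change how the chip is coloured; nothing in the pipeline depends on it.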
This is the loop the cockpit was built for: pick a detector, see the flagged points and the alerts they’d fire, confirm the good ones (or mark the real incidents), and tune until you catch what you care about without drowning in false alerts.
Click Save incidents to persist the marked spans to
incidents/<metric>/<metric>-<timestamp>.yml — the same versioned store
dtk autotune reads, so the labels you draw here also feed the
next supervised auto-tune (one source of truth). dtk tune seeds the labeler from
the newest file in that directory when it opens, so labeling round-trips across both
tools. Saving incidents does not end the session (only Apply does) — keep
adjusting and save again, or save labels and then tune the detector against them.
Apply the config back
Click Apply to metric. detectkit then, in order:
1. Validates the chosen detector through the same DetectorFactory and MetricConfig the pipeline uses. A broken or untunable config is rejected and nothing is written (fix the knobs and click Apply again).
2. Archives the current metric YAML verbatim (comments and all) to metrics/.history/<metric>/<metric>-<timestamp>.yml, so you keep a trackable history of chosen parameters and can always recover the previous version.
3. Re-emits the metric file in place with the tuned detector: the detectors list becomes the single tuned detector, and the first alerting block’s consecutive_anomalies is updated if the metric has one.
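
The validate, archive, write order is what makes Apply safe, and it can be sketched generically in Python. This is an illustration of the pattern, not detectkit's code: `apply_config` and the `validate` callable are hypothetical, the timestamp format is an assumption, and the real tool validates through DetectorFactory and MetricConfig rather than an arbitrary function.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def apply_config(metric_path, new_yaml_text, validate):
    """Validate first, then archive the current file, then overwrite it in place.

    `validate` is expected to raise on a bad config, so a rejected
    config leaves both the metric file and the history untouched.
    """
    metric_path = Path(metric_path)
    validate(new_yaml_text)

    # Assumed timestamp format; the page only specifies <metric>-<timestamp>.yml.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive_dir = metric_path.parent / ".history" / metric_path.stem
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive_path = archive_dir / f"{metric_path.stem}-{stamp}.yml"

    # A file copy keeps the previous version verbatim, comments included.
    shutil.copy2(metric_path, archive_path)
    metric_path.write_text(new_yaml_text, encoding="utf-8")
    return archive_path
```

Copying the file, rather than parsing and re-serialising it, is what preserves comments in the archived version.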
dtk tune takes no pipeline lock — it only edits a config file. The live
preview is a faithful approximation; the next dtk run is the source of truth.
Because the detector parameters changed, the detector’s identity changes too, so
detections recompute under the new configuration on the next run:

    dtk run --select api_error_rate

Preview without writing (--no-serve)
To share or inspect the interactive view without any write-back, write a static HTML file instead of serving:

    dtk tune --select api_error_rate --no-serve

This writes metrics/<metric>__tuner.html. The sliders still recompute the band
live and you can still mark incidents, but there is no Apply button — the file
is read-only, and Save incidents downloads the labels file (drop it into
incidents/<metric>/ yourself) instead of writing it directly.
See also
- Auto-tuning a Detector: the automatic search.
- Visualizing Results: the read-only HTML report and BI recipes.
- Detectors: what each parameter does.