dtk autotune automatically configures a metric’s detector from its data — and,
if you can supply them, from labeled incidents. Instead of hand-picking a
detector type, threshold, window, seasonality and alert window, you point
autotune at a metric and it searches for the best configuration, then writes a
new, annotated metric YAML you can review and run.
It is a separate pipeline from load → detect → alert: it reads the metric’s
already-loaded _dtk_datapoints, runs a cross-validated search, and emits
metrics/<name>__tuned_<id>.yml. It never edits your original config and never
sends alerts.
Fastest path: let Claude Code drive the whole flow. Run
dtk init-claude, then use the
dtk-autotune skill — it runs the seasonality interview, writes the
incidents file, runs dtk autotune, and explains the chosen config and how it
behaves against your database, conversationally.
Autotune searches four (or five, when supervised) dimensions and cross-validates
every choice:
Seasonality — greedily builds the best seasonality_components grouping
from the metric’s available seasonality columns: the built-ins
(hour, day_of_week, day_of_month, month, is_weekend) plus any
columns your query declares. (is_holiday is skipped — the holiday calendar
isn’t implemented yet, so it is always false and carries no signal.)
Detector type — a distribution decision tree votes per seasonality group:
Gaussian / light-tailed → zscore; heavy tails or outliers → mad; skewed →
iqr. The winners are shortlisted.
Hyperparameters — a bounded coordinate grid search over threshold,
recency weighting, detrending and window_size, maximizing a cross-validated
score.
History window — prefers a larger window_size on near-ties (“more
history is better”), and sets loading_start_time to cover the lead-in (and
pins the detector’s start_time to it, so the first dtk run detects across
all loaded history).
Alert window (supervised only) — sweeps consecutive_anomalies against
the labeled incidents.
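The detector-type vote described in the list above can be pictured with a small sketch. This is not detectkit's implementation: the function names, the use of sample skewness and excess kurtosis as the distribution tests, the cutoff values, and the order of the checks are all assumptions chosen for illustration. Only the mapping (light-tailed → zscore, heavy-tailed → mad, skewed → iqr) comes from the text.

```python
from collections import Counter
from statistics import mean


def vote_detector(values, skew_cut=1.0, kurt_cut=1.0):
    """Vote for one seasonality group. Cutoffs are hypothetical, not detectkit's."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    if m2 == 0:
        return "zscore"  # constant group: no shape to distinguish
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    if abs(skewness) > skew_cut:
        return "iqr"      # skewed
    if excess_kurtosis > kurt_cut:
        return "mad"      # heavy tails or outliers
    return "zscore"       # roughly Gaussian / light-tailed


def shortlist(groups, top=2):
    """Count one vote per group and keep the most frequent detector types."""
    votes = Counter(vote_detector(g) for g in groups)
    return [name for name, _ in votes.most_common(top)]
```

Calling `shortlist` with one list of values per seasonality group returns the detector types that won the most groups, which is the role the shortlist plays before the hyperparameter search.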
Cross-validation is automatic walk-forward (expanding-window) folds — there
are no split ratios to choose.
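As a rough illustration of walk-forward (expanding-window) folds, the sketch below splits a series into equal blocks and always trains on everything before the test block. The function name and the equal-block scheme are assumptions; how autotune actually sizes its folds is not stated here.

```python
def walk_forward_folds(n_points, n_folds=5):
    """Yield (train_range, test_range) pairs with an expanding training window.

    Hypothetical scheme: the series is cut into n_folds + 1 equal blocks;
    fold k trains on the first k blocks and tests on the next one.
    """
    block = n_points // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough points for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = block * k
        # The last fold absorbs any remainder so no point is left unused.
        test_end = n_points if k == n_folds else block * (k + 1)
        yield range(0, train_end), range(train_end, test_end)


if __name__ == "__main__":
    for train, test in walk_forward_folds(12, n_folds=3):
        print(f"train {train.start}..{train.stop - 1}  test {test.start}..{test.stop - 1}")
```

Each test block lies strictly after its training data, so a configuration is never scored on points it has already seen.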
A tuned config is an ordinary detectkit config: one chosen detector reusing the
same windowed detectors and the same detector_id
identity as everything else.
Autotune reads the metric’s already-loaded datapoints from
_dtk_datapoints. If a metric has never run, load it first — and load enough
history, since more history tunes better:
# Load the metric (optionally backfill more history with --from)
By default autotune searches all loaded datapoints (capped at the most
recent 50,000 points unless you raise autotune.max_history). To tune against a
specific slice of history — without re-loading — pass --from / --to to the
autotune command itself (UTC, YYYY-MM-DD or YYYY-MM-DD HH:MM:SS):
# Tune only on spring 2026, even if years of history are loaded
When you can tell autotune which points were real incidents, it optimizes
directly against them — picking the detector, threshold and alert window that
catch your incidents while keeping false positives down.
The incidents (labels) file is the contract. It is YAML or JSON; all times are
UTC, and each incident is either an interval ({start, end}) for a
sustained problem or a point ({at}) for a single spike:
incidents/api_error_rate.yml
metric: api_error_rate              # optional; must match the metric being tuned
timezone: UTC                       # optional; interprets the naive times below
incidents:
  - start: "2026-05-02 14:00:00"
    end: "2026-05-02 16:30:00"
    label: payment-gateway outage   # optional, free text
  - at: "2026-05-11 09:05:00"       # a single anomalous point
dtk init scaffolds an incidents/ directory beside metrics/ with an example
labels file, so the layout above is ready to fill in.
Prefer to keep labels in the metric config? Declare the same incidents
inline under the metric’s autotune: block instead of a separate file — handy
for a metric with one or two known incidents:
incidents_timezone: UTC   # optional; interprets the naive times above
incidents and labels_file are mutually exclusive. The --incidents flag
still overrides either.
Can’t enumerate the incidents from memory? Run
dtk autotune --select api_error_rate --label. It opens a local browser
labeler of the series; click-drag across the chart to mark each real incident
(or Threshold capture to grab every span past a line at once, Lasso
capture to loop around a cloud of outliers, and the chart-side ✕ / Delete
to remove one), then Save & tune writes the
labels into incidents/api_error_rate/ and tunes on them in the same command.
Re-running --label re-opens the newest set so you can keep editing over time.
See the --label reference for the
static --no-serve variant.
└─ Re-run with: dtk run --select api_error_rate__tuned_3f9c1a2b
Done. Tuned 1 metric(s), 1 succeeded.
Reading it top to bottom: the LABELS line confirms how many of your incidents
landed on loaded grid points (and whether the run is supervised); SEASONALITY
/ DETECTOR SELECT / GRID SEARCH / WINDOW show each chosen dimension
with its cross-validated score; RESULT names the winning detector, its
per-fold CV scores, and the file it wrote. An unsupervised run looks the same
minus the WINDOW block (no labeled incidents to sweep the alert window against).
Add --dry-run to print this whole tree without writing the config, the
detections, or the audit row — handy to preview what autotune would choose.
If you pass no labels — no --incidents, no labels_file in the config — tuning
falls back to an unsupervised objective that rewards a low false-positive
rate and stable, clean separation across folds:
dtk autotune --select api_error_rate
This still picks a detector, hyperparameters, seasonality and window; it just
cannot optimize for your notion of an incident. Use it to get a sane starting
configuration, then refine with labels later.
In an interactive terminal, before falling back, autotune first asks
whether you want to enter the incidents now (No incident labels provided. Enter them now?) — answer No for the unsupervised path above, or type
incident windows at the prompt for a quick supervised run without writing a
file. In a non-interactive context (cron, CI, piped input) there is no
prompt: it goes straight to unsupervised.
Note also that supervised mode only engages if your labeled incidents actually
land on loaded datapoints. If every labeled timestamp falls outside the
loaded series (e.g. the history wasn’t backfilled far enough), no grid point
is marked and autotune silently runs unsupervised — load the incident window
first (see below).
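The fallback can be reasoned about with a small sketch of how labels might be projected onto loaded timestamps. The helper names are hypothetical, and two behaviors are assumptions rather than documented facts: a point incident must match a grid timestamp exactly, and interval ends are inclusive. The real tool may snap or round differently.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def mark_labels(grid, incidents):
    """Return one boolean per grid timestamp: True if it falls in any incident."""
    marked = []
    for ts in grid:
        hit = False
        for inc in incidents:
            if "at" in inc:
                # Point incident: assumed to need an exact timestamp match.
                hit = ts == datetime.strptime(inc["at"], FMT)
            else:
                # Interval incident: assumed inclusive on both ends.
                start = datetime.strptime(inc["start"], FMT)
                end = datetime.strptime(inc["end"], FMT)
                hit = start <= ts <= end
            if hit:
                break
        marked.append(hit)
    return marked


def is_supervised(grid, incidents):
    """Supervised tuning only makes sense if at least one grid point is marked."""
    return any(mark_labels(grid, incidents))
```

If `is_supervised` comes back false for your own timestamps and incidents, the labels sit outside the loaded history, which is the situation that triggers the unsupervised fallback.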
Autotune maximizes a single scoring metric across the walk-forward folds. The
default is MCC (Matthews correlation coefficient), which uses the whole
confusion matrix and is well-suited to rare anomalies. Override it with
--scoring:
# Favor catching every incident (recall) over avoiding false pages
dtk autotune --select api_error_rate \
  --incidents incidents/api_error_rate.yml \
  --scoring f_beta
| Scoring metric | Use when |
| --- | --- |
| mcc (default) | Balanced, robust to rare anomalies; a safe default |
| f1 | You weight precision and recall equally |
| f_beta | You want to tilt toward recall (a miss is worse than a false page) or precision |
| balanced_accuracy | Class balance matters and you want both rates weighted equally |
| roc_auc | You care about ranking/separability across thresholds |
| pr_auc | Heavily imbalanced data; emphasizes the positive (anomaly) class |
See the scoring-metrics catalog for
one-line definitions of each. The recall-vs-precision trade-off is the usual
knob: optimize for recall (f_beta tilted toward recall) when missing an
incident is the expensive outcome; optimize for precision when false pages are.
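For intuition about the default and about the beta knob, here is a minimal sketch that computes MCC and F-beta from confusion-matrix counts using their standard formulas. The function names are illustrative, and returning 0.0 when a denominator is zero is a common convention assumed here, not necessarily what detectkit does.

```python
from math import sqrt


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return 0.0 if denom == 0 else (1 + b2) * tp / denom


if __name__ == "__main__":
    # 8 caught incident points, 2 false alarms, 4 misses, 986 quiet points.
    print(round(mcc(8, 2, 4, 986), 3))
    print(round(f_beta(8, 2, 4, beta=2.0), 3))   # penalizes the 4 misses more
    print(round(f_beta(8, 2, 4, beta=0.5), 3))   # penalizes the 2 false alarms more
```

With more misses than false alarms, the recall-tilted score (beta above 1) comes out lower than the precision-tilted one, which is how the beta setting steers the search.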
The emitted YAML leads with a # comment block that walks every decision
before the real config begins:
the training period and the labels used,
the seasonality rationale (why those seasonality_components),
the detector votes (which distribution the data looked like, per group),
the grid-search winner with its CV score and per-fold scores,
and the window choice.
Read this header to understand why the configuration looks the way it does
before trusting it. Below the header is an ordinary metric config — a single
chosen detector with the chosen seasonality, and your query/alerting carried
over.
Each run is also recorded as one row in the _dtk_autotune_runs audit table
(see Inspecting the search and the
reference).
dtk test-alert api_error_rate__tuned_<id>   # if alerting is configured
If you hand-edit the detector below the comment header, you change its
parameters — and a detector’s identity is a hash of its parameters, so the old
detections orphan under the previous detector_id. Recompute under the new id
and prune the orphans:
You can pin or constrain the search by adding an autotune: block to a metric
YAML. It is fully optional — absent means “tune everything automatically”:
autotune:
  enabled: true
  detector_types: [mad, zscore]        # restrict candidates (subset of mad/zscore/iqr)
  scoring_metric: mcc                  # default optimization target
  beta: 1.0                            # only used for scoring_metric: f_beta
  labels_file: incidents/orders.yml    # external labels file, OR inline (below)
  # incidents:                         # inline labels; mutually exclusive with labels_file
  # incidents_timezone: UTC            # interprets the naive times above (default UTC)
  seasonality_candidates: [hour, day_of_week]
  fixed_params: {window_size: 4320}    # pin hyperparameters (excluded from the search)
  folds: 5                             # number of walk-forward folds
  max_history: 50000                   # cap training points
Command-line flags win: --scoring and --incidents override the block’s
scoring_metric / labels_file / incidents. See
autotuned-metric-example.yml for a
worked block, and the reference
for every field.
It writes a self-contained reports/<name>__tuned_<id>.html charting the winning
detector’s values, confidence band, flagged anomalies and the alerts it would
fire, with a period selector — no BI or SQL setup, nothing leaves your browser.
See Visualizing results for the full picture (and
dtk run --select <m> --report for the live config).
To query the raw rows instead, join recent datapoints with their detections
(value vs confidence_lower / confidence_upper vs is_anomaly vs severity)
for the run’s winning detector_id. Get that id from the latest
_dtk_autotune_runs row:
  JSON_EXTRACT(detection_metadata, '$.severity') AS severity
FROM <internal>._dtk_detections
WHERE metric_name = 'api_error_rate'
  AND detector_id = '<winning_detector_id>'
  AND timestamp >= NOW() - INTERVAL 7 DAY
ORDER BY timestamp;
For full charting recipes (the value with its confidence band, anomaly markers,
severity breakdowns) point any BI tool at these tables — see
Visualizing Results.