Changelog
All notable changes to detectkit will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.40.0] - 2026-06-27
- dtk tune: confirmed alerts now show up as incidents, and an optional false-alert budget. Two connected changes to the manual-tuning cockpit:
  - Confirming an alert is marking an incident. A valid alert (the green markers from Review mode) is now a first-class entry in the Marked incidents list: a “✓ confirmed alert” row you can focus or remove (removing it un-confirms the alert). The list, the live recall / false-alert metrics, and Save incidents all read one ground-truth set (hand-marked spans plus confirmed-valid alerts, deduped by overlap so neither is counted twice), so “validate the alerts” is simply a fast way to label incidents: no hand-drawn span is needed, and what you confirmed is exactly what gets saved. Confirmed-valid spans are now derived from the stored verdict rather than the current fire, so a confirmed incident stays in the ground truth (and correctly registers as a recall miss) even if you then tune the detector so it no longer fires there. Fixes a latent double-count after a Save→reopen (the same incident was seeded as both an incident and a review).
  - Optional false-alert-rate (FDR) budget. New false_alert_budget config (a fraction in (0, 1], e.g. 0.3 = 30%) on a metric (priority) and the project (default); unset → a built-in default of 0.5. The quality bar flags, gently and non-intrusively, when your false-alert rate exceeds the budget (the “false alerts” chip turns and reads ▲ over 30% budget). Labeling stays entirely optional and the budget never affects the load/detect/alert pipeline: it only colours an already-computed number, so you can ignore it or label a short window to put a number on your error. Regenerated detectkit/tuning/assets/tune.js.
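The budget resolution described above (metric first, then project, then the built-in 0.5) can be sketched as follows. This is a minimal illustration only: the function names `resolve_budget` and `budget_chip` are hypothetical and not part of detectkit, and the chip wording simply mirrors the example in the entry.

```python
from typing import Optional

BUILTIN_BUDGET = 0.5  # built-in default named in the entry above


def resolve_budget(metric_budget: Optional[float],
                   project_budget: Optional[float]) -> float:
    """Metric setting wins, then the project default, then the built-in 0.5."""
    for candidate in (metric_budget, project_budget):
        if candidate is not None:
            # The entry describes the value as a fraction in (0, 1].
            if not 0.0 < candidate <= 1.0:
                raise ValueError("false_alert_budget must be a fraction in (0, 1]")
            return candidate
    return BUILTIN_BUDGET


def budget_chip(false_alert_rate: float, budget: float) -> Optional[str]:
    """Return the over-budget chip text, or None when the rate is within budget."""
    if false_alert_rate > budget:
        return f"▲ over {budget:.0%} budget"
    return None


print(budget_chip(0.45, resolve_budget(0.3, None)))
```

Because the check only compares an already-computed rate against a number, it has no way to influence detection or alerting, which matches the "only colours a number" framing above.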
[0.39.2] - 2026-06-27
Changed
- dtk tune colour legend moved to the top, visible in every mode. The chart colour key (alert markers in red fired / green confirmed valid / slate false alarm, plus anomaly dot, metric line, expected range and band centre) was in the stage footer below the chart, where it was easy to miss. It is now a pinned legend bar directly under the HUD, above the chart, leading with the three alert colours, so the marker colours are decoded almost immediately. Because it lives in the stage (not the mode-aware rail) it stays put across Tune / Review / Label. Regenerated detectkit/tuning/assets/tune.js.
[0.39.1] - 2026-06-27
Changed
- dtk tune rail refinements. The “effective config” readout in the rail footer is now collapsed by default (a one-line clickable header; click to expand) so the knob column gets more vertical room; it stays up to date while hidden, so it shows the current config the moment it’s opened.
- Controls that aren’t detector-specific now stay visible in every mode instead of only in Tune: the Points shown data-window trim at the top of the rail, and the alert rule (Direction, i.e. which way the alert fires, and consecutive anomalies, i.e. how many in a row) plus the Show y = 0 line view toggle at the bottom. They frame the band, the alerts you review, and the recall/FDR you watch while labeling, so they apply to all three modes; only the detector knobs / verdict actions / capture tools swap with the mode.
[0.39.0] - 2026-06-27
Changed
- dtk tune cockpit reworked into a chart-windshield + a mode-aware control rail. The controls no longer sit in a dock below the chart (where reaching a knob meant scrolling down, then scrolling back up to watch the band). Now the chart fills the screen as the windshield, the live metrics ride pinned in a HUD over the chart (the speedometer, always in view across every mode), and every control lives in an always-visible side rail beside the chart with its own scroll, so you turn a knob and watch the band change without scrolling or dropping your gaze. Collapse the rail (⟩) to hand the chart the whole width; a slim tab brings it back (the chart re-fits via a ResizeObserver).
- The control rail is mode-aware. It shows only the panel the current mode needs instead of every control at once: the detector knobs + the effective-config echo + Apply in Tune, the verdict actions in Review, and the Threshold capture / Lasso anomalies tools + the incident list + the Save incidents field in Label (previously the capture tools were easy to miss and the effective-config / Save controls hung around in every mode). The rail header renames to the active mode’s panel.
[0.38.0] - 2026-06-27
- dtk tune is now a chart-first cockpit on ONE chart with three modes. The detector and labeler charts are merged into a single windshield that fills the screen; every control lives in a collapsible dock under the chart, and the live metrics sit right beneath it (no more scrolling past the chart to reach the knobs). A mode switch drives which layers lead and which interactions are armed: Tune (the band leads; incidents recede to read-only context; hover a point for its window), Review (the fired alerts lead; the band ghosts), and Label (the band hides; incidents are editable; threshold/lasso capture armed). The non-active layers dim to context instead of competing for pixels, so one canvas does the job two stacked half-charts used to.
- Validate fired alerts right on the chart. Click an alert marker to cycle its verdict un-reviewed (red) → valid (green) → false alarm (slate), on the one chart, in Tune or Review mode; Confirm all unreviewed valid does the lot. A confirmed alert is the user asserting a real incident happened there: it counts as caught (recall) and correct (FDR), so a clean metric whose alerts are all good can be validated in a few clicks without hand-drawing incident spans. It is also written as a normal incident on Save, so confirming alerts feeds the next supervised dtk autotune. The metrics bar gains a reviewed N/M chip. Verdicts persist as an alert_reviews: metadata block (re-bound to the moved alerts by streak-span overlap on reopen; autotune ignores the block).
Changed
- The two synced dtk tune charts are replaced by the single mode-driven chart (less vertical budget, no cross-chart sync machinery). The shared chart engine gains a mode (tune/review/label) with a per-layer full/dim/hidden model; the landing playground (no mode/labeling) renders exactly as before.
[0.37.0] - 2026-06-27
- Lasso capture in the incident labelers: turn a cloud of anomalies into proper incidents in one gesture. In dtk tune, the labeler chart now mirrors the detector’s anomaly dots, and a new Lasso anomalies tool lets you draw a freeform loop around a cluster: each run of consecutive anomalies (small gaps bridged, up to your consecutive_anomalies) collapses into one incident span sized to the run, not a point, while a separate burst inside the loop becomes its own incident. This is the intended tuning loop: tighten the band, lasso the real anomalies it surfaces, watch the metrics update. The standalone autotune labeler (dtk autotune --label) gains the same Lasso capture over raw points (no detector there), grouping consecutive points into interval incidents.
- dtk tune undercounted the incident catch rate (recall). An incident was scored as caught only when an alert’s single fire timestamp landed within ±½ interval of its span, but an alert fires consecutive_anomalies − 1 intervals into the anomaly streak, so a streak that visibly covered an incident was marked missed (e.g. 27% recall shown while almost every incident was caught). Recall/FDR now match an incident against each alert’s whole anomaly-streak span by overlap (the worker returns fireSpans alongside fires), so a streak covering an incident counts as caught.
- Threshold capture produced near-zero-width “point” incidents that the fired alert landed just outside of. Each captured span is now widened to a full grid interval (half each side), so a single matching point becomes a real incident.
- The “≈1 in N false” false-alert readout rounded a mostly-false rate down to a misleading “1 in 1”. It now keeps one decimal below 10 (e.g. a 73%-false rate reads “≈1 in 1.4 false”) so the framing matches the percentage beside it.
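The overlap-based scoring and the “≈1 in N false” readout from the fixes above can be sketched together. This is an illustrative approximation, not the shipped worker code: the span representation and the function names are assumptions, and only the behaviour stated in the entries (match by overlap, one decimal below 10) is taken from the changelog.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in any consistent time unit


def overlaps(a: Span, b: Span) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def alert_quality(fire_spans: List[Span], incidents: List[Span]) -> Tuple[float, float]:
    """Return (recall, false_alert_rate), matching whole streak spans by overlap."""
    caught = sum(any(overlaps(inc, s) for s in fire_spans) for inc in incidents)
    false = sum(not any(overlaps(s, inc) for inc in incidents) for s in fire_spans)
    recall = caught / len(incidents) if incidents else 0.0
    false_rate = false / len(fire_spans) if fire_spans else 0.0
    return recall, false_rate


def one_in_n(false_rate: float) -> str:
    """Format the readout, keeping one decimal while N is below 10."""
    if false_rate <= 0:
        return "no false alerts"
    n = 1.0 / false_rate
    text = f"{n:.1f}" if n < 10 else f"{n:.0f}"
    return f"≈1 in {text} false"


recall, false_rate = alert_quality([(10, 13), (40, 42)], [(11, 12), (70, 75)])
print(recall, false_rate, one_in_n(0.73))
```

With a 73%-false rate, N is 1/0.73 ≈ 1.37, so the one-decimal form reads “≈1 in 1.4 false” rather than rounding down to “1 in 1”.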
[0.36.2] - 2026-06-25
- dtk tune loaded the entire history (and hung the recompute) when a metric had many saved incidents. The 0.36.0 window-widening pulled the loaded window back to the earliest seeded incident, so a single old outlier among the incidents dragged in the whole series (e.g. 33k points instead of the budgeted ~9k) and the client-side recompute, which is O(points × window), never finished. The window is now kept budget-sized (default_window_points) and anchored on the incident region: it ends just past the latest incident (with a few windows of recovery context) rather than at the last datapoint, so recent incidents still render and score while the load stays bounded. Incidents older than the loaded window remain in the list (and are excluded from the live metrics); use --from/--to to tune against a specific older window. Removes the now-unreachable _TUNE_INCIDENT_MAX_POINTS ceiling.
[0.36.1] - 2026-06-25
- dtk tune crashed with TypeError: can't compare offset-naive and offset-aware datetimes when widening the window to seeded incidents on a backend that returns tz-aware timestamps. The 0.36.0 window-widening compared the DB’s last-datapoint timestamp (tz-aware on some backends) against an incident start parsed from a naive-UTC display string. The earliest incident is now aligned to the DB timestamp’s awareness (both represent UTC) before the comparison, so dtk tune opens for metrics with saved incidents regardless of backend.
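The awareness alignment described in this fix can be illustrated with the standard library alone. The helper name `align_awareness` is hypothetical; the sketch assumes, as the entry states, that both timestamps represent UTC.

```python
from datetime import datetime, timezone


def align_awareness(value: datetime, reference: datetime) -> datetime:
    """Give `value` the same tz-awareness as `reference`; both are taken to be UTC."""
    if reference.tzinfo is not None and value.tzinfo is None:
        # Naive UTC -> aware UTC: attach the zone without shifting the clock.
        return value.replace(tzinfo=timezone.utc)
    if reference.tzinfo is None and value.tzinfo is not None:
        # Aware -> naive UTC: convert to UTC first, then drop the zone.
        return value.astimezone(timezone.utc).replace(tzinfo=None)
    return value


last_datapoint = datetime(2026, 6, 25, 12, 0, tzinfo=timezone.utc)  # tz-aware backend
earliest_incident = datetime(2026, 6, 1, 0, 0)                      # parsed naive UTC

# Comparing the two directly would raise TypeError; after alignment it is safe.
print(align_awareness(earliest_incident, last_datapoint) < last_datapoint)
```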
[0.36.0] - 2026-06-25
- dtk tune: seeded incidents now render on the chart and count toward the live metrics. Previously dtk tune only loaded the most-recent slice of the series, so any incident from incidents/<metric>/ older than that slice showed in the Marked incidents list but never on the chart, and dragged the recall metric down because it could never be caught. The loaded window is now widened back to cover the seeded incidents (with leading context for the detector’s window, clamped to the first datapoint and a _TUNE_INCIDENT_MAX_POINTS ceiling), and the catch-rate / false-alert metrics only score incidents that overlap the loaded (possibly trimmed) window so an out-of-range label can’t mechanically skew them.
- dtk tune: Threshold capture in the incident labeler. The labeler chart gains the same productivity tool as the autotune html_labeler: toggle Threshold capture, set a horizontal line (click the chart or type a value), choose above/below, optionally bridge gaps of a few intervals, and optionally drag across the chart to limit the capture to a time window. Add N spans then marks every contiguous run past the line in one click (overlapping spans merge into existing incidents). The painted window is persisted as capture_windows in the saved labels file and restored when dtk tune reopens (pure metadata; dtk autotune ignores it). Implemented in the shared demo/chart.ts labeling mode (setThresholdMode + an onThresholdChange callback); the landing playground is untouched (the tool is off by default). The committed detectkit/tuning/assets/tune.js bundle is regenerated.
[0.35.0] - 2026-06-25
Section titled “[0.35.0] - 2026-06-25”Changed
- Alert timing fields renamed so the onset can’t be mistaken for the alert time, and recovery now shows the full timeline. The previously ambiguous Started / Latest / Cleared labels are now self-describing:
  - anomaly alerts show Anomaly began (the resolved onset, i.e. the first anomalous point) and Latest reading (the most recent point);
  - recovery alerts show the full Anomaly began → Alert fired → Recovered timeline, where Alert fired is the on-grid moment the rule first tripped (onset + (consecutive_required − 1) × interval).

  This fixes the confusion where “Started” could read as when the alert fired rather than when the metric first went bad; the two differ whenever the rule waits for several consecutive intervals. Applies to every channel (Slack/Mattermost/webhook, Telegram, email) and the plain-text {window_line}. A new {fired_display} template variable exposes the alert-fire moment (empty when the run predates the lookback window or no interval is wired in). Purely a rendering change: no detector-ID resets and no stored-data changes.
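The Alert fired moment follows directly from the formula in this entry, onset + (consecutive_required − 1) × interval. A minimal worked sketch, with a hypothetical helper name and made-up example values:

```python
from datetime import datetime, timedelta


def alert_fired_at(onset: datetime, consecutive_required: int,
                   interval: timedelta) -> datetime:
    """On-grid moment the rule first trips, per the formula in the entry above."""
    if consecutive_required < 1:
        raise ValueError("consecutive_required must be at least 1")
    return onset + (consecutive_required - 1) * interval


onset = datetime(2026, 6, 25, 10, 0)
# With 3 required consecutive anomalies on a 10-minute grid, the rule trips
# on the third anomalous point, two intervals after the onset.
print(alert_fired_at(onset, 3, timedelta(minutes=10)))
```

When consecutive_required is 1 the two moments coincide, which is why the old “Started” label was only ambiguous for rules that wait for several intervals.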
[0.34.0] - 2026-06-25
- dtk tune is now a full config cockpit: mark real incidents and see alert quality live. Beneath the detector chart there is a synced incident-labeler chart: drag to mark a real incident span, drag its edges to adjust / its middle to move, click its ✕ (or select + Delete) to remove. The two charts share x-zoom/pan, y-scale and the “Points shown” trim, and the detector chart overlays the same spans (read-only) so alerts vs incidents read together. A prominent metrics bar updates as you tune, with two operator-facing numbers:
  - incident catch rate (recall): what share of the marked incidents your current config actually catches; and
  - false-alert rate (FDR / type-I control): what share of fired alerts fall outside any real incident, shown as a percentage and “≈1 in N false”.

  Save incidents writes a versioned incidents/<metric>/<…>.yml (the same store dtk autotune reads), so a labeling round in dtk tune also feeds the next supervised autotune, giving one source of truth. dtk tune seeds the labeler from the newest file in that directory on open. Saving labels does not end the session (only Apply does); dtk tune --no-serve downloads the labels file instead. The labels schema, validation and versioned filenames are shared with the autotune labeler.
- y = 0 reference line on the dtk tune and dtk run --report charts. A toggle draws a horizontal line at zero and folds 0 into the vertical scale, so a real-valued metric can be read relative to zero. Off by default; the landing playground is unchanged.
[0.33.0] - 2026-06-25
- dtk tune: the window slider now reflects (and preserves) the metric’s real window_size. It was clamped to min(2000, points_shown / 2) and snapped to a step of 5, so any metric with a larger window (common for sub-hourly metrics, e.g. 4320 or 8640) showed a smaller, wrong value the slider couldn’t even reach, and Apply could silently shrink the metric’s window to the clamp. The slider now seeds the exact configured value (step 1) and raises its maximum to at least that value, so the preview computes, and Apply writes, the metric’s actual window.
- dtk tune: turning the Threshold slider now visibly widens/narrows the band. The chart fitted its y-axis to the confidence band, so a wider band grew the axis in lockstep and the corridor looked unchanged. The tuning chart now fits the y-axis to the data (new opt-in yFit: 'data' chart option; the read-only report keeps the band-inclusive fit), so threshold changes read at a glance. The landing playground is unchanged.
- dtk tune: a large metric window is now actually exercised in the preview. The default shown-point count is floored at a few windows’ worth of history (instead of collapsing toward the minimum for big windows), so the band reaches its real width instead of leaving almost no scored region.
- Detectors warn when the window is too small to fill a seasonality group. A per-group correction engages only when the trailing window holds min_samples_per_group points sharing the current point’s seasonal key. Points with the same key recur only once per cycle of distinct keys, so the correction needs window_size ≳ min_samples_per_group × distinct_keys (hourly hour ⇒ ≳ 240). Below that the group silently falls back to the global band and the seasonality has no effect, which is easy to hit with the default window_size = 100. The windowed detectors (MAD / Z-Score / IQR) now log a one-time warning naming the group, its key count and the required window.
- dtk autotune offers a seasonality-fill window candidate. The window grid now includes min_samples_per_group × cardinality when the data carries seasonality columns (capped to the fold budget), so cross-validation can actually evaluate a window where a chosen seasonal grouping engages instead of one where it silently falls back to global. When even the largest fold-feasible window can’t fill the groups, the decision log says so.
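The window requirement quoted above is simple arithmetic, sketched here with hypothetical helper names. The value min_samples_per_group = 10 is an assumption inferred from the “hourly hour ⇒ ≳ 240” example (240 / 24 keys); the changelog does not state the default, and the sketch treats the approximate bound as a plain ≥.

```python
def required_window(min_samples_per_group: int, distinct_keys: int) -> int:
    """Smallest trailing window in which every seasonal key can collect enough samples."""
    return min_samples_per_group * distinct_keys


def group_engages(window_size: int, min_samples_per_group: int,
                  distinct_keys: int) -> bool:
    """True when the window is large enough for the per-group correction to apply."""
    return window_size >= required_window(min_samples_per_group, distinct_keys)


# Hourly "hour" seasonality has 24 distinct keys.
print(required_window(10, 24))     # assumed min_samples_per_group of 10
print(group_engages(100, 10, 24))  # the default window_size = 100 from the entry
```

Under these assumptions the default window of 100 falls well short of the 240 points needed, which is the silent-fallback case the new warning is meant to surface.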
[0.32.0] - 2026-06-25
- dtk tune: a Manual-bounds detector option. The detector picker now offers Manual alongside MAD / Z-Score / IQR. Selecting it swaps the windowed knobs for Lower bound / Upper bound sliders (seeded from the metric’s bounds, or the data’s p5/p95 band) so you can drag fixed thresholds against the real series and watch the flagged points, and the resulting alert count, update live. Apply writes a stateless manual_bounds detector back into the metric YAML (validated, previous version archived). The browser port is parity-checked against the Python ManualBoundsDetector (golden vectors).
- dtk tune: a Direction filter. A both / up / down control restricts which anomalies are shown and counted toward alerts: only spikes above the band (up), only drops below it (down), or both. It is a preview filter (seeded from the metric’s alerting direction, with same reading as any) that mirrors the alert direction policy without changing the band.
- dtk tune chart + autotune incident labeler: overlapping x-axis date labels. For spans of roughly 3–6 months the adaptive time-tick picker fell into a gap (no sub-monthly step met the target count) and packed ~13 biweekly labels onto the axis, overlapping. The picker now escalates to calendar months/years at the right span, and both the main axis and the navigator strip thin any labels that would still collide (gridlines are unaffected).
[0.31.1] - 2026-06-25
- dtk tune: window size and half-life echo their wall-clock span. The Window size and Half-life sliders, both measured in points, now show the equivalent duration on the metric grid next to the point count (e.g. 2000 · 83d 8h on a 1h metric), so “how much history is this window” and “how far back does the decay reach” read at a glance. Mirrors the existing “Points shown” trim echo. Display only: what Apply writes is unchanged.
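The point-count-to-duration echo can be reproduced with a few lines. The helper names and the exact rounding rules (minutes shown only below one day) are assumptions for illustration; only the “2000 · 83d 8h” example comes from the entry.

```python
from datetime import timedelta


def span_label(points: int, interval: timedelta) -> str:
    """Render points × interval as a compact day/hour/minute string."""
    total = points * interval
    days = total.days
    hours, remainder = divmod(total.seconds, 3600)
    minutes = remainder // 60
    parts = []
    if days:
        parts.append(f"{days}d")
    if hours:
        parts.append(f"{hours}h")
    if minutes and not days:  # assumption: drop minutes once the span exceeds a day
        parts.append(f"{minutes}m")
    return " ".join(parts) or "0m"


def echo(points: int, interval: timedelta) -> str:
    """Point count followed by its wall-clock span, as in the slider readout."""
    return f"{points} · {span_label(points, interval)}"


print(echo(2000, timedelta(hours=1)))
```

On a 1h grid, 2000 points is 2000 hours, i.e. 83 full days (1992 hours) plus 8 hours, which matches the example in the entry.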
[0.31.0] - 2026-06-25
- dtk tune: zoom, pan and a navigator on the chart. The interactive tuning chart is now navigable: scroll to zoom where you point, drag to pan, double-click to reset, and drag the navigator strip below the chart (the whole series in miniature, with the current-view window, the alert firings as red ticks, and a time axis). On a long, dense metric you can now zoom into a region to inspect alert quality instead of reading the whole series at once. Adaptive time gridlines now label both the chart and the strip.
- dtk tune: a “Points shown” trim slider. Above the chart, it shortens the active sample to the most-recent N points. Live recompute cost grows with points × window, so trimming a long series (e.g. 10k → 2k points) makes every knob-drag several times faster and the period easier to read. Trimming only affects the live view; it never changes what Apply writes.
- dtk tune: flexible seasonality groups. Each seasonality column is now assigned to a group (Off / G1 / G2 / …): columns in the same group are conjoined into one seasonal key, separate groups apply independent corrections. You can now express the full seasonality_components grouping (e.g. one dow×hour group plus a standalone is_holiday), not only “all-separate” or “all-in-one”.
- dtk tune: chart legend, control tooltips and a recompute spinner. A legend labels the metric line / expected-range band / band center / anomalies / alert markers; every control carries an ⓘ tooltip explaining it; and a computing… spinner shows while a recompute is in flight (replacing the bare status text).
- Autotune incident labeler: marked incidents now show on the navigator. The red incident bands you mark are drawn on the bottom navigator strip too, at a minimum width so even a single-point incident stays visible on a long span, and the strip gained a time axis. The main chart gained adaptive vertical time gridlines, so a point’s place in real time reads off the grid instead of only by chasing the cursor.
- Labeler x-axis date labels on high-DPR displays. The labeler’s bottom time labels were positioned with a doubled devicePixelRatio factor, pushing them off-canvas on retina / 2× screens; they now sit correctly under the chart at any DPR.
[0.30.1] - 2026-06-24
- dtk tune is now responsive on large metrics. It previously baked a metric’s entire history into the page and re-ran the client-side detector over all points on every knob change; on a metric with tens of thousands of points that made the page slow to load and froze the UI on every slider drag. Three changes fix it:
  - The detector now runs in a Web Worker (off the UI thread), so dragging a slider never freezes the page no matter the point count or window size; a computing… hint shows while a recompute is in flight and stale results are dropped. The worker runs the same parity-checked detector port, so results are unchanged.
  - Smart default point count. Instead of a flat cap, the shown window is sized inversely to the detector’s window (recompute cost is points × window): small windows show up to ~15k points, large windows fewer, clamped to a render-comfortable range. A --from/--to span is still honored in full.
  - The window-size slider is capped at half the shown points, the live recompute is debounced, and the CLI reports how many points it is tuning on.
- dtk tune no longer spews xdg-open errors when launching the browser on a headless / WSL box: the best-effort browser launch now silences its stderr, and the printed hint tells you to open the URL manually if no browser appears.
[0.30.0] - 2026-06-24
- dtk tune: interactive manual tuning that writes the config back into the metric. The human-in-the-loop sibling of dtk autotune. It opens an interactive browser view of the metric’s real persisted series and lets you turn the detector’s knobs: type (MAD / Z-Score / IQR), threshold, window, recency weighting + half-life, detrend, smoothing, seasonality conditioning, and the alert consecutive_anomalies. Meanwhile the confidence band, flagged anomalies and would-fire alerts recompute live in the browser (the same faithful TypeScript detector port that powers the landing playground, fed the real series instead of synthetic data). Clicking Apply to metric writes the chosen config back into the metric YAML. Where dtk autotune searches automatically and writes a new __tuned_<id>.yml, dtk tune is manual and edits the metric in place; the two are complementary paths to optimizing a metric. Delivery mirrors the autotune incident labeler: a localhost-only server with a one-shot token; nothing is exposed off the machine and nothing is written until you click Apply.
- Safe write-back with a versioned config history. On Apply, the chosen detector + params are validated through MetricConfig and the DetectorFactory before anything is written (a broken or untunable config never lands, returning a 400 so you can fix the knobs and retry); the previous metric YAML is then archived verbatim under metrics/.history/<metric>/<stamp>.yml (so the history of chosen parameters is trackable and the original, including its comments, is always recoverable); only then is the metric file re-emitted with the tuned detector.
- dtk tune takes no pipeline lock (it only edits a config file); re-run dtk run afterwards to recompute detections under the new config (the live preview is the TS approximation, the next real run is the source of truth).
- dtk tune --no-serve writes a static, read-only preview HTML (sliders recompute live, no write-back). New top-level detectkit/tuning/ package (build_tune_payload, render_tune_html, apply_tuned_config, serve_tuner); the renderer bundle detectkit/tuning/assets/tune.js is built from the shared chart/detector core (website/scripts/gen-tune-bundle.mjs) and ships in the wheel.
[0.29.0] - 2026-06-24
- dtk run --report / dtk autotune --report emit a self-contained HTML report. Each writes one offline HTML file per metric: values + per-detector confidence bands + flagged anomalies + the alerts that fired (anomaly / recovery / no-data) + a summary, with client-side period selection (24h / 7d / 30d / All, plus zoom/pan) and an alerts list (rule that fired, severity, duration). Nothing leaves the browser (inline JS, baked payload), so a user can see how a metric actually performed without standing up BI / SQL / a 3rd-party charting tool.
- --report is dual-mode: bare --report → default path (reports/<metric>.html; autotune: reports/<metric>__tuned_<id>.html), --report <dir> → <dir>/<metric>.html, --report file.html → that file. The report reads the persisted _dtk_* tables, so even a --steps load run can produce one from whatever is stored. New top-level detectkit/reporting/ package (build_report_payload reads _dtk_datapoints + _dtk_detections and replays alerts into a JSON payload; render_report_html inlines the pre-built renderer bundle detectkit/reporting/assets/report.js + the payload into one HTML file).
- Alert replay reconstructs the alert/recovery/no-data timeline from persisted detections. A new pure AlertOrchestrator.replay(detections, value_at, start, end) (detectkit/alerting/orchestrator/_replay.py, ReplayedEvent) re-walks the real decision logic (quorum / consecutive / cooldown / recovery / no-data) over a historical period, with no channel dispatch, no _dtk_alert_states writes and no wall-clock. This is how the report surfaces alerts, because _dtk_alert_states is last-writer-wins state, not an event log. It reuses the existing decision/builder functions verbatim; _resolve_incident gained an optional in-memory records= parameter so recovery resolution stays DB-free during replay (the production path is unchanged).
- InternalTablesManager.load_detections(...): a new reader returning flat per-(detector, timestamp) detection rows (detector_id / from_timestamp / to_timestamp filters, final_modifier for correct ReplacingMergeTree dedup), parallel to load_datapoints. The report builder reads through it.
- An interactive landing playground. The website (website/) ships a client-side island where a visitor shapes a synthetic metric (seasonality/noise/trend/incident) and tunes the real detector (MAD/zscore/iqr, threshold, window, recency, detrend, smoothing, seasonality grouping, consecutive_anomalies) live, seeing the corridor, flagged points, the trailing window used to score each point, and whether an alert would fire, all in-browser with zero server compute. Its chart renderer is the same framework-free TypeScript core (website/src/scripts/core/canvas.ts) the HTML report uses; the report bundle is built from it by website/scripts/gen-report-bundle.mjs (esbuild) into detectkit/reporting/assets/report.js (a committed generated asset). The playground’s detector math is a TS port verified to exact parity against the Python detectors (website/scripts/check-demo-parity.mjs, golden vectors from website/scripts/gen-demo-golden.py).
[0.28.0] - 2026-06-24
- Autotune searches the recency half-life. The grid search previously only toggled recency weighting on/off at a fixed half-life; it now sweeps the half-life (in points, as fractions of the window, floored at min_samples/2) whenever exponential weighting is adopted. This lets the search pick a faster-forgetting baseline that tracks the current regime (the knob that matters on a metric that shifted level) instead of leaving it at the default.
- The regime advisory names a concrete --from date. The REGIME advisory (0.27.0) now maps the detected level-shift index to the actual grid timestamp and suggests --from <YYYY-MM-DD> verbatim (e.g. --from 2026-05-22), instead of a generic “after the shift”. The scan runs NaN-aware on the raw grid so the index aligns with the timestamps. The boundary date is recorded as shift_at in the decision log.
- The labeler persists its threshold-capture time window. The painted capture window (the regime scope you drag on the chart) is now written to the saved labels file as an optional capture_windows: block and restored when you reopen the set, so the regime boundary you reasoned about is auditable and no longer lost between sessions. It is pure metadata: it never affects ground truth.
Changed
- The cross-fold stability penalty is now downside-only. Candidate scoring was mean(folds) - λ·std(folds); std penalized upside spread too, biasing the search against a regime-adaptive config that simply scores better on the recent regime than on stale history. It is now mean - λ·downside_deviation (shortfalls below the mean only, averaged over all folds, so always ≤ the old penalty), so an adaptive config is no longer punished for fold-to-fold variance that is actually improvement. The weight is exposed as autotune.stability_lambda (default 0.5; set 0.0 to disable) for a metric whose behavior differs across a regime shift. Tuning scores shift slightly and some winners may change (detector identity is unaffected).
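The difference between the old and new penalty can be seen on a small example. The exact definition used by detectkit is not given in this changelog; the sketch below assumes a root-mean-square of shortfalls below the mean, averaged over all folds, and compares it to a population standard deviation. Function names are hypothetical.

```python
from math import sqrt
from statistics import fmean, pstdev
from typing import Sequence


def downside_deviation(scores: Sequence[float]) -> float:
    """RMS of shortfalls below the mean, averaged over all folds (assumed form)."""
    mu = fmean(scores)
    return sqrt(fmean([min(0.0, s - mu) ** 2 for s in scores]))


def stability_score(scores: Sequence[float], stability_lambda: float = 0.5) -> float:
    """New scoring: mean - λ·downside_deviation."""
    return fmean(scores) - stability_lambda * downside_deviation(scores)


def old_score(scores: Sequence[float], lam: float = 0.5) -> float:
    """Previous scoring: mean - λ·std, which also penalizes upside spread."""
    return fmean(scores) - lam * pstdev(scores)


# Three steady folds and one fold that scores better on the recent regime.
folds = [0.6, 0.6, 0.6, 0.9]
print(old_score(folds), stability_score(folds))
```

In this example the spread comes mostly from one fold doing better, so the downside-only penalty is smaller than the standard-deviation penalty and the adaptive candidate keeps more of its mean score.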
[0.27.0] - 2026-06-24
- Autotune flags a hidden regime shift in the decision log. The trend gate that drives window selection and the detrend toggle is a single midpoint-median test, so it silently misses a level shift that sits off-center (both halves straddle it, so their medians barely differ) or one large enough to inflate the very MAD it is measured against. It then treats the series as stationary, prefers the largest window, and lets the baseline quietly average two regimes. A new scan (detect_level_shift) checks every split point against the within-segment scale (which a true step does not inflate, unlike a smooth ramp); when the series reads stationary yet a large (≥3σ within-regime) level shift is present, the run emits a REGIME advisory, streamed live and rendered in the annotated config header and _dtk_autotune_runs.decision_log_json, pointing at where the shift sits and suggesting you narrow the window with --from (or autotune.max_history) and re-tune. Advisory only: it changes no chosen parameters. It detects level shifts, not pure variance/shape changes (those still need labeled incidents). See the autotune reference’s “Non-stationary metrics & regime shifts” note.
[0.26.1] - 2026-06-24
Section titled “[0.26.1] - 2026-06-24”Changed
- Made the threshold-capture time window discoverable. The per-period window (added in 0.26.0) was only reachable by dragging the chart, with no visible cue: the reset button appeared only after a window existed. The threshold bar now always shows the current scope (“period: current view — drag the chart to limit it”, or “period: <span>” once set), and the on-chart readout prompts “drag the chart to pick a period” before a line is set. No behavior change.
[0.26.0] - 2026-06-24
- Threshold capture can be scoped to a time window. Previously the labeler’s threshold capture scanned the whole series, so one boundary had to fit every period. Now it captures within the current view by default, and you can drag horizontally across the chart to paint a narrower capture window: the area outside dims, the dashed line spans only the window, and the readout shows its span. This lets a metric that behaved differently across history take a different above/below boundary per period. ↺ whole view clears the window; the existing flow is unchanged (a click sets the line, a horizontal drag sets the window).
[0.25.0] - 2026-06-24
- The incident labeler can now open and edit an existing labels file.
dtk autotune --select <m> --label seeds the page from the metric’s newest saved set in incidents/<m>/ (or from --incidents <file-or-dir> when given), so labeling can grow across sessions: open, mark a few more, and Save & tune writes the next version (history is still kept; nothing is overwritten). The static --no-serve page also gains an Import file… button that loads any labels file (YAML/JSON) you pick. The seed preserves each incident’s label: description.
- Threshold capture. When many outliers are obvious, set a horizontal line on the chart (hover, or type an exact line value), choose above / below, optionally bridge gaps ≤ N intervals, and Add N spans marks every qualifying contiguous span at once, instead of zooming in and dragging each. The normal click-drag flow is unchanged; threshold capture is a toggled mode.
- On-chart incident deletion. Each incident band carries a ✕ handle (top-right); the selected band also responds to the Delete/Backspace key, and Escape deselects. No more scrolling the list to find the one row to remove. Selecting a band highlights and scrolls to its list row; focus on a row jumps the chart to that incident (the list ↔ chart now highlight together).
- Favicon — the labeler page now uses the detectkit brand mark as its tab icon (inline SVG data URI, still fully self-contained).
Changed
- IncidentInterval / IncidentPoint (detectkit/autotune/labels.py) now carry an optional label, so parsing a labels file round-trips its descriptions; new incidents_to_display / load_incidents_for_display helpers render a file as labeler-seed dicts. render_labeler_html / build_label_server / serve_labeler gain an incidents / preload argument.
[0.24.2] - 2026-06-24
- dtk run now detects on the first run of a detector that has no start_time, which is the case for every dtk autotune-generated config. DETECT builds its lower bound from --from, the resume point (last persisted detection), and the detector’s start_time param. When all three were absent (exactly the case for a freshly created tuned metric: no --from, no prior detections, and the emitter never wrote start_time), the lower bound was left unset and the step mistook “no lower bound” for “nothing to do”, printing “Nothing to detect (already up to date)” and writing zero detections. The alert step then reported “No recent detections found” and dashboards showed an empty detections chart, while loading worked normally. DETECT now falls back to the metric’s loading_start_time (then its earliest stored datapoint) so the first run detects across all loaded history. Hand-written metrics that set start_time were unaffected, which is why this only bit autotuned configs.
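The fallback chain can be pictured with a small sketch. This is not the DETECT step's code: the function name resolve_lower_bound is hypothetical, and picking the first available source in the listed order is an assumption, since the entry does not say how --from, the resume point and start_time are combined when several are present.

```python
from datetime import datetime
from typing import Optional


def resolve_lower_bound(
    from_arg: Optional[datetime],
    resume_point: Optional[datetime],
    detector_start_time: Optional[datetime],
    loading_start_time: Optional[datetime],
    earliest_datapoint: Optional[datetime],
) -> Optional[datetime]:
    """Return the first available bound; None only when nothing is known at all."""
    candidates = (
        from_arg,
        resume_point,
        detector_start_time,
        # The two fallbacks added by this fix, in the order the entry names them.
        loading_start_time,
        earliest_datapoint,
    )
    return next((c for c in candidates if c is not None), None)


# A freshly tuned metric: no --from, no prior detections, no start_time.
fresh = resolve_lower_bound(None, None, None, datetime(2026, 1, 1), datetime(2025, 12, 1))
```

Under these assumptions a fresh tuned metric resolves to its loading_start_time instead of an unset bound, so the first run has history to detect over.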
Changed
- dtk autotune now writes start_time into the generated detector’s params (pinned to loading_start_time), so the emitted metrics/<name>__tuned_<id>.yml is explicit and self-sufficient: it detects correctly even on an older detectkit that lacks the DETECT fallback above. start_time is execution-level and excluded from the detector-id hash, so it never changes detector identity or forces recomputation.
[0.24.1] - 2026-06-24
Changed
- dtk init-claude’s managed CLAUDE.md block is now version-less. The <!-- BEGIN detectkit … --> marker no longer embeds the detectkit version, so re-running after an upgrade is a true no-op unless the shipped guidance actually changed. Previously every release rewrote the marker (the version moved), which reported the block as updated and nudged users to re-run for nothing. Existing versioned markers (e.g. <!-- BEGIN detectkit v0.23.2 … -->) are still matched and refreshed in place, so upgrades stay seamless.
- Corrected the shipped dtk init-claude AI-assistant reference. The cli.md rule described metric-name selection as “searches the root metrics/ dir only”; it actually resolves metrics/<name>.yml at the root and then falls back to a recursive search by the YAML name: field in any subdirectory. It also called --steps a “subset/order” of stages; the steps always execute in load → detect → alert order regardless of how they are listed. The dtk-autotune skill suggested an invalid --scoring recall; the valid scoring metrics are mcc, f1, f_beta, balanced_accuracy, roc_auc, pr_auc.
[0.24.0] - 2026-06-24
- dtk autotune no longer emits an invalid config for metrics whose seasonality comes from the query. When a metric sources seasonality via query_columns.seasonality (custom columns such as league_day), the tuner could pick a grouping over those columns and then duplicate them into the top-level seasonality_columns field, which is validated against the built-in allowlist (hour, day_of_week, …) and is ignored by the loader in that mode. The result was a MetricConfig validation error and no tuned config written (0 succeeded). The emitter now keeps query-provided seasonality columns in query_columns only; the chosen grouping still rides in the detector’s seasonality_components, so detection behavior is unchanged.
Changed
- The labeler names exported/saved files after the metric, with the optional set
name folded in as a suffix:
<metric>[-<set>]-<UTC>.yml (e.g. api_error_rate-outage-20260624T010252Z.yml, or api_error_rate-<UTC>.yml with no set name). Previously a typed set name replaced the metric name in the filename; now it is always appended, so every labeling round stays grouped under the metric.
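The naming pattern is easy to reproduce by hand. The helper below is a sketch, not the labeler's own code: labels_filename is a hypothetical name, and the timestamp format is inferred from the example filename in the entry above.

```python
from datetime import datetime, timezone
from typing import Optional


def labels_filename(metric: str, set_name: Optional[str] = None,
                    now: Optional[datetime] = None) -> str:
    """Build <metric>[-<set>]-<UTC>.yml, with the set name as an optional suffix."""
    moment = now or datetime.now(timezone.utc)
    # Format inferred from the changelog example 20260624T010252Z.
    stamp = moment.strftime("%Y%m%dT%H%M%SZ")
    parts = [metric] + ([set_name] if set_name else []) + [stamp]
    return "-".join(parts) + ".yml"


when = datetime(2026, 6, 24, 1, 2, 52, tzinfo=timezone.utc)
with_set = labels_filename("api_error_rate", "outage", when)
without_set = labels_filename("api_error_rate", None, when)
```

Because the metric name always comes first, every file for one metric sorts together regardless of the set name typed.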
[0.23.2] - 2026-06-24
- The labeler shows the metric’s sampling interval as a highlighted chip next
to the metric name (e.g.
interval 1h) — the point spacing, taken straight from the metric (inferred from the series when not provided).
[0.23.1] - 2026-06-24
- Live time readout while editing an incident in the labeler. Dragging an
incident’s edge now shows
start/end: <old> → <new>, and creating or moving a band shows the resulting <start> → <end>, so you can place a boundary on an exact timestamp.
[0.23.0] - 2026-06-24
- One-command interactive labeling → tuning.
dtk autotune --select <m> --label now launches a small local labeler server (127.0.0.1, one-shot token), opens the browser, and on Save & tune writes a versioned labels file straight into incidents/<m>/ and continues into the tuning run on it, with no manual file shuffling. --no-serve keeps the old static-HTML-download behavior; --no-open prints the URL instead of launching a browser.
- Per-incident descriptions and named label sets in the labeler: the
description exports as the canonical
label:; the set name becomes the versioned filename <name>-<UTC>.yml.
- Edit existing incidents on the chart: drag an incident’s edges to adjust its bounds, or its middle to move it (visible edge handles + resize cursor).
- Choose among saved label sets at tune time. When
--incidents points at a directory with multiple versions and the terminal is interactive, you’re prompted to pick one (default: newest); non-interactive runs use the newest.
Changed
- Examples no longer use a real production metric name. The labeler demo (and
shipped example) now uses a generic
api_error_rate with realistic error-rate numbers instead of sessions_per_visitor_avg.
[0.22.0] - 2026-06-24
Changed
- Interactive incident labeler (dtk autotune --label) overhauled. The self-contained HTML chart is now zoomable/pannable so narrow incidents are markable even on a long span with a small step: scroll to zoom at the cursor, double-click to reset, and a navigator strip below the chart to move the view (drag the window to pan, drag its edges to stretch/squeeze). Large series stay fast and spike-faithful via min/max decimation. Each incident now takes an optional description, exported as the canonical label: field. Restyled on the detectkit brand (palette/fonts/logo, axes, hover tooltip, live summary).
- Versioned, never-overwriting exports. Export downloads a timestamped file
<metric>-<UTC>.yml (a browser can’t write to the project), so keep every labeling round under incidents/<metric>/.
- Directory-aware label resolution.
--incidents (and autotune.labels_file) may point at a directory; the newest versioned file in it is used, so dtk autotune --select <m> --incidents incidents/<m>/ always tunes on the latest labels while the full history stays on disk.
- Landing + docs showcase the labeler with a live, embedded demo generated
from the real template (website/scripts/gen-labeler-example.py).
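As an illustration of directory-aware resolution, the sketch below picks the newest versioned file from a directory. It is an assumption-laden stand-in, not detectkit's resolver: newest_labels_file is a hypothetical name, and ordering by the UTC stamp in the filename (rather than by file modification time) is a guess at what “newest” means.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the -<UTC> suffix of a versioned labels file, e.g. -20260624T010252Z.yml
STAMP = re.compile(r"-(\d{8}T\d{6}Z)\.(?:yml|yaml|json)$")


def newest_labels_file(path: Path) -> Optional[Path]:
    """Return path itself if it is a file, else the newest stamped file inside it."""
    if path.is_file():
        return path
    stamped = []
    for candidate in path.iterdir():
        match = STAMP.search(candidate.name)
        if match:
            # The stamp format sorts chronologically as plain text.
            stamped.append((match.group(1), candidate))
    if not stamped:
        return None
    return max(stamped, key=lambda item: item[0])[1]
```

Calling newest_labels_file(Path("incidents/api_error_rate")) would then return the latest labeling round while older rounds stay on disk untouched.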
[0.21.0] - 2026-06-24
Changed
- dtk autotune now works well out of the box without labels: every stage of the unsupervised pipeline was reworked so the no-label baseline is good on its own (labels remain a bonus that further improves it). This recomputes tuned configs; per detectkit’s policy that is acceptable. Specifically:
  - Seasonality selection is decoupled from the flag-objective. The old probe scored a candidate grouping with the same low-flag-rate objective used for detection, which is structurally biased against seasonality (finer groups → tighter bands → more flags → worse score), so genuinely seasonal metrics were rejected with “chose none”. It now uses a leak-free, walk-forward, band-width-aware Gaussian-NLL probe (oof_residual_reduction) that measures how much conditioning on a seasonal key tightens the per-group center/scale the detector actually applies, evaluated on held-out folds. Over-fragmented groupings fall back to global and can’t win mechanically; the no-seasonality baseline scores 0; a move is accepted only on a margin and an improvement in the majority of folds.
  - The unsupervised detector objective now rewards a tight confidence interval. unsupervised_objective is now 0.4·budget + 0.3·sharpness + 0.3·separation: a smooth flag-rate budget (no flat cliff; one-sided so a clean metric isn’t pushed to flag), sharpness (rewards a narrow, well-calibrated band; the old ratio-only objective was scale-invariant and blind to band width), and separation. All-suppress no longer sits at a timid 0.6 plateau: it scores only w_budget, so a tight band that isolates real extremes strictly beats doing nothing.
  - Detector selection no longer excludes a type by heuristic. The distribution suitability vote is now advisory (it only orders the candidates); the grid search evaluates all windowed statistical detectors and lets cross-validation pick the winner.
  - Grid search fixes the threshold↔window coupling with a final threshold re-sweep at the chosen window, and the threshold grid gained high “near-suppress” rungs (5/6σ, 4/6 Tukey) so a heavy-tailed metric can widen the band under the budget instead of being trapped flagging its tail.
  - Window selection is trend-gated: stationary series still prefer the larger window, but under a trend / regime shift the tie-break now prefers the smaller window (a fresher baseline) instead of averaging in stale history.
- Honest unsupervised header. Emitted tuned configs (and the CLI log) no
longer label an unsupervised run’s score as
mcc = … (it never computed MCC); they read Objective : unsupervised (band-fit + flag-budget) = ….
- autotune.force_seasonality: pin the seasonality grouping (a column or a conjunctive [col, col] group) and skip the search, for experts who already know a metric’s seasonality. Complements seasonality_candidates, which only restricts the search.
- Per-candidate transparency in the seasonality decision log: each tested
component now records its held-out residual reduction (e.g.
hour:5.70, day_of_week:-0.00), so a “chose none” is never opaque.
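The weighting of the reworked unsupervised objective in this release can be checked with a few lines of arithmetic. This is a sketch only: the function name objective_sketch is hypothetical, and treating each component as a score in [0, 1] (with an all-suppress configuration earning full budget and zero sharpness and separation) is an assumption consistent with the entry, not a statement of the actual implementation.

```python
# Weights quoted in the 0.21.0 entry: 0.4·budget + 0.3·sharpness + 0.3·separation.
W_BUDGET, W_SHARPNESS, W_SEPARATION = 0.4, 0.3, 0.3


def objective_sketch(budget: float, sharpness: float, separation: float) -> float:
    """Weighted sum of three component scores, each assumed to lie in [0, 1]."""
    return W_BUDGET * budget + W_SHARPNESS * sharpness + W_SEPARATION * separation


# Assumed all-suppress case: stays within the flag budget but earns nothing else.
all_suppress = objective_sketch(budget=1.0, sharpness=0.0, separation=0.0)

# A tight band that still respects the budget and isolates some extremes.
tight_band = objective_sketch(budget=1.0, sharpness=0.5, separation=0.5)
```

Under these assumptions the all-suppress score equals w_budget alone, so any configuration with a useful band scores strictly higher.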
[0.20.0] - 2026-06-23
- dtk init now scaffolds an incidents/ directory beside metrics/, with a commented example labels file (incidents/example_cpu_usage.yml) and a commented autotune: block in the example metric. This makes the documented incidents/<metric>.yml convention for supervised dtk autotune ready to fill in on a fresh project.
- Inline incidents on the autotune: block. Labeled incidents can now be declared directly in a metric config via autotune.incidents (the same {start, end} / {at} entries as a labels file) plus an optional autotune.incidents_timezone, as an alternative to autotune.labels_file, which is handy for a metric with one or two known incidents. incidents and labels_file are mutually exclusive (validated at config load). Label resolution precedence is now: --incidents flag → labels_file → inline incidents → interactive prompt → none (unsupervised).
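The precedence chain reads naturally as a sequence of early returns. The sketch below is illustrative: resolve_labels, its parameters and its return shape are hypothetical, and only the ordering and the mutual-exclusion rule come from the entry above.

```python
from typing import Any, Callable, Optional, Tuple


def resolve_labels(
    cli_incidents: Optional[str] = None,
    labels_file: Optional[str] = None,
    inline_incidents: Optional[list] = None,
    prompt: Optional[Callable[[], Optional[str]]] = None,
) -> Tuple[str, Any]:
    """Return (source, value) following: flag, labels_file, inline, prompt, none."""
    # Mirrors the config-load validation described in the changelog entry.
    if labels_file is not None and inline_incidents is not None:
        raise ValueError("incidents and labels_file are mutually exclusive")
    if cli_incidents is not None:
        return ("flag", cli_incidents)
    if labels_file is not None:
        return ("labels_file", labels_file)
    if inline_incidents is not None:
        return ("inline", inline_incidents)
    if prompt is not None:
        answer = prompt()
        if answer:
            return ("prompt", answer)
    return ("unsupervised", None)


inline = [{"start": "2026-06-01T00:00:00", "end": "2026-06-01T02:00:00"}]
picked = resolve_labels(cli_incidents="incidents/api_error_rate/", inline_incidents=inline)
```

Here the --incidents value wins over the inline list, and a call with nothing set falls through to an unsupervised run.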
Changed
- dtk init-claude context now recommends (optionally) giving the assistant read access to the database (e.g. a database MCP) so it can inspect series, find incidents to label, and verify queries itself. Made explicit that detectkit’s pipeline never needs an MCP (it connects via its DB drivers); the access is an assistant convenience, not a runtime requirement.
[0.19.0] - 2026-06-22
- dtk autotune: automatic detector configuration. A new pipeline that, given a metric’s loaded datapoints (and optionally labeled incidents), automatically chooses the seasonality grouping, detector type, hyperparameters and history window, cross-validates the choice, and writes a ready-to-run, fully annotated config named <metric>__tuned_<id>. The comment header walks every decision (seasonality, detector votes, grid-search winner + CV score, window). It reads _dtk_datapoints, never edits the original config and never sends alerts.
  - Seasonality is greedily searched over the metric’s columns; the detector type is chosen by a distribution decision tree that votes per seasonality group (Gaussian → zscore, heavy-tailed/outliers → mad, skewed → iqr); hyperparameters come from a bounded coordinate grid search; the history window prefers more context on near-ties.
  - Supervised tuning scores against a labels file (--incidents, YAML/JSON of incident intervals/points); with no labels it falls back to an unsupervised objective (low false-positive rate + cross-fold stability). Cross-validation is automatic walk-forward folds, with no split ratios to set.
  - Scoring metric defaults to MCC (uses the whole confusion matrix, robust to rare anomalies); configurable via --scoring (f1 / f_beta / balanced_accuracy / roc_auc / pr_auc).
  - --label emits a self-contained HTML chart to mark incidents visually and export a labels file. --dry-run searches without writing anything.
- _dtk_autotune_runs internal table. One row per autotune run (inputs + outputs: training period, labels, scoring metric, chosen seasonality/detector/params, CV score, decision log, generated config). An audit trail: created by ensure_tables(), never read by the pipeline and never pruned by dtk clean --orphaned-metrics.
- Optional autotune: block on a metric config. Lets experts constrain the search (restrict detector types / seasonality columns, pin hyperparameters, set the scoring metric, point at a labels file, cap history/folds). Fully optional: absent means fully automatic.
- dtk init-claude ships a dtk-autotune skill + autotune.md rule. The skill drives the whole flow conversationally: seasonality interview, writing the labels file from the user’s words, running dtk autotune, presenting the annotated result, and generating a per-backend DB query to inspect the tuned detector’s behavior, including the “build a working alert from a request” hand-off to dtk-new-metric.
[0.18.0] - 2026-06-21
Changed
- Default half_life is now floored at min_samples / 2 (windowed detectors: mad/zscore/iqr). When window_weights: exponential is set with half_life unset, the default was window_size / 20 unconditionally. On the default 100-point window that resolved to 5 points, an effective (Kish) sample size of ~14, more aggressive than the legacy weight_decay=0.95 default (~13.5 points, ESS ~38) that this very feature was redesigned to avoid. The default is now max(window_size / 20, min_samples / 2, 1):
  - It keeps the window/20 adaptation horizon the large-window trending recipe is tuned for (window 8640 → 432 points ≈ "3d").
  - On small/default windows the min_samples / 2 floor keeps the effective weighted sample size at parity with the raw min_samples gate (window 100, min_samples=30 → 15 points, ESS ~42), instead of silently honoring only half of it.
  - Only affects detectors that set window_weights: exponential and leave half_life unset; an explicit half_life (or weight_decay) is unchanged.
- ALGORITHM_VERSION of the windowed detectors bumped to v3. Because the resolved default changes the confidence bounds for the same config, the detector IDs change so affected detections recompute cleanly under the new id rather than mixing two regimes in _dtk_detections (same mechanism as the v1→v2 bump). Detections for all windowed detectors recompute on the next run.
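The numbers quoted for the new half_life default can be reproduced with a short calculation. This is a sketch, not the detector code: the function names are hypothetical, and the weight form w = 0.5^(age / half_life) is the usual half-life convention, assumed here rather than taken from the source.

```python
def default_half_life(window_size: int, min_samples: int) -> float:
    """The new default from the 0.18.0 entry: max(window_size / 20, min_samples / 2, 1)."""
    return max(window_size / 20, min_samples / 2, 1)


def kish_ess(half_life: float, window_size: int) -> float:
    """Kish effective sample size, (sum w)^2 / sum w^2, for exponential weights."""
    ratio = 0.5 ** (1 / half_life)  # per-point decay implied by the half-life
    weights = [ratio ** age for age in range(window_size)]
    return sum(weights) ** 2 / sum(w * w for w in weights)


old_default = 100 / 20                    # 5 points on the default window
new_default = default_half_life(100, 30)  # 15 points with min_samples=30
ess_old = kish_ess(old_default, 100)      # roughly 14
ess_new = kish_ess(new_default, 100)      # roughly 42
```

Under that weighting assumption the old default leaves an effective sample of about 14 points, while the floored default restores about 42, which matches the figures in the entry.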
[0.17.0] - 2026-06-21
- Alert messages now answer “how long has this been going on?” Every default-rendered anomaly leads with a plain-language sentence (Anomalous for 2h 30m — 15 consecutive 10min intervals.) that surfaces the metric interval, the true consecutive streak length, and the wall-clock duration. New Started / Latest fields bound the problematic span. Recovery alerts are symmetric: Incident lasted 2h 30m (…) with Started / Cleared.
  - The true streak length and onset are resolved only when an alert fires/clears: _decision.py (_resolve_streak) and _recovery.py (_resolve_incident) look back over the detection history (bounded by STREAK_LOOKBACK_POINTS, default 1000) and re-walk the same direction-aware quorum logic. A run older than the window renders as over …. The hot no-alert path issues no extra query.
  - New AlertData fields interval_seconds / onset_timestamp / streak_capped; consecutive_count now carries the true streak length (no longer capped at the rule threshold). New template variables: {anomaly_lead} / {recovery_lead} / {interval_display} / {duration_display} / {onset_display} / {started_display} / {window_line}. New detectkit.utils.datetime_utils.format_duration.
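The lead sentence follows from simple arithmetic on the streak length and the metric interval. The sketch below is not detectkit's format_duration or template code: both function names are hypothetical, and computing the duration as consecutive_count × interval_seconds is an assumption that happens to match the example in the entry (15 × 10 min = 2h 30m).

```python
def format_duration_sketch(seconds: int) -> str:
    """Render whole minutes as a compact 'Xd Yh Zm' string."""
    minutes = seconds // 60
    days, rem = divmod(minutes, 1440)
    hours, mins = divmod(rem, 60)
    parts = []
    if days:
        parts.append(f"{days}d")
    if hours:
        parts.append(f"{hours}h")
    if mins or not parts:
        parts.append(f"{mins}m")
    return " ".join(parts)


def anomaly_lead_sketch(consecutive_count: int, interval_seconds: int, interval_label: str) -> str:
    """Build the plain-language lead quoted in the changelog entry."""
    duration = format_duration_sketch(consecutive_count * interval_seconds)
    return f"Anomalous for {duration} — {consecutive_count} consecutive {interval_label} intervals."


lead = anomaly_lead_sketch(15, 600, "10min")
```

For a 10-minute metric with a 15-point streak this yields the same sentence shown in the entry.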
Changed
- Uniform message order:
description → Rule → Value/Expected on every channel and for both anomaly and recovery. Previously the anomaly message led with the Rule chip (description below it) while recovery led with the description; now both lead with the description and place the Rule chip right above the value/expected evidence it explains.
- The default anomaly/recovery text templates and the webhook / Telegram /
email native layouts were reworked to the new lead +
Started / Latest fields and now also show Quorum on Telegram and email (previously webhook-only). The webhook/email Detected at field is replaced by the Started → Latest (or Cleared) pair.
- dtk test-alert previews now carry the incident-timing fields, so the mock matches what a real firing renders.
- Custom templates keep working unchanged; the new placeholders are additive.
Direct-API callers that don’t set
interval_seconds fall back to the previous “Latest X/Y consecutive points met the quorum.” lead.
[0.16.4] - 2026-06-20
- Sync the user-facing docs (docs/) and the README with the 0.15–0.16 alerting changes (docs only, no code or behavior change):
  - docs/guides/configuration.md: corrected the alert_help_url per-channel rendering. The webhook “How to read this alert” link was still described as a bottom attachment field showing the bare URL; since 0.16.1 it renders as a compact clickable label in the shared Links field (Slack <url|label> / Mattermost-generic markdown), never a raw URL.
  - docs/guides/alerting-no-data-errors.md: the no-data template-variable table now lists {project_name} / {project_name_prefix} (0.15.0) and {help_url} / {help_line} (0.16.0), matching the error-alert table; the Visual Distinction note now leads with the 🟡 status circle instead of only the amber accent color.
  - docs/guides/reading-alerts.md: the stakeholder “Anatomy of an alert” table gains a Rule row describing the rule chip set apart on every anomaly and recovery since 0.16.3.
  - docs/guides/configuration-metrics.md: links now notes the compact-label webhook rendering (0.16.1), and the {help_url} / {help_line} template variables are documented (set project-wide via alert_help_url).
  - README.md: added the new Reading Alerts stakeholder guide to the documentation list.
The dtk init-claude assets and dev rules were already current; this only brings the docs site and README in line.
[0.16.3] - 2026-06-20
Changed
- The firing rule is set apart consistently in every channel. On anomaly and recovery alerts the configured rule now renders as a bold Rule label followed by an inline-code chip (min_detectors=… · direction=… · consecutive=…), with the quorum explanation on its own line, so the rule reads as “this is the config that fired” at a glance instead of running into the surrounding prose. Applied across all default-rendered channels and to both alert kinds:
  - Slack / Mattermost / generic webhook: bold label is platform-aware (*Rule* on Slack mrkdwn, **Rule** on Mattermost/generic CommonMark, via the new WebhookChannel._bold); the backtick code chip renders identically on both.
  - Telegram: the rule line changed from italic (<i>Rule: …</i>) to <b>Rule</b> <code>…</code>.
  - Email: previously had no explicit rule line (the rule was buried in prose); it now renders the same bold-label + monospace chip (_rule_html), matching the other channels.
- The landing-page channel previews were updated to match. Custom templates and the plain-text fallback bodies are unchanged.
[0.16.2] - 2026-06-20
- dtk test-alert preview now matches a real firing. The preview was built without the project-name [name] prefix that dtk run stamps on every alert (since 0.15.0), so a preview on a shared multi-project channel read 🔴 Alert: <metric> while the real alert read 🔴 [Kiss 1] Alert: <metric>. create_mock_alert_data() now threads project_name from detectkit_project.yml onto the mock AlertData, matching the run pipeline (_alert_step.py).
- dtk test-alert resolves the metrics directory from paths.metrics. It read the deprecated top-level metrics_path key (ignored by ProjectConfig), so a project that customized paths.metrics couldn’t find its metrics from test-alert; it only worked when the dir happened to be the default metrics. Closes #13.
[0.16.1] - 2026-06-20
Changed
- Webhook links render as compact clickable labels, not raw URLs. On
Slack / Mattermost / generic webhook,
dashboard_url, links, and the “How to read this alert” guide now share one compact Links field of clickable labels (Dashboard · Runbook · How to read this alert) instead of printing full URLs on their own lines. A real dashboard URL (e.g. Grafana with many template variables) can be a paragraph long; hiding it behind its label keeps the alert readable. Links use each platform’s native syntax (Slack <url|label>, Mattermost/generic markdown links, detected from the webhook host) via the new WebhookChannel._link_markup. The clickable attachment title (title_link → dashboard_url) and the Telegram/email link rendering are unchanged. The landing-page channel previews were updated to match.
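The two link syntaxes differ only in how the label and URL are wrapped. The sketch below is not WebhookChannel._link_markup: the function names and the platform argument are hypothetical (the entry says the real code detects the platform from the webhook host), and the URLs are placeholders.

```python
from typing import List, Tuple


def link_markup_sketch(label: str, url: str, platform: str) -> str:
    """Slack uses <url|label>; Mattermost and generic webhooks use Markdown links."""
    if platform == "slack":
        return f"<{url}|{label}>"
    return f"[{label}]({url})"


def links_field_sketch(links: List[Tuple[str, str]], platform: str) -> str:
    """Join labelled links into one compact field, separated by a middle dot."""
    return " · ".join(link_markup_sketch(label, url, platform) for label, url in links)


links = [
    ("Dashboard", "https://example.com/d/abc"),   # placeholder URLs
    ("Runbook", "https://example.com/runbook"),
]
slack_field = links_field_sketch(links, "slack")
markdown_field = links_field_sketch(links, "mattermost")
```

Either way the reader sees only the short labels, however long the underlying dashboard URL is.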
[0.16.0] - 2026-06-20
- “How to read this alert” link on every alert. Every default-rendered alert
(anomaly, recovery, no-data, error) on every channel now carries a link to
a plain-language guide explaining what the alert is and how to interpret it,
so non-operator stakeholders (PMs, analysts, on-call) who see a notification
can self-serve instead of asking what it means. It points at the new
Reading an alert docs page by
default.
  - New stakeholder docs page (docs/guides/reading-alerts.md, rendered at /guides/reading-alerts/): a 10-second TL;DR and status-color key for non-technical readers, then an alert anatomy (value vs expected, severity, quorum, consecutive) for analysts who want the detail.
  - Per-channel rendering: Slack / Mattermost / webhook get a bottom “How to read this alert” attachment field (bare URL, auto-linkified); Telegram appends it to the links line; email adds a clay footer link (Sent by detectkit · <project> · How to read this alert →).
  - Configurable per project via alert_help_url in detectkit_project.yml (tri-state): unset → the official guide (default); a URL → your own runbook/wiki; false → hide the link. Resolved by ProjectConfig.resolve_alert_help_url() and stamped onto AlertData.help_url by the orchestrator (and the project-level error-alert path).
  - Templates: exposed as {help_url} (raw URL, empty when unset) and {help_line} (How to read this alert: <url>), mirroring the existing {dashboard_url} / {dashboard_line}. Direct library/API callers that don’t set help_url render unchanged.
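The tri-state alert_help_url setting can be sketched as a three-way branch. This is not ProjectConfig.resolve_alert_help_url(): the function name is hypothetical and DEFAULT_GUIDE_URL is a placeholder, since the entry does not give the official guide's full URL.

```python
from typing import Optional, Union

# Placeholder only; the real default points at the official Reading an alert guide.
DEFAULT_GUIDE_URL = "https://example.invalid/guides/reading-alerts/"


def resolve_help_url_sketch(value: Union[str, bool, None]) -> Optional[str]:
    """unset (None) -> default guide; False -> hide the link; a URL -> use it as-is."""
    if value is None:
        return DEFAULT_GUIDE_URL
    if value is False:
        return None
    return str(value)


default_link = resolve_help_url_sketch(None)
hidden_link = resolve_help_url_sketch(False)
custom_link = resolve_help_url_sketch("https://wiki.example.com/alerts")
```

Checking for None before False keeps “not configured” and “explicitly disabled” distinct, which is the point of making the option tri-state.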
[0.15.0] - 2026-06-20
- Project name on every alert. The project name (detectkit_project.yml → name) is now stamped onto every alert the pipeline sends and shown by default, so two detectkit projects pointed at the same channel stay distinguishable while both keep the default brand bot name + avatar (users no longer have to override username / icon_url just to tell projects apart).
  - Title / headline / subject of every alert kind (anomaly, recovery, no-data, error) leads with a [name] prefix: 🔴 [payments] Alert: api_error_rate.
  - Slack / Mattermost / webhook also pair it in the attachment footer (detectkit · payments).
  - Telegram carries it in the bold headline (it has no footer or per-message avatar to brand).
  - Email prefixes the subject, adds a small project eyebrow above the metric, and pairs it in the footer (Sent by detectkit · payments).
  - Exposed to custom templates everywhere as {project_name} and {project_name_prefix} (previously only populated for project-level error alerts). AlertData.project_name is threaded from ProjectConfig.name through the orchestrator (_alert_step → AlertOrchestrator); direct library/API callers that don’t set it render unchanged.
  - The project name remains informational only (it keys no _dtk_* table), so it can be renamed freely (spaces allowed for a prettier label like name: "Payments API").
[0.14.0] - 2026-06-20
- dtk-feedback skill shipped by dtk init-claude. When a dtk command fails or behaves unexpectedly, the user wants a feature, or has feedback, the assistant can file it as a GitHub issue on the upstream repo (alexeiveselov92/detectkit). The skill rules out local config problems first, auto-collects diagnostic context (detectkit/Python/OS versions, backend type, command + traceback, a minimal redacted repro), strips every secret, searches for duplicates, and never submits without explicit confirmation, using the gh CLI when available, or a prefilled “new issue” URL as a fallback. Filed issues carry a via:assistant attribution (a body marker, and the label when the maintainer has created it) so the assistant funnel can be triaged. Surfaced across the docs (the CLAUDE.md block, docs/reference/cli.md, the README feature list, the getting-started “Getting Help”/“AI Onboarding” sections, and the landing page).
[0.13.1] - 2026-06-20
- Sync the dtk init-claude AI-context assets and the dev rules with the 0.13.0 alerting redesign: document the colored status circle that leads every alert title (🔴 anomaly / 🟢 recovery / 🟡 no-data / 🔵 pipeline error), correct the stale “stop error” wording, cover build_context + native rendering in the add-a-channel guide, and surface dashboard_url in the metric example. Docs/assets only; no code or behavior change.
[0.13.0] - 2026-06-20
- Rich, platform-native alert rendering. Every channel’s default message is
now laid out using that platform’s own rich primitives instead of a flat text
block: the alert still leads with the rule that fired, but the evidence reads
cleanly at a glance.
  - Slack / Mattermost / generic webhook build a single message attachment with the status-colored accent bar, a clickable title, a short markdown lead, and a compact fields grid (Value / Expected / Quorum / Severity, then full-width Detected-at / Detectors / Parameters), branded with a footer + footer_icon. Mentions now ride in the top-level message text so they reliably notify on Slack. A custom template still renders as a plain text attachment (color/title/branding preserved).
  - Telegram now defaults to parse_mode: HTML and sends a structured, HTML-escaped message with a colored status dot, bold headline and <code> evidence. This fixes silent delivery failures: the legacy Markdown mode raised “can’t parse entities” on detector params JSON containing underscores (e.g. window_size).
  - Email ships a fully branded HTML card (inline-CSS, table-based, Outlook-safe): a colored accent + status pill, the metric, a 2-column value/expected/severity table, a monospace params box and a footer. The plain-text part remains the fallback.
- First-class dashboard / runbook links. New dashboard_url and links fields on a metric’s alerting: config attach actionable links to every alert: a clickable attachment title on Slack/Mattermost, an inline link on Telegram, and an Open dashboard button in email. {dashboard_url} is also available to custom templates, and {dashboard_line} is appended to the default plain-text templates.
Changed
- Colored status circle leads every alert. Titles and headlines now open
with a status dot (🔴 anomaly, 🟢 recovery, 🟡 no-data, 🔵 pipeline error)
so the status reads at a glance from color alone (replaces the previous
⚠/✅ glyphs in the default titles, bodies and email subject).
- Telegram default parse_mode is now HTML (was Markdown). Custom Telegram templates are sent verbatim under the configured parse mode, so they should be HTML-safe; set parse_mode: Markdown on the channel to keep the old behavior.
- The shared message-context builder (BaseAlertChannel.build_context) is now the single source of the values used by both templates and native rendering, so chat, email and the website preview stay consistent.
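The Telegram parse-error fix comes down to escaping dynamic values before embedding them in HTML. The sketch below is not detectkit's Telegram renderer: the function name is hypothetical, and it only shows the standard-library escaping step for a params payload wrapped in a code tag.

```python
import html
import json


def telegram_params_line_sketch(params: dict) -> str:
    """Wrap detector params JSON in <code>, escaping the characters HTML treats specially."""
    payload = json.dumps(params)
    # quote=False: only &, < and > need escaping inside element text.
    return f"<code>{html.escape(payload, quote=False)}</code>"


plain = telegram_params_line_sketch({"window_size": 100})
tricky = telegram_params_line_sketch({"note": "a<b & c"})
```

Underscores such as the one in window_size pass through untouched under HTML parsing, whereas the legacy Markdown mode could read them as unbalanced emphasis markers.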
[0.12.0] - 2026-06-20
- Branded alert bot identity by default. Every alert channel now leads with
the detectkit brand (display name and avatar) instead of the old
:warning: emoji, so notifications are instantly recognizable. The defaults live in detectkit/alerting/channels/branding.py (BRAND_USERNAME, BRAND_ICON_URL) and remain fully overridable per channel.
  - Slack / Mattermost / generic webhook send the brand avatar as icon_url (a PNG served from the docs site at https://dtk.pipelab.dev/bot-icon.png). New icon_url parameter for a custom avatar image; icon_emoji still works to use an emoji instead. Icon precedence: icon_url wins over icon_emoji, and setting either opts out of the brand avatar.
  - Email sends as detectkit <from_email> (new from_name parameter, default detectkit) and now ships a multipart HTML body with the brand logo in the header; the plain-text body remains the fallback.
  - Telegram shows the bot account’s own avatar (set in @BotFather, not per-message), so it can’t be overridden by detectkit; the docs explain how to brand it with /setuserpic.
  - New brand asset website/public/bot-icon.png, generated from the logo geometry by website/scripts/make-bot-icon.mjs.
- Slack / Mattermost / generic webhook send the brand avatar as
Changed
- Default webhook/Slack/Mattermost bot name is now `detectkit` (was `detectk`) and the default icon is the brand avatar (was the `:warning:` emoji). Channels that explicitly set `username`/`icon_emoji` are unaffected. Sent webhook payloads now include `icon_url` (or `icon_emoji` when configured) rather than always sending `icon_emoji`.
[0.11.0] - 2026-06-20
- PostgreSQL and MySQL are now fully supported backends. detectkit’s database-agnostic architecture is realized end to end: ClickHouse, PostgreSQL (12+) and MySQL (8.0+) all run the complete `load → detect → alert` pipeline. Only the connection and the SQL dialect of your metric queries differ; detectors, alerting, the CLI and the project layout are identical.
  - `PostgresDatabaseManager` (`detectkit[postgres]`, psycopg2): connects to a `database` and stores tables in schemas (`CREATE SCHEMA IF NOT EXISTS`).
  - `MySQLDatabaseManager` (`detectkit[mysql]`, pymysql): uses databases (`CREATE DATABASE IF NOT EXISTS`); requires MySQL 8.0+.
  - Both share a new `SQLDatabaseManager` base that renders DDL with an enforced `PRIMARY KEY`, maps the abstract column types per dialect, and reproduces ClickHouse’s `ReplacingMergeTree` last-writer-wins dedup with a version-aware upsert (`ON CONFLICT DO UPDATE` / `ON DUPLICATE KEY UPDATE`).
- `dtk init --db-type {clickhouse,postgres,mysql}` scaffolds `profiles.yml` and the example metric query for the chosen backend (default: `clickhouse`).
- New `database` profile field: the connect-target database, required for PostgreSQL (the database inside which the schemas live).
- Per-database documentation: a new Databases section in the docs (overview + ClickHouse / PostgreSQL / MySQL pages) covering install extras, `profiles.yml` shape, connection fields and SQL dialect per backend; plus a “Works with” database badge row on the landing page.
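The version-aware upsert mentioned for the SQL backends can be illustrated with a small sketch. It uses Python's bundled `sqlite3` purely as a stand-in, since SQLite accepts the same `ON CONFLICT ... DO UPDATE` form as PostgreSQL; the table and column names (`points`, `ts`, `value`, `version`) are hypothetical and are not detectkit's schema or SQL.

```python
import sqlite3


def upsert_point(conn, ts, value, version):
    """Insert a row, or overwrite it only when the incoming version is not older."""
    conn.execute(
        """
        INSERT INTO points (ts, value, version) VALUES (?, ?, ?)
        ON CONFLICT (ts) DO UPDATE
        SET value = excluded.value, version = excluded.version
        WHERE excluded.version >= points.version
        """,
        (ts, value, version),
    )


conn = sqlite3.connect(":memory:")
# The PRIMARY KEY is what makes the conflict target well-defined.
conn.execute("CREATE TABLE points (ts TEXT PRIMARY KEY, value REAL, version INTEGER)")

upsert_point(conn, "2026-06-20T00:00", 1.0, 1)
upsert_point(conn, "2026-06-20T00:00", 2.0, 3)  # newer version replaces the row
upsert_point(conn, "2026-06-20T00:00", 9.0, 2)  # older version is ignored

row = conn.execute("SELECT value, version FROM points").fetchone()
```

After the three writes the table holds a single row carrying the highest version, which is the behaviour a versioned `ReplacingMergeTree` converges to after merges.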
Changed
- The shared `InternalTablesManager` layer is now genuinely backend-neutral: a generic `delete_rows()` primitive and a `final_modifier` dedup-read hook replace the ClickHouse-only `ALTER TABLE … DELETE` / `FINAL` / `count()` SQL that previously leaked through `execute_query`. `TableModel` gained an explicit `version_column`. ClickHouse behavior is unchanged.
- `ProfileConfig.create_manager()` no longer raises `NotImplementedError` for `postgres`/`mysql`.
[0.10.0] - 2026-06-19
- `dtk init-claude`: AI-native onboarding. A new command that scaffolds Claude Code context into the folder holding your detectkit project(s), so an assistant can natively help you build and operate metrics, detectors and alerts. It writes:
  - `CLAUDE.md`: created if absent, otherwise a managed detectkit block is injected/refreshed between `<!-- BEGIN detectkit … -->` / `<!-- END detectkit -->` markers (your own content is preserved).
  - `.claude/rules/detectkit/`: reference docs the assistant reads on demand (`overview`, `cli`, `project`, `metrics`, `detectors`, `alerting`).
  - `.claude/skills/`: skills that scaffold work: `dtk-setup-project` (first-time DB/channel setup) and `dtk-new-metric` (a validated metric YAML).

  The content ships with the package and tracks the installed version, so re-run `dtk init-claude` after upgrading to refresh it. The operation is idempotent. The canonical source lives in `detectkit/cli/assets/claude/` and is kept in sync with the user docs on every release.
- `dtk-setup-project` skill (shipped by `dtk init-claude`): an interactive, database-type-aware setup that gathers your real connection details, points the profile at your database, optionally configures a first alert channel, and verifies with a non-destructive `--steps load` run. Surfaced at the top of the Quickstart and in the `dtk init-claude` reference.
- Visualizing results guide (`docs/guides/visualizing-results.md`): BI-tool-agnostic and database-agnostic SQL recipes for charting the `_dtk_*` tables (value + confidence band, anomaly markers, anomaly counts, latest-value stat, multi-detector comparison, severity breakdown) in Grafana, Superset, Metabase, Tableau, or plain SQL.
- Developer docs rendered on the site under a “For developers” section (architecture, contributing, design & brand), single-sourced from `.claude/rules/` so they double as in-repo AI-assistant context.
- `dtk init` now scaffolds a runnable, schema-correct project. The generated configs carried keys the loader silently ignores or the channels reject:
  - `profiles.yml` set `database:` on each profile (not a real field), so `internal_database`/`data_database` stayed unset and the first `dtk run` aborted with `internal_database must be set for ClickHouse`. The `dev` profile now sets both locations and is runnable against a local ClickHouse.
  - the `mattermost_alerts` channel set `icon_url`, which the Mattermost channel rejects (`Invalid parameters for mattermost channel`) the moment it is built (e.g. on `dtk test-alert`); replaced with the supported `icon_emoji`.
  - `detectkit_project.yml` used flat `metrics_path:`/`sql_path:` keys instead of the nested `paths:` mapping the model expects (silently dropped).
  - the commented generic-webhook example used `url`/`method`/`headers` instead of the real `webhook_url`/`extra_headers` (also corrected in the `dtk init-claude` project rules).
Changed
- Example ClickHouse `host` in the shipped `dtk init-claude` rules/skill and in the profiles docs is now a neutral placeholder (`clickhouse.example.com`) instead of a sample IP address.
[0.9.0] - 2026-06-19
Changed
- Alert messages are now alert-centric, not anomaly-centric. The default notification leads with the alert and the parameters it fired with (the quorum/direction/consecutive rule) and shows the triggering anomaly as supporting evidence below. This reflects the library’s model: the alert is the primary entity, and an anomaly is a secondary signal the rule interprets (a detector anomaly can mean very different things under different `min_detectors`/`direction`/`consecutive_anomalies` settings). The old `"Anomaly detected in metric: …"` body and `"Anomaly detected: …"`/`"Metric recovered: …"` titles become:
  - Anomaly: title `⚠ Alert: <metric>`; body shows `Quorum <actual>/<required> · direction <observed> (policy <configured>) · consecutive <actual>/<required>`, a `Rule:` line restating the configured thresholds, then the latest point (time / value / expected range / severity) and the detectors + params as evidence.
  - Recovery: title `✅ Alert cleared: <metric>`; body states the alert condition no longer holds and echoes the same rule.

  Custom templates are unaffected: every previous template variable still works.
- New alert template variables that surface the rule the alert fired with: `{min_detectors}`, `{direction_policy}`, `{consecutive_required}` (the configured thresholds) and `{detector_count}` (observed detectors that agreed). Plus `{expected_range}`, a one-sided-aware expected band that renders one-sided detector bounds cleanly: `>= 7.00` for a lower-only `manual_bounds` instead of the confusing `[7.00, nan]`.
- `AlertData` now carries the alert-rule fields (`min_detectors`, `direction_policy`, `consecutive_required`, `detector_count`); the orchestrator fills them from the alert config’s `AlertConditions`, and `dtk test-alert` previews them using the metric’s own alert rule.
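The one-sided rendering behind `{expected_range}` can be sketched as follows. This is an illustrative helper, not detectkit's code: the function name `format_expected_range`, the `<=` form for upper-only bounds and the `N/A` fallback are assumptions; only the `>= 7.00` case comes from the entry above.

```python
import math


def _is_missing(bound):
    """Treat None and NaN as an absent bound."""
    return bound is None or (isinstance(bound, float) and math.isnan(bound))


def format_expected_range(lower, upper):
    """Render an expected band, collapsing to a one-sided form when a bound is absent."""
    if _is_missing(lower) and _is_missing(upper):
        return "N/A"  # assumed fallback when neither bound exists
    if _is_missing(upper):
        return f">= {lower:.2f}"  # lower-only bound
    if _is_missing(lower):
        return f"<= {upper:.2f}"  # upper-only bound (assumed symmetric form)
    return f"[{lower:.2f}, {upper:.2f}]"


lower_only = format_expected_range(7.0, float("nan"))
two_sided = format_expected_range(7.0, 9.5)
```

Checking for NaN explicitly matters because a naive `f"[{lower:.2f}, {upper:.2f}]"` is exactly what produces the `[7.00, nan]` output described above.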
[0.8.2] - 2026-06-15
Changed
- Unified CLI output style. `dtk clean` and `dtk unlock` now render in the same tree layout (`┌─ / │ / └─`) as the `dtk run` pipeline steps, instead of each command’s own ad-hoc formatting. Per-metric findings appear as child lines under a cyan metric header; metrics with nothing to do show a single `•` line; per-metric errors use `✗`; each run ends with a cyan-bold `Done. …` summary. Shared helpers live in `detectkit/cli/_output.py`.
[0.8.1] - 2026-06-15
- `--select "*"` (and other glob selectors) no longer crash on `.gitkeep` or non-YAML files. The glob branch of metric selection passed raw `glob()` results (including the `.gitkeep` stub `dtk init` creates, stray files, and directories) straight to the YAML parser, so `dtk run/unlock/clean --select "*"` failed with `Empty metric config file: .../metrics/.gitkeep`. Glob results are now filtered to `.yml`/`.yaml` files. Additionally, `--select "*"` now resolves recursively so metrics in subdirectories are included (previously it expanded to a non-recursive `metrics/*` and silently skipped them).
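The filtering described in this fix can be sketched with `pathlib`. This is a minimal illustration under assumptions: the helper name `select_metric_files` is hypothetical and the real selector logic in detectkit may differ; it only shows recursive globbing combined with a YAML-suffix filter.

```python
import tempfile
from pathlib import Path

YAML_SUFFIXES = {".yml", ".yaml"}


def select_metric_files(metrics_dir, pattern="*"):
    """Return YAML files matching a glob pattern, searching subdirectories too."""
    root = Path(metrics_dir)
    return sorted(
        path
        for path in root.rglob(pattern)
        # Directories, .gitkeep stubs and stray files never reach the YAML parser.
        if path.is_file() and path.suffix in YAML_SUFFIXES
    )


# Demo on a throwaway directory that mimics a scaffolded metrics folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".gitkeep").touch()
    (root / "notes.txt").touch()
    (root / "cpu.yml").touch()
    (root / "team").mkdir()
    (root / "team" / "latency.yaml").touch()
    found = [p.relative_to(root).as_posix() for p in select_metric_files(root)]
```

In this demo `found` contains only the two YAML files, including the one in the `team` subdirectory.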
[0.8.0] - 2026-06-15
- `dtk clean` command: prune internal data that no longer matches the project’s YAML configs, the rows left behind when metrics are edited on production. Two modes, both dry-run by default (`--execute` to apply):
  - `dtk clean --select <selector>` removes `_dtk_detections` rows whose `detector_id` is no longer produced by the config (a detector parameter or `seasonality_components` changed, or the detector was removed) and `_dtk_alert_states` rows whose `alert_config_id` is no longer produced (an alerting block’s functional fields changed, or the block was removed). Valid hashes are recomputed with the same functions the pipeline uses, so pruning stays in lockstep with detection/alerting. Datapoints are not touched (they are keyed only by timestamp).
  - `dtk clean --orphaned-metrics` purges all rows, across every internal table, for metric names present in the database but no longer defined by any YAML in the project (a renamed or deleted metric). Asks for confirmation (skip with `--yes`) and refuses to run when the project defines no metrics or its configs fail to parse, so a wrong directory or a duplicate-name error can’t wipe valid data.
- Internal-tables helpers backing the command: `list_detector_ids`, `list_alert_config_ids` / `delete_alert_state`, and a maintenance mixin (`list_known_metric_names`, `count_metric_rows`, `purge_metric`). `delete_detections` gained an opt-in `mutations_sync` parameter.
- New test suite `test_clean.py` (+23 tests).
Documentation
- CLI reference gains a full `dtk clean` section; the configuration, detectors, and alerting guides note how config edits orphan data and link to the command.
[0.7.0] - 2026-06-12
Major detector and alerting overhaul. Detector IDs change for many configs (see Migration below); affected detectors recompute detections on the next run, which is safe and intended.
- `half_life` parameter for recency weighting (`mad`/`zscore`/`iqr`). With `window_weights: exponential`, a point’s weight halves every `half_life` points. It accepts an int (points) or a duration string (`"3d"`, `"12h"`, converted via the metric’s grid step). Defaults to `window_size / 20`. Replaces `weight_decay` (still accepted, deprecated: decay `d` ≡ half_life `ln(0.5)/ln(d)` points; the old default 0.95 ≈ 13.5 points was so aggressive that detectors adapted to real incidents within hours).
- `detrend: linear` parameter (`mad`/`zscore`/`iqr`). Estimates a robust linear trend over the window (split-median slope) and projects window points to the current point before computing statistics, so a gradually trending metric no longer drifts out of its own confidence interval while sharp deviations from the trend are still caught. In the reference trend-spam simulation (60-day window, daily seasonality, −15% gradual decline over 30 days): 1557 false “below” alerts → 26 with `half_life: "3d"`, → 19 combined with `detrend: linear`; a sharp −40% incident is still caught at every point.
- Time-aware weighting. Weights now depend on a point’s age on the time grid, not its position among valid points: data gaps no longer compress the decay, and seasonality-group statistics share the same recency horizon as global statistics (the horizon mismatch was the main reason weighting “barely helped” trending metrics before).
- `ess` metadata field (Kish effective sample size) on weighted detections and `trend_slope_per_point` on detrended ones.
- New test suites: weighted statistics, shared windowed-detector behavior (weights, detrend, validation, hashing), multi-detector decision matrix, channel send contract (+89 tests).
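The `weight_decay` to `half_life` relation quoted above can be checked numerically. The sketch below assumes the per-point exponential form implied by the entry (weight `0.5 ** (age / half_life)`); the function names are illustrative and not detectkit's API.

```python
import math


def decay_to_half_life(decay):
    """Convert a per-point decay factor d into a half-life in points: ln(0.5) / ln(d)."""
    return math.log(0.5) / math.log(decay)


def exponential_weight(age_points, half_life):
    """Weight of a point that is age_points old; it halves every half_life points."""
    return 0.5 ** (age_points / half_life)


old_default_half_life = decay_to_half_life(0.95)  # about 13.5 points
weight_at_one_half_life = exponential_weight(old_default_half_life, old_default_half_life)
weight_via_decay = 0.95 ** old_default_half_life  # the same quantity in the old parameterisation
```

Both parameterisations give a weight of one half at roughly 13.5 points, which is why a decay of 0.95 forgets history so quickly on fine-grained grids.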
Changed
- MAD threshold is now in σ-equivalents. MAD is scaled by the normal-consistency constant 1.4826, so `threshold: 3.0` genuinely means ~3-sigma (≈0.27% false positives on Gaussian noise) like Z-Score. Raw 3×MAD was only ≈2σ and fired on ~4.3% of perfectly normal points, the main source of baseline alert noise. MAD severity is in σ-equivalents too.
- Multi-detector alert contract is now direction-aware and deterministic (`min_detectors` × `direction` × `consecutive_anomalies`):
  - `up`/`down`: only anomalies in that direction count toward the quorum;
  - `any`: every anomaly counts regardless of direction;
  - `same`: at least `min_detectors` detectors must agree on ONE direction at the latest point (an up + a down detector is no longer “consensus”); the winning direction locks for the whole consecutive chain.
  - Consecutive points must be exactly one interval apart: detection gaps no longer count as “consecutive”.
- The alert payload comes from the highest-severity quorum record (ties broken by detector name) instead of arbitrary SQL ordering.
- Every result-affecting detector parameter now feeds the detector ID (`seasonality_components`, `min_samples_per_group`, `smoothing_alpha`, `smoothing_window`, `window_weights`, `half_life`, `weight_decay`, `detrend`). Previously tuning e.g. `weight_decay` silently mixed old and new detection regimes under one ID.
- Severity is now one convention for all windowed detectors: distance beyond the violated bound in spread units (σ-equivalents for MAD and Z-Score, IQR units for IQR; 0 = at the bound). Z-Score previously reported the point’s |z| (≥ threshold at the bound), which made cross-detector severities incomparable in multi-detector alerts.
- MAD/Z-Score/IQR collapsed into one shared `WindowedStatDetector` template (~1250 duplicated lines removed); behavior is identical across the three for windowing, preprocessing, weighting, detrending and seasonality.
- Detector parameters are fully validated at construction: bad `input_type`, `smoothing`, `window_weights`, `detrend`, `half_life` values fail fast with a clear error instead of mid-detection.
- `template_single` is now actually used (alerts with `consecutive_count ≤ 1`); `template_consecutive` covers streaks; each falls back to the other when unset.
- `AlertConditions` dataclass defaults (direct API) now match the YAML defaults: `direction="same"`, `consecutive_anomalies=3`.
- Internal version is unified: `pyproject.toml` reads `detectkit.__version__`; `dtk --version` reports the real version (was hardcoded `0.1.0` while `__init__` said `0.5.3` and pyproject `0.6.0`).
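The false-positive figures quoted for the MAD change in this list follow from the Gaussian tail. The sketch below reproduces them with the standard library only; it assumes ideal Gaussian noise and a two-sided test, and the names are illustrative.

```python
import math

MAD_TO_SIGMA = 1.4826  # normal-consistency constant: sigma ≈ 1.4826 × MAD


def two_sided_tail(k_sigma):
    """Probability that a standard normal value lies more than k_sigma from the mean."""
    return math.erfc(k_sigma / math.sqrt(2))


# threshold 3.0 on scaled MAD behaves like a 3-sigma rule.
scaled_rate = two_sided_tail(3.0)

# threshold 3.0 on raw MAD is only 3 / 1.4826 ≈ 2.02 sigma.
raw_sigma = 3.0 / MAD_TO_SIGMA
raw_rate = two_sided_tail(raw_sigma)
```

Under these assumptions `scaled_rate` comes out near 0.27% and `raw_rate` near 4.3%, matching the numbers in the entry.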
- Telegram and Email channels could never deliver an alert through the orchestrator: their `send()` signatures didn’t accept the template argument, so every dispatch raised `TypeError` (and was swallowed as a failed channel). Both now follow the channel contract and return success.
- Failed runs were recorded as `status='completed'` with no error message in `_dtk_tasks`; they are now recorded as `failed` with the error.
- Query-provided seasonality shifted onto wrong timestamps whenever gap filling inserted rows mid-range (padding was appended at the end); it is now realigned by timestamp.
- Seasonality grouping silently became a no-op when seasonality data arrived as numpy unicode strings with orjson installed (`json_loads` rejected `numpy.str_`, the error was swallowed, and the group mask matched the whole window). Parsing now coerces string types.
- EMA smoothing no longer poisons the whole series when it starts with NaN.
- `get_context_size()` now includes the smoothing warm-up, so batched detection with smoothing is deterministic across batch boundaries.
- `weighted_percentile` uses the midpoint (Hazen) convention: with uniform weights the median now matches `np.median` exactly (the old interpolation was biased).
- `weighted_std(ddof=1)` no longer explodes when the effective sample size is ≤ 1.
- IQR seasonality multipliers can no longer produce an inverted interval.
- Two alert channels of the same type no longer collapse into one dispatch result entry.
Migration
- Detector IDs change for ALL mad/zscore/iqr detectors: the shared implementation carries an algorithm-version tag (`@v2`: σ-equivalent MAD, Hazen-midpoint weighted percentiles, unified severity), and additionally any non-default `seasonality_components`, `min_samples_per_group`, smoothing or weighting parameters now feed the hash. Affected detectors recompute from scratch on the next run (rows under old IDs remain; `--full-refresh` purges them).
- MAD users: intervals widen ×1.4826 by design. If you raised `threshold` to fight noise, try lowering it back toward 3.0.
- `direction: same` with `min_detectors ≥ 2` now requires true directional consensus and may alert less than the old (buggy) behavior.
- Persisting anomalies still re-alert on every run unless `alert_cooldown` is set, which is recommended for production metrics (e.g. `alert_cooldown: "2h"`).
[0.6.0] - 2026-05-26
- Stuck pipeline locks now self-heal; `--force` clears them. If a run was killed without releasing its lock (most commonly when the database restarted mid-run), the `running` row in `_dtk_tasks` was left behind, and every subsequent non-`--force` run failed with `RuntimeError: Failed to acquire lock ... Another task is running`. With `error_alerting` enabled this produced a continuous stream of error alerts. Two gaps caused it, both now closed:
  - `acquire_lock` ignored `timeout_seconds` (the staleness check was an unimplemented TODO). Now a `running` row older than its stored `timeout_seconds` (default 1 hour for the pipeline lock) is treated as stale and overridden, so the next normal run recovers automatically, matching the `can_start_process` logic in TECHNICAL_SPEC.md §13.1.
  - `--force` bypassed the lock but never cleared it: it skipped both acquire and release, so a forced run left the stale row in place and the spam continued. `--force` now takes ownership of the lock and releases it on exit, so a forced run also heals a previously stuck lock.
- `dtk unlock --select <selector>` command. Clears a stuck pipeline lock immediately instead of waiting for the timeout to expire. Reports per metric whether a lock was cleared, accepts the same selectors as `dtk run` (name, path, `tag:`), and marks the task `completed` so the next scheduled run proceeds without `--force`. Does not run the pipeline.
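The staleness rule described above, where a `running` row older than its stored `timeout_seconds` is treated as stale, can be sketched as a pure function. The name `is_lock_stale` and its signature are hypothetical; the real check lives inside `acquire_lock` and may differ in detail.

```python
from datetime import datetime, timedelta, timezone


def is_lock_stale(started_at, timeout_seconds, now=None):
    """Return True when a running lock has outlived its own timeout."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - started_at) > timedelta(seconds=timeout_seconds)


now = datetime(2026, 5, 26, 12, 0, tzinfo=timezone.utc)

# A run that started 10 minutes ago still owns its 1-hour lock.
fresh = is_lock_stale(now - timedelta(minutes=10), 3600, now=now)

# A run killed 2 hours ago left a row that the next run may override.
stale = is_lock_stale(now - timedelta(hours=2), 3600, now=now)
```

Passing `now` explicitly keeps the check deterministic in tests; a caller would normally omit it.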
[0.5.3] - 2026-05-12
- Project name in error alerts. When multiple detectkit projects route `error_alerting` to the same Slack/Mattermost channel, the generic `Pipeline error: <startup>` title made it impossible to tell which project crashed (especially if both bots happened to share a username). `AlertData` now carries `project_name`, automatically populated from `detectkit_project.yml`’s `name` field by `dispatch_project_error_alert`. The default error title becomes `[project_name] Pipeline error: <metric>` when the project name is known; collapses to the previous form when it isn’t. New template variables `{project_name}` and `{project_name_prefix}` are available in custom `error_alerting.template` values (and in every other alert template, where they are simply empty for callers that don’t set them yet).
[0.5.2] - 2026-05-10
- `dtk test-alert` no longer crashes with `AttributeError`. The command had been broken since v0.3.9 (when `alerting` became a list): `create_mock_alert_data` still dereferenced `metric_config.alerting.mentions` and raised `AttributeError: 'list' object has no attribute 'mentions'` on every invocation. Now it sources mentions from the specific `AlertingConfig` under test, which is more correct anyway since different alert routes can ping different teams.
[0.5.1] - 2026-05-10
- Project `error_alerting` now fires for startup failures. In v0.5.0 the dispatch lived inside `TaskManager.run_metric`, but three classes of failures crash earlier, at the CLI level, before a TaskManager exists: `ProfilesConfig.from_yaml`, `profiles_config.create_manager` (the user-reported “Connection reset by peer” case), and `internal_manager.ensure_tables`. The DB outage that the feature was designed for is exactly the case that crashed in `create_manager`, so no alert went out. The dispatch is now extracted into `detectkit.orchestration.error_dispatch.dispatch_project_error_alert` and called from both the CLI early paths (with `metric_name="<startup>"`) and from `TaskManager`. The helper takes `profiles_config + project_config` directly so it does not need a TaskManager instance to run.
[0.5.0] - 2026-05-10
- `no_data_alert` now actually fires. The flag had been defined and persisted but was never read by the orchestrator, so missing-data alerts silently never went out. New `should_alert_no_data()` checks the latest expected interval in `_dtk_datapoints` (no row OR row with NULL/NaN value → “missing”) and dispatches a dedicated alert through the same channels, honouring the existing `alert_cooldown` / `suppress_until` machinery. New `template_no_data` field on `AlertingConfig` for the message body.
- Project-level `error_alerting`. New optional section in `detectkit_project.yml` that catches any pipeline exception (DB outage, query timeout, lock failure, channel HTTP, etc.) and ships one alert through the named channels. After the alert fires the run aborts (`result["abort_run"] = True`) so a dead source doesn’t cause N alerts for N metrics. No persistent cooldown: storing state in the DB doesn’t help when the DB itself is down, and a local file would break the dbt-style stateless model. Custom `template`, `mentions`, and `timezone` supported.
- `AlertData` gains `is_no_data`, `is_error`, `error_type`, `error_message`.
- `format_message` handles three new statuses (`NO_DATA`, `ERROR`, plus the existing `RECOVERED`/`ANOMALY`), exposes `{value_display}` as a NaN-safe template variable, and falls back to a kind-appropriate default if a user template uses `{value:.2f}` on a no-data / error payload.
- `WebhookChannel` adds amber `#F0AD4E` for no-data and keeps red for error (visual parity with existing anomaly cards).
- `[dev]` extras pinned `pytest-requests-mock>=0.1`, which does not exist on PyPI. Every CI Test job aborted in 10s with “No matching distribution found” before pytest could even start. Replaced with `pytest-mock`.
- `AlertData.value` is now `Optional[float]` (was `float`). Required by the no-data / error paths where there is no real value; unchanged semantics for existing anomaly / recovery callers.
Internal
- Whole codebase brought up to ruff + black compliance (autofixed pyupgrade rules, `raise ... from e`, `zip(strict=True)`, formatting). No behaviour changes; 385 unit tests still pass. CI’s lint job is now actually a gate rather than a permanent red tile.
- `[tool.ruff]` migrated to `[tool.ruff.lint]` to silence the deprecation warning.
[0.4.1] - 2026-04-27
- `min_detectors >= 2` never fired: `_load_recent_detections` collapsed every detector at a given timestamp into a single `DetectionRecord`, so `should_alert` saw at most one record per timestamp regardless of how many detectors actually flagged the point. Channels configured with `min_detectors: 2` therefore went silent even when both detectors agreed on a “down” anomaly, while a parallel `min_detectors: 1` channel fired normally. Now one record is emitted per detector per timestamp, matching the contract that the orchestrator and recovery code already expect.
[0.4.0] - 2026-04-19
Section titled “[0.4.0] - 2026-04-19”Breaking
- `DetectionResult` field order changed. The dataclass is now declared as `timestamp, value, is_anomaly, processed_value=None, confidence_lower=None, confidence_upper=None, detection_metadata=None`. Custom detectors that construct `DetectionResult` with keyword arguments (the way every built-in detector does) are unaffected. Detectors that relied on the previous positional order (`DetectionResult(ts, val, processed_val, True, ...)`) must switch to keyword arguments or reorder.
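Why the old positional call breaks can be seen with a stand-in dataclass that mirrors the declared field order. This is not the library's class: the type annotations are assumptions, and the `__post_init__` default reflects the "defaults to `value` when not supplied" behaviour noted elsewhere in this release.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DetectionResult:
    """Stand-in mirroring the field order declared in 0.4.0 (types are assumed)."""

    timestamp: datetime
    value: float
    is_anomaly: bool
    processed_value: Optional[float] = None
    confidence_lower: Optional[float] = None
    confidence_upper: Optional[float] = None
    detection_metadata: Optional[dict] = None

    def __post_init__(self):
        # processed_value falls back to the raw value when not supplied.
        if self.processed_value is None:
            self.processed_value = self.value


ts = datetime(2026, 4, 19)

# Keyword arguments are immune to field reordering.
good = DetectionResult(timestamp=ts, value=10.0, is_anomaly=True)

# The pre-0.4.0 positional order now lands values in the wrong fields.
bad = DetectionResult(ts, 10.0, 9.5, True)
```

In `bad`, the processed value `9.5` ends up in `is_anomaly` and `True` ends up in `processed_value`, with no error raised, which is why the entry recommends keyword arguments.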
Security
- SQL injection hardening: every `_dtk_*` query now uses parameterised placeholders. Previously `metric_name`, `detector_id` and timestamp filters were interpolated via f-strings into `WHERE` and `ALTER TABLE … DELETE` clauses; a crafted `metric_name` could execute arbitrary SQL. Affected methods: `load_datapoints`, `delete_datapoints`, `delete_detections`, `get_recent_detections` (all in `internal_tables`).
- Secrets in `profiles.yml`: `${VAR}` and `{{ env_var('VAR') }}` placeholders are now interpolated when the profile is loaded (`ProfilesConfig.from_yaml`). Database passwords no longer have to live in plaintext alongside the YAML.
- `detectkit.utils.env_interpolation.interpolate_env_vars`: recursive helper used by both the profile loader and the alert-channel factory.
- `detectkit.utils.json_utils`: single source of truth for JSON helpers (replaces three local copies of `json_dumps_sorted`).
- `detectkit.detectors.seasonality`: shared `parse_seasonality_data` / `create_seasonality_mask` (replaces ~240 lines of duplication across MAD, Z-Score and IQR).
- GitHub Actions workflows: `ci.yml` (pytest / mypy / ruff / black on Python 3.10–3.12) and `publish.yml` (PyPI trusted publishing on tags).
- `.pre-commit-config.yaml` with ruff/black/mypy/yaml/whitespace hooks.
- Integration test scaffold under `tests/integration/` using `testcontainers[clickhouse]`. Marked with `@pytest.mark.integration` and skipped in environments without Docker. Install via `pip install -e ".[integration]"`.
Changed
- `internal_tables.py` (1066 lines) became the `internal_tables/` package with one mixin per logical table (`_datapoints`, `_detections`, `_tasks`, `_metrics`, `_alert_states`, `_schema`). Public API (`from detectkit.database.internal_tables import InternalTablesManager`) unchanged.
- `task_manager.py` (875 lines) became the `task_manager/` package (`_load_step`, `_detect_step`, `_alert_step`, `_base`, `_types`, `manager`). Public exports preserved.
- `alerting/orchestrator.py` (777 lines) became the `alerting/orchestrator/` package (`_decision`, `_cooldown`, `_recovery`, `_dispatch`, `_types`).
- `_compute_sma` in `detectors/base.py` rewritten using cumulative sums; the previous nested Python loop is gone.
- `DetectionResult.processed_value` is now optional and defaults to `value` when not supplied, which is convenient for detectors that don’t pre-process data.
- Pipeline failures now print the exception type and a traceback to stderr instead of just the message string.
- ClickHouse “epoch-as-NULL” handling consolidated into a single `_normalize_max_timestamp` helper used by every `MAX(timestamp)` query.
- `pytest.ini` and `pyproject.toml` no longer fight over pytest configuration: the `pytest.ini` file was removed and `--cov=detectkit` (was `--cov=detectkitit`) is the single source of truth.
- `[tool.setuptools]` `packages = ["detectkit"]` only shipped the top-level package; switched to `setuptools.packages.find` so detector / alerting / CLI submodules end up in the wheel.
- Stale unit tests that still expected the pre-`processed_value` schema and the wrong `_dtk_detections` column order have been refreshed.
Removed
- Public-repo `.gitignore` no longer hides `TECHNICAL_SPEC.md`, `ARCHITECTURE.md`, `TODO.md`, `PROGRESS.md`, `init_plan.md`, `GRAFANA_DASHBOARD.md`. `CLAUDE.md` and `.claude/` remain ignored.
Migration notes (0.3.x → next)
- If you patched `detectkit.orchestration.task_manager.MetricLoader` in tests, update the dotted path to `detectkit.orchestration.task_manager._load_step.MetricLoader` (or import `MetricLoader` directly from `detectkit.loaders.metric_loader`).
- If you imported the private helpers `_parse_detection_metadata` / `_direction_from_metadata` from `detectkit.alerting.orchestrator`: they’re still re-exported from the same path, no change needed.
- To use env-var interpolation for DB credentials, set the variable in your shell and reference it as `password: "{{ env_var('CLICKHOUSE_PASSWORD') }}"` in `profiles.yml`. Previously this only worked for alerting channels.
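How a recursive placeholder interpolation of this kind might work is sketched below. It is an illustration only: the function `interpolate`, its regex and its fail-on-missing-variable behaviour are assumptions, not the implementation of `interpolate_env_vars`.

```python
import os
import re

# Matches ${VAR} or {{ env_var('VAR') }} (single or double quotes).
_PLACEHOLDER = re.compile(
    r"\$\{(\w+)\}|\{\{\s*env_var\(\s*['\"](\w+)['\"]\s*\)\s*\}\}"
)


def interpolate(value, env=None):
    """Recursively replace placeholders in strings nested inside dicts and lists."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: interpolate(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [interpolate(item, env) for item in value]
    if isinstance(value, str):
        # A missing variable raises KeyError here; a real loader may prefer a clearer error.
        return _PLACEHOLDER.sub(lambda m: env[m.group(1) or m.group(2)], value)
    return value  # numbers, booleans and None pass through unchanged


profile = {
    "host": "${DB_HOST}",
    "port": 9000,
    "password": "{{ env_var('CLICKHOUSE_PASSWORD') }}",
}
resolved = interpolate(
    profile, env={"DB_HOST": "clickhouse.example.com", "CLICKHOUSE_PASSWORD": "s3cret"}
)
```

Passing `env` explicitly keeps the example deterministic; with `env=None` the function reads the process environment, which is the usage the migration note describes.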
[0.3.17] - 2026-04-11
- Recovery alert CI display: recovery messages now show the confidence interval from the current detection point (matching the displayed value’s seasonality group), not the stale CI from the last anomalous point. Previously, with hourly seasonality, recovery could show a CI from a different hour, making the value appear outside bounds when it was actually normal.
[0.3.16] - 2026-04-10
- `suppress_until` field in alerting config: temporarily suppress alerts until a specified UTC datetime without disabling the metric. Load and detect steps continue running; alerts auto-resume after the specified time. One-time setup, no need to toggle `enabled` twice.
- Timezone display in alerts: timestamps are now converted from UTC to the configured `timezone` (e.g., `Europe/Moscow`) before formatting. Previously, UTC time was displayed with the timezone label appended, showing incorrect local time.
- Recovery alert metadata: recovery messages now show the detector name and confidence interval from the last anomalous detection instead of “Detector: unknown” and “CI: N/A”.
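The timezone fix amounts to converting before formatting rather than relabelling a UTC value. A minimal sketch with the standard-library `zoneinfo` follows; the function name and the output format are assumptions for illustration, not detectkit's actual template output.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def format_alert_time(ts_utc, tz_name):
    """Convert a UTC timestamp to the configured zone, then format it with the zone label."""
    if ts_utc.tzinfo is None:
        # Naive timestamps are assumed to be UTC.
        ts_utc = ts_utc.replace(tzinfo=timezone.utc)
    local = ts_utc.astimezone(ZoneInfo(tz_name))
    return f"{local:%Y-%m-%d %H:%M:%S} ({tz_name})"


shown = format_alert_time(datetime(2026, 4, 10, 12, 0, 0), "Europe/Moscow")
```

Here noon UTC is displayed as 15:00 Moscow time; the earlier behaviour would have shown 12:00 with the `Europe/Moscow` label attached.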
[0.3.14] - 2026-04-09
- Direction-aware recovery: recovery for `direction="up"/"down"/"same"` alerts no longer waits for the metric to return inside the confidence interval. A `down`-only alert now recovers as soon as the latest point is no longer a `down` anomaly (including when it flips to an `up` anomaly), matching the semantics of `_count_consecutive_anomalies()`.
- ManualBoundsDetector recovery / alerting: anomaly direction is now read from `detection_metadata.direction` (authoritative `"below"`/`"above"` written by every detector) instead of being reconstructed from `value` vs `confidence_lower/upper`. One-sided manual bounds (e.g. only `upper_bound` set, `confidence_lower=None`) no longer break direction resolution in `AlertOrchestrator._check_recovery_since_last_alert()` and `TaskManager._load_recent_detections()`.
Changed
- `InternalTablesManager.get_recent_detections()` now selects `detection_metadata` and exposes it as `detection_metadata_list` in the grouped result.
- New `AlertOrchestrator._get_alert_trigger_direction()` helper resolves the direction of the alert-triggering point for `direction="same"` recovery checks.
[0.3.13] - 2026-04-08
- New internal table `_dtk_alert_states` for independent alert state per alerting config block (`last_alert_sent`, `last_recovery_sent`, `alert_count` keyed by `metric_name` + `alert_config_id`)
- `alert_config_id` generated as MD5 hash of all config params (channels, min_detectors, direction, consecutive_anomalies, alert_cooldown, cooldown_reset_on_recovery): configs with the same channels but different conditions correctly get different IDs and independent state
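Deriving a stable ID from the listed config parameters can be sketched with `hashlib` and canonical JSON. The serialization shown (sorted keys, compact separators) is an assumption for illustration; detectkit's actual hashing input may be laid out differently.

```python
import hashlib
import json

# The parameters the entry lists as feeding the ID.
ID_FIELDS = (
    "channels",
    "min_detectors",
    "direction",
    "consecutive_anomalies",
    "alert_cooldown",
    "cooldown_reset_on_recovery",
)


def alert_config_id(config):
    """Hash the functional alerting fields into a 32-character hex ID."""
    payload = {name: config.get(name) for name in ID_FIELDS}
    # Sorted keys make the hash independent of YAML key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


strict = {"channels": ["slack_ops"], "min_detectors": 2, "direction": "same"}
loose = {"channels": ["slack_ops"], "min_detectors": 1, "direction": "same"}
reordered = {"direction": "same", "min_detectors": 2, "channels": ["slack_ops"]}

strict_id = alert_config_id(strict)
loose_id = alert_config_id(loose)
reordered_id = alert_config_id(reordered)
```

The two blocks share a channel but differ in `min_detectors`, so they receive different IDs and therefore independent state, while reordering keys leaves the ID unchanged.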
- Multi-config alerting: when a metric has multiple `alerting:` blocks, each now tracks its own alert/recovery state independently, which fixes false recoveries caused by shared `last_alert_sent`
- Recovery threshold: recovery now requires 0 detectors flagging the latest point as anomalous (previously used `< min_detectors`, causing false recovery when some detectors still saw anomaly)
- Recovery message point: `_build_recovery_data()` now correctly uses the newest detection point (`detections[-1]`) instead of the oldest (`detections[0]`)
Changed
- `get_last_alert_timestamp`, `update_alert_timestamp`, `get_last_recovery_timestamp`, `update_recovery_timestamp` now require `alert_config_id` parameter
- `upsert_task_status` simplified: alert state no longer stored in `_dtk_tasks`
- `AlertOrchestrator.__init__` requires `alert_config_id` parameter
Migration
New table is created automatically on next dtk run via ensure_tables().
Existing alert state in _dtk_tasks is not migrated; the first run after upgrade starts with clean state.
[0.3.12] - 2026-04-08
- Custom `template_consecutive` from alerting config now correctly passed to `send_alerts()`
- Numpy timezone warning in `upsert_task_status`: strip tzinfo from datetime fields before converting to `datetime64[ms]`
Changed
- Centralized UTC datetime handling into `detectkit/utils/datetime_utils.py` (`now_utc`, `now_utc_naive`, `to_naive_utc`, `to_aware_utc`)
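What such helpers might look like is sketched below. The four function names come from the entry above; the bodies, and the choice to treat naive input as already being UTC, are assumptions rather than the contents of `datetime_utils.py`.

```python
from datetime import datetime, timezone


def now_utc() -> datetime:
    """Current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def to_naive_utc(value: datetime) -> datetime:
    """Convert to UTC, then drop tzinfo. Naive input is assumed to be UTC."""
    if value.tzinfo is None:
        return value
    return value.astimezone(timezone.utc).replace(tzinfo=None)


def to_aware_utc(value: datetime) -> datetime:
    """Attach UTC to naive input, or convert aware input to UTC."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def now_utc_naive() -> datetime:
    """Current UTC time without tzinfo, e.g. before a datetime64 conversion."""
    return to_naive_utc(now_utc())
```

Keeping the conversion in one place means every caller strips tzinfo the same way before values reach numpy.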
[0.3.11] - 2026-04-08
- Recovery notifications never fired: `upsert_task_status` was destroying `last_alert_sent` / `last_recovery_sent` on every DELETE+INSERT cycle (fields were reset to NULL)
- Alert mutations now use `mutations_sync=1` to prevent race conditions between alert step and lock release
[0.3.10] - 2026-04-08
- False recovery detection: check latest point's anomaly status instead of counting consecutive anomalies
- Alert step now always runs (recovery notifications need it even when no new anomalies detected)
- `min_detectors` now correctly read from alerting config instead of being hardcoded to 1
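The difference between the two recovery checks can be illustrated with a small sketch. The function names and the boolean-list representation are hypothetical, and the count-based rule shown for contrast is only one plausible reading of the earlier behaviour, not the original code.

```python
def trailing_anomalies(flags: list[bool]) -> int:
    """Count consecutive anomalous points at the end of the series."""
    count = 0
    for flag in reversed(flags):
        if not flag:
            break
        count += 1
    return count


def is_recovered(flags: list[bool], alert_open: bool) -> bool:
    """Recovered only if an alert is open and the latest point is normal."""
    if not alert_open or not flags:
        return False  # nothing to recover from, or no data to judge
    return not flags[-1]


# Oldest to newest: one normal point, then a fresh anomaly.
flags = [False, True]
consecutive_anomalies = 2  # assumed alert condition

# A count-based rule would see 1 < 2 and could report a recovery here.
count_based_says_recovered = trailing_anomalies(flags) < consecutive_anomalies
# The latest-point rule sees that the newest point is still anomalous.
latest_point_says_recovered = is_recovered(flags, alert_open=True)
```

In this case the count-based check reports a recovery while the newest point is still anomalous, which is the kind of false recovery the entry describes.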
[0.3.9] - 2026-04-07
- Multiple alerting configurations per metric: `alerting` now accepts a list of alert configs, each with its own channels, timezone, template, and conditions
- Backward-compatible: single `alerting:` dict still works as before
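One common way to accept both shapes is to normalize the value to a list as soon as the config is loaded. The sketch below assumes the YAML has already been parsed into Python objects; `normalize_alerting` is a hypothetical name and not part of detectkit's API.

```python
from typing import Any


def normalize_alerting(alerting: Any) -> list[dict]:
    """Return a list of alert configs for either a dict or a list input."""
    if alerting is None:
        return []  # metric has no alerting section
    if isinstance(alerting, dict):
        return [alerting]  # legacy single-block form
    if isinstance(alerting, list):
        return list(alerting)  # new multi-block form
    raise TypeError(f"alerting must be a dict or a list, got {type(alerting).__name__}")


single = normalize_alerting({"channels": ["slack"]})
multi = normalize_alerting([{"channels": ["slack"]}, {"channels": ["email"]}])
```

Downstream code can then iterate over the result without caring which form the metric file used.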
[0.3.8] - 2026-04-07
- Channel-agnostic mentions in alert messages (`mentions` config field)
- `format_mentions()` method on `BaseAlertChannel`, overridable per channel
- Platform-specific formatting: Mattermost (`@user`), Slack (`<!here>`, `<@UID>`), Telegram (`@user`), Email (`CC: user`)
- `{mentions}` and `{mentions_line}` template variables for custom placement
- Special keywords: `here`, `channel`, `all` for broadcast mentions
- Documentation: mentions guide, 4 example scenarios, updated configuration reference
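The platform-specific formats listed above can be sketched as a single standalone function. In detectkit this logic lives in a `format_mentions()` method on each channel class; the free function, the `platform` argument, and the mapping of `all` to Slack's `<!everyone>` token are assumptions made for illustration.

```python
# Slack broadcast tokens. Mapping "all" to <!everyone> is an assumption.
SLACK_BROADCAST = {
    "here": "<!here>",
    "channel": "<!channel>",
    "all": "<!everyone>",
}


def format_mentions(mentions: list[str], platform: str) -> str:
    """Hypothetical sketch: render mentions in one platform's syntax."""
    parts = []
    for name in mentions:
        if platform == "slack":
            # Broadcast keywords use <!...>, user IDs use <@...>.
            parts.append(SLACK_BROADCAST.get(name, f"<@{name}>"))
        elif platform == "email":
            parts.append(f"CC: {name}")
        else:
            # Mattermost and Telegram both use a plain @ prefix.
            parts.append(f"@{name}")
    return " ".join(parts)


slack_text = format_mentions(["here", "U123"], "slack")
mattermost_text = format_mentions(["alice"], "mattermost")
```

The same `mentions` list in a metric config can then be rendered differently by each channel.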
[0.3.7] - 2026-04-06
Changed
- Mattermost alerts now use attachments format with colored sidebar (red for anomaly, green for recovery)
- Webhook default templates omit metric name from body (shown in attachment title)
[0.3.6] - 2026-04-06
- Recovery notifications: `notify_on_recovery: true` in alerting config sends a message when metric stabilizes after an anomaly
- `template_recovery` config option for custom recovery message template
- `{status}` template variable in all alert templates (`"ANOMALY"` or `"RECOVERED"`)
- `is_recovery` field on `AlertData` to distinguish recovery messages from anomaly alerts
- `AlertOrchestrator.should_send_recovery()`: checks recovery conditions and returns `AlertData`
- `AlertOrchestrator.send_recovery()`: sends recovery via configured channels and tracks timestamp
- `_dtk_tasks.last_recovery_sent` column for deduplication (one recovery notification per incident)
- `InternalTablesManager.get_last_recovery_timestamp()` and `update_recovery_timestamp()` methods
- `BaseAlertChannel.get_default_recovery_template()` method
Migration
Existing installations need to add the new column manually:
ALTER TABLE _dtk_tasks ADD COLUMN last_recovery_sent Nullable(DateTime64(3, 'UTC'));
[0.3.2] - 2025-11-11
- Critical bug: Newly added detectors no longer start processing from 1970-01-01 (epoch)
  - `get_last_detection_timestamp()` now properly handles epoch timestamps returned by ClickHouse for NULL values
  - This completes the epoch fix from v0.2.5 which only fixed the datapoints method
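The kind of guard this fix implies can be sketched as follows. `normalize_last_timestamp` is a hypothetical helper, not the body of `get_last_detection_timestamp()`; it assumes the database driver hands back a `datetime` at the Unix epoch where the column was NULL.

```python
from datetime import datetime, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1)


def normalize_last_timestamp(value: Optional[datetime]) -> Optional[datetime]:
    """Treat a missing or epoch timestamp as 'no previous run'."""
    if value is None:
        return None
    if value.tzinfo is not None:
        # Compare in naive UTC so aware and naive inputs behave the same.
        value = value.astimezone(timezone.utc).replace(tzinfo=None)
    return None if value <= EPOCH else value


fresh_detector = normalize_last_timestamp(datetime(1970, 1, 1))
existing_detector = normalize_last_timestamp(datetime(2025, 11, 11, 10, 0))
```

With `None` returned for a fresh detector, the caller can fall back to its normal start date instead of processing from 1970.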
[0.3.1] - 2025-11-10
- CLI now shows warnings when metric files fail to parse (YAML syntax errors, validation errors, etc.) instead of silently skipping them
- Tag selector (`--select tag:`) now searches both `.yml` and `.yaml` files (previously only searched `.yml`, inconsistent with name selector)
Changed
- Improved error messages when no metrics are found: now provides feedback about which files were skipped due to parsing errors
- Made metric file discovery consistent across both tag and name selectors
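A shared discovery helper is one way to keep both selectors consistent. The sketch below is an assumption about the approach, not detectkit's code: `discover_metric_files` and `METRIC_SUFFIXES` are hypothetical names, and the recursive walk is chosen here for illustration.

```python
from pathlib import Path

# Both spellings of the YAML extension are accepted.
METRIC_SUFFIXES = {".yml", ".yaml"}


def discover_metric_files(root: Path) -> list[Path]:
    """Return every metric file under root, including subdirectories."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix in METRIC_SUFFIXES
    )
```

If the tag selector and the name selector both call the same helper, they cannot disagree about which files exist.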
0.3.0 - 2025-11-10
- Alert cooldown system to prevent spam from persistent anomalies
- `alert_cooldown` configuration parameter (supports "30min" string or integer seconds)
- `cooldown_reset_on_recovery` option to reset cooldown when metric recovers
- `_dtk_tasks.last_alert_sent` column to track last alert timestamp
- `_dtk_tasks.alert_count` column to track total alerts sent per metric
Changed
- `AlertOrchestrator` now checks cooldown period before sending alerts
- `InternalTablesManager` added methods: `get_last_alert_timestamp()`, `update_alert_timestamp()`
- Alert orchestration moved cooldown check before expensive operations for performance
- Alert spam when persistent anomalies generate duplicate alerts at every interval
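The cooldown check can be sketched in two small steps: parse the configured value, then compare it with the time since the last alert. Only the "30min" string form and integer seconds are confirmed by the entries above; the other unit suffixes and the names `parse_cooldown` and `in_cooldown` are assumptions.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Union

# Seconds per unit. Only "min" is confirmed; the rest are assumed.
_UNIT_SECONDS = {"s": 1, "sec": 1, "min": 60, "h": 3600, "d": 86400}


def parse_cooldown(value: Union[int, str]) -> timedelta:
    """Accept integer seconds or a string such as "30min"."""
    if isinstance(value, int):
        return timedelta(seconds=value)
    match = re.fullmatch(r"(\d+)\s*([a-z]+)", value.strip().lower())
    if not match or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unsupported cooldown value: {value!r}")
    return timedelta(seconds=int(match.group(1)) * _UNIT_SECONDS[match.group(2)])


def in_cooldown(
    last_alert_sent: Optional[datetime], now: datetime, cooldown: timedelta
) -> bool:
    """True while the previous alert is still within the cooldown window."""
    if last_alert_sent is None:
        return False  # no alert has been sent yet
    return now - last_alert_sent < cooldown
```

Running this check first lets the orchestrator skip any further work for a metric that is still cooling down.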
0.2.8 - 2025-11-10
- Detection step no longer runs with 0 points when current interval is incomplete
- Alerts no longer sent when 0 anomalies detected in current run
- `get_recent_detections()` now filters by `created_after` to prevent loading old detections from previous runs
0.2.7 - 2025-11-10
- `_dtk_metrics` informational table for analysts and dashboards
- Metric configuration metadata stored automatically on every `dtk run`
- `description` field support in metric configuration files
- Tags extraction and storage in `_dtk_metrics` table
- Timezone warning in `load_datapoints()` by converting timezone-aware datetimes to naive
- Project name handling in `dtk init` command (now extracts basename from path)
0.2.5 - 2025-11-08
- Critical bug: `get_last_timestamp()` returning epoch (1970-01-01) instead of None when no data exists
- Prevented incorrect historical data loading due to epoch timestamp
0.2.4 - 2025-11-07
Changed
- Improved logging output formatting
- Enhanced error messages for better debugging
- Numpy datetime64 comparison warnings by ensuring datetime objects are timezone-naive
0.2.3 - 2025-11-07
- Metric name selector (`--select`) now correctly searches metrics in subdirectories
- Previously only searched in root `metrics/` directory
0.2.2 - 2025-11-07
- `requests` dependency for HTTP-based alert channels
0.2.1 - 2025-11-07
Changed
- Alert formatting improved for better readability
- Database-agnostic architecture maintained across all components
- Recursion error in alert message formatting by adding `detector_params` field
- Broadcasting error in seasonality mask application
- Timezone comparison issues in datetime handling
0.2.0 - 2025-11-06
- Detector Preprocessing: Transform input values before detection
  - `input_type: "raw"` - Use values as-is (default)
  - `input_type: "diff"` - Detect on differences between consecutive points
  - `input_type: "pct_change"` - Detect on percentage changes
- Value Smoothing: Reduce noise with moving average
  - `smoothing_window: N` - Apply N-point moving average before detection
- Recent Value Weighting: Weight recent data more heavily
  - `recent_weight: 0.0-1.0` - Weight for recent 20% of window (default: 0.0)
- All statistical detectors (MAD, Z-Score, IQR, ManualBounds) support preprocessing
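The three `input_type` transforms and the moving average can be illustrated with plain Python lists. This is a sketch under assumptions: detectkit's own pipeline may pad the shortened series, order the steps differently, or treat zero denominators another way, and the function names here are hypothetical.

```python
def preprocess(values: list[float], input_type: str = "raw") -> list[float]:
    """Apply one of the three input transforms named in the entry above."""
    if input_type == "raw":
        return list(values)
    pairs = list(zip(values, values[1:]))  # (previous, current)
    if input_type == "diff":
        return [cur - prev for prev, cur in pairs]
    if input_type == "pct_change":
        # A zero previous value has no defined percentage change (assumption: NaN).
        return [(cur - prev) / prev if prev != 0 else float("nan") for prev, cur in pairs]
    raise ValueError(f"unknown input_type: {input_type!r}")


def smooth(values: list[float], window: int) -> list[float]:
    """Trailing N-point moving average; output is shorter by window - 1."""
    if window <= 1:
        return list(values)
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


diffs = preprocess([10.0, 12.0, 9.0], "diff")
changes = preprocess([10.0, 12.0, 9.0], "pct_change")
smoothed = smooth([1.0, 2.0, 3.0, 4.0], 2)
```

For the series 10, 12, 9 the differences are 2 and -3, and the percentage changes are 0.2 and -0.25, so a detector running on `diff` or `pct_change` reacts to the size of the jump, not to the level of the metric.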
Changed
- Detector base classes updated to support preprocessing pipeline
- Detection metadata now includes preprocessing information
0.1.2 - 2025-11-05
- Data integrity validation: uniqueness checks for datapoints and detections
- Tags support for metric categorization and filtering
  - `tags` field in metric configuration (YAML array)
Changed
- Internal tables rebuilt with ReplacingMergeTree engine for automatic deduplication
0.1.1 - 2025-11-04
- Seasonality support for Z-Score detector
- Seasonality support for IQR detector
- Documentation for seasonality features in all statistical detectors
0.1.0 - 2025-11-03
- Initial release of detectkit
- Core functionality:
  - Metric data loading from databases (ClickHouse, PostgreSQL, MySQL)
  - Statistical anomaly detectors (MAD, Z-Score, IQR, Manual Bounds)
  - Seasonality support (MAD detector)
  - Multi-channel alerting (Mattermost, Slack, Telegram, Email)
  - CLI interface (`dtk init`, `dtk run`)
  - Idempotent operations with resume capability
  - Internal tables for state management (`_dtk_datapoints`, `_dtk_detections`, `_dtk_tasks`)
- Documentation:
  - Comprehensive guides (configuration, alerting, detectors)
  - API reference for all detector types
  - Quick start guide
  - Installation instructions
- Testing:
  - 287+ unit tests
  - 87% code coverage