Skip to content

Internal Tables

detectkit writes everything it computes into a handful of internal _dtk_* tables in your database. This page is the schema reference for those tables — their columns, primary keys, and storage engines. For copy-pasteable SQL that charts them, see Visualizing Results.

Results live in the internal_database (or internal_schema) configured in your profile — separate from the data_database your metric queries read from. See the Profiles configuration for where these live. All tables are auto-created on first run (ensure_tables(), idempotent).

Database note. Column types below are shown ClickHouse-flavored, but detectkit runs on ClickHouse, PostgreSQL and MySQL — all six tables exist on all three backends with portable-equivalent types. The primary keys are enforced on every backend; the storage engines (ReplacingMergeTree / MergeTree) are ClickHouse-specific. On ClickHouse, reads of a ReplacingMergeTree table need FINAL to collapse superseded versions, while PostgreSQL/MySQL enforce the primary key so a plain SELECT is already deduplicated.

Every table is keyed by metric_name, so deleting a metric’s YAML leaves orphan rows that dtk clean prunes (except _dtk_autotune_runs, which is a deliberate audit trail).

_dtk_datapoints — the loaded metric points

Section titled “_dtk_datapoints — the loaded metric points”

One row per metric per timestamp on the metric’s time grid (gaps are filled with NULL, never 0).

ColumnTypeMeaning
metric_nameStringMetric identifier
timestampDateTime64(3, UTC)Point on the metric’s time grid
valueNullable(Float64)Metric value (NULL = missing / gap-filled)
seasonality_dataString (JSON)Extracted seasonality features for the point
interval_secondsInt32The metric’s grid step, in seconds
seasonality_columnsStringComma-separated names of the seasonality features
created_atDateTime64(3, UTC)When the row was written

Primary key: (metric_name, timestamp) · Engine: ReplacingMergeTree(created_at)

_dtk_detections — per-point detection results

Section titled “_dtk_detections — per-point detection results”

One row per (metric, detector, timestamp). A metric with two detectors produces two rows per timestamp, so always filter or group by detector_id.

ColumnTypeMeaning
metric_nameStringMetric identifier
detector_idStringDetector identity (hash of its parameters)
detector_nameStringDetector class name (MADDetector, ZScoreDetector, IQRDetector, ManualBoundsDetector)
timestampDateTime64(3, UTC)The point that was scored
is_anomalyBooltrue/1 if flagged anomalous
confidence_lowerNullable(Float64)Lower edge of the expected range
confidence_upperNullable(Float64)Upper edge of the expected range
valueNullable(Float64)Original metric value (matches _dtk_datapoints.value)
processed_valueNullable(Float64)What the detector actually analyzed (after smoothing / input_type)
detector_paramsString (JSON)The detector’s non-default parameters
detection_metadataString (JSON)Per-point detail: severity, direction, distance, reason, …
created_atDateTime64(3, UTC)When the detection was written

Primary key: (metric_name, detector_id, timestamp) · Engine: ReplacingMergeTree(created_at)

The detection_metadata JSON commonly carries:

  • reason — set only when a point could not be scored: "missing_data" (value was NULL) or "insufficient_data" (fewer than min_samples points in the window). For real, scored points this key is absent. Filter it out to drop these “technical” rows.
  • direction"above" / "below" (present only on anomalies).
  • severity — distance past the breached bound in units of spread (σ-equivalents for MAD/Z-Score, IQR-units for IQR); 0 = exactly on the bound. Comparable across MAD and Z-Score.
  • distance — raw distance from the value to the breached bound.
  • Optional: ess (effective sample size, when window weighting is on), trend_slope_per_point (when detrend: linear is on), preprocessing, window_size, and global_* / adjusted_* window statistics.

See Shared Detector Parameters for the full metadata contract.

_dtk_tasks — pipeline locks & resume state

Section titled “_dtk_tasks — pipeline locks & resume state”

One row per (metric, detector, process_type). This is the pipeline’s bookkeeping table: it holds the run lock (only one process per key at a time), the resume cursor (last_processed_timestamp), and timeout/error state. A running row older than its timeout_seconds is treated as stale and overridden, so a killed run never blocks future ones.

ColumnTypeMeaning
metric_nameStringMetric identifier
detector_idStringDetector identity, or load for the loading task
process_typeStringload or detect
statusStringrunning / completed / failed
started_atDateTime64(3, UTC)When the task started
updated_atDateTime64(3, UTC)Last update
last_processed_timestampNullable(DateTime64(3, UTC))Resume cursor — last successfully processed point
error_messageNullable(String)Error detail when failed
timeout_secondsInt32Stale-lock threshold
last_alert_sentNullable(DateTime64(3, UTC))Legacy cooldown column (authoritative alert state is in _dtk_alert_states)
alert_countUInt32Legacy alert counter
last_recovery_sentNullable(DateTime64(3, UTC))Legacy recovery-sent column

Primary key: (metric_name, detector_id, process_type) · Engine: MergeTree (rows replaced via DELETE + INSERT)

One row per (metric, alert config block), identified by a hash of the alert config. Tracks cooldown / recovery state — useful for an “alert activity” panel.

ColumnTypeMeaning
metric_nameStringMetric identifier
alert_config_idStringHash of the alerting config (channels, conditions, …)
last_alert_sentNullable(DateTime64(3, UTC))When the last alert fired
last_recovery_sentNullable(DateTime64(3, UTC))When the last recovery notification fired
alert_countUInt32Total alerts sent for this config
updated_atDateTime64(3, UTC)Last update

Primary key: (metric_name, alert_config_id) · Engine: ReplacingMergeTree(updated_at)

_dtk_metrics — config mirror (informational)

Section titled “_dtk_metrics — config mirror (informational)”

A mirror of each metric’s config for dashboards. It is informational only — the load → detect → alert pipeline never reads it — and is rewritten on every dtk run via DELETE + INSERT.

ColumnTypeMeaning
metric_nameStringMetric identifier
descriptionNullable(String)Optional metric description
pathStringPath to the .yml config file
intervalStringInterval string (10min, 1h, …)
loading_start_timeNullable(DateTime64(3, UTC))Initial-load start time
loading_batch_sizeUInt32Loading batch size
is_alert_enabledUInt8Whether alerting is enabled (0/1)
timezoneNullable(String)Timezone for alerts
directionNullable(String)Required anomaly direction (same/any/up/down)
consecutive_anomaliesUInt32Consecutive anomalies to trigger an alert
no_data_alertUInt8Alert on missing data (0/1)
min_detectorsUInt32Minimum detectors that must agree
tagsString (JSON)JSON array of tags
enabledUInt8Whether the metric is enabled (0/1)
created_atDateTime64(3, UTC)First time the config was saved
updated_atDateTime64(3, UTC)Last config update

Primary key: (metric_name) · Engine: MergeTree (DELETE + INSERT)

_dtk_autotune_runs — auto-tune audit trail

Section titled “_dtk_autotune_runs — auto-tune audit trail”

One row per dtk autotune run, keyed by metric_name + run_id. It is an audit trail only: the pipeline never reads it and dtk clean --orphaned-metrics never prunes it. The full column list (training period, labels, scoring metric, chosen seasonality/detector/params, CV score, decision log, generated config) is documented in the Auto-tune reference.

Primary key: (metric_name, run_id) · Engine: ReplacingMergeTree(created_at)

Dedup is by primary key + INSERT IGNORE semantics. For the append tables (_dtk_datapoints, _dtk_detections, _dtk_alert_states, _dtk_autotune_runs) this is reinforced by ReplacingMergeTree, which collapses duplicate keys by the version column (created_at / updated_at). On ClickHouse you must read with FINAL to see the collapsed result; PostgreSQL/MySQL enforce the key directly. Every pipeline stage also resumes from the last persisted timestamp, so re-running dtk run never reprocesses or double-writes.