At a glance

Problem

Pipeline concerns are often mixed, so reruns and ownership become hard to reason about.

What was built

A contract-first engine with typed settings and adapter boundaries across runtime stages.

Current scope

REST and file ingestion, Duck-compatible warehouses, and dbt-first transformation.

Why it matters

Pipeline flow stays centralised while backend-specific behaviour remains isolated.

Problem

Data projects often blur orchestration, source logic, warehouse behaviour, and transformation execution. This increases maintenance cost and makes rerun behaviour harder to reason about. Axiomatic Engine addresses this by separating responsibilities with typed protocols and adapter boundaries.

Architecture

The engine is organised into contracts, adapters, and core stages. Sources expose resources through protocol boundaries, ingestion is executed with dlt into a selected warehouse adapter, and transformation is delegated through a transformation adapter.

This keeps pipeline flow centralised while backend specifics remain isolated. Settings are loaded from AXIOMATIC_* variables with CLI override support.
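As a sketch of this environment-first pattern (names such as `EngineSettings` and the specific variable names are illustrative, not the engine's actual settings model):

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineSettings:
    """Illustrative settings model: environment first, CLI overrides win."""
    warehouse_kind: str
    run_transforms: bool

    @classmethod
    def load(cls, cli_overrides: Optional[dict] = None) -> "EngineSettings":
        # Environment variables provide the baseline configuration.
        values = {
            "warehouse_kind": os.environ.get("AXIOMATIC_WAREHOUSE_KIND", "duckdb"),
            "run_transforms": os.environ.get("AXIOMATIC_RUN_TRANSFORMS", "false").lower() == "true",
        }
        # Explicit CLI flags take precedence over environment values.
        values.update(cli_overrides or {})
        return cls(**values)
```

For example, `EngineSettings.load({"run_transforms": True})` keeps the environment defaults but forces the transformation stage on.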

%%{init: {'flowchart': {'htmlLabels': true}} }%%
flowchart LR
    sourceNode["<b>Source protocols</b><br/>Typed source contracts expose extraction resources"] --> ingestNode["<b>Ingestion stage</b><br/>dlt orchestration handles extraction and load execution"]
    ingestNode --> warehouseNode["<b>Warehouse adapter</b><br/>Backend-specific load behaviour remains isolated"]
    warehouseNode --> transformNode["<b>Transformation adapter</b><br/>dbt execution applies model contracts and quality checks"]
    transformNode -. planned .-> semanticNode["<b>Semantic translator (planned)</b><br/>Converts dbt semantic definitions for MCP consumption"]

Execution and backend strategy

Core contracts

Source, storage, warehouse, and transformation protocols define extension points. Literal kinds constrain available options, so unsupported backends fail explicitly instead of degrading silently. This helps keep project code domain-aware and engine code domain-agnostic.
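A minimal sketch of this contract style, using hypothetical names in place of the engine's real protocol definitions (the actual SourceProtocol and ResourceProtocol interfaces may differ):

```python
from typing import Iterable, Literal, Protocol

# Literal kinds constrain the available options at the type level.
SourceKind = Literal["rest", "file"]


class ResourceProtocol(Protocol):
    name: str
    def extract(self) -> Iterable[dict]: ...


class SourceProtocol(Protocol):
    kind: SourceKind
    def resources(self) -> Iterable[ResourceProtocol]: ...


class ProductsResource:
    """Concrete resource satisfying ResourceProtocol structurally."""
    name = "products"

    def extract(self) -> Iterable[dict]:
        yield {"id": 1, "title": "example product"}


class FakeStoreSource:
    """Project-side source: domain-aware, while the engine stays domain-agnostic."""
    kind: SourceKind = "rest"

    def resources(self) -> Iterable[ResourceProtocol]:
        return [ProductsResource()]
```

Because the protocols are structural, project code never imports engine internals; it only has to satisfy the declared shape.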

Execution flow

Runtime flow is consistent: resources are wrapped and enriched, dlt performs ingestion into the configured warehouse destination, and the transformation stage runs only when enabled.
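That flow can be sketched as follows (the `enrich` and `run_pipeline` names are simplified illustrations, not the engine's actual implementation; the lineage field name follows ADR 004):

```python
from datetime import datetime, timezone


def enrich(record: dict) -> dict:
    # Inject a uniform lineage timestamp before load normalisation.
    return {**record, "_axiomatic_extracted_at_utc": datetime.now(timezone.utc).isoformat()}


def run_pipeline(resources, warehouse, transformer, run_transforms: bool) -> list:
    # Wrap and enrich every extracted record.
    rows = [enrich(record) for resource in resources for record in resource]
    warehouse.load(rows)   # in the real engine, dlt performs this load
    if run_transforms:     # the transformation stage runs only when enabled
        transformer.run()
    return rows
```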

Transformation strategy

The transformation stage is dbt-first by design. Dependency graphing, model ordering, and tests stay with dbt, while orchestration stays with the engine. This avoids reimplementing mature dbt capabilities in engine internals and keeps CI execution straightforward.
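In practice, the engine-side work can reduce to building a plain dbt invocation (a sketch: the function names are illustrative, and the absolute-path resolution follows ADR 007, though the engine's exact command construction may differ):

```python
import subprocess
from pathlib import Path


def build_dbt_command(project_dir: str, profiles_dir: str, target: str) -> list:
    # Resolve to absolute directories before subprocess invocation, so dbt
    # behaves identically regardless of the caller's working directory.
    return [
        "dbt", "build",
        "--project-dir", str(Path(project_dir).resolve()),
        "--profiles-dir", str(Path(profiles_dir).resolve()),
        "--target", target,
    ]


def run_transformations(project_dir: str, profiles_dir: str, target: str) -> None:
    # The engine orchestrates; dbt owns dependency graphing, ordering, and tests.
    subprocess.run(build_dbt_command(project_dir, profiles_dir, target), check=True)
```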

Warehouse strategy

DuckDB and MotherDuck share a Duck-compatible base for common behaviour, with concrete adapters handling backend-specific validation. MotherDuck token handling remains environment-driven and is kept out of URI paths to reduce accidental credential exposure.
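A minimal sketch of that adapter hierarchy and the environment-driven token rule (class names are illustrative; MOTHERDUCK_TOKEN is the conventional MotherDuck environment variable):

```python
import os


class DuckCompatibleWarehouse:
    """Shared Duck behaviour; concrete adapters add backend-specific validation."""
    def connection_uri(self) -> str:
        raise NotImplementedError


class DuckDBWarehouse(DuckCompatibleWarehouse):
    def __init__(self, db_path: str):
        self.db_path = db_path

    def connection_uri(self) -> str:
        return f"duckdb:///{self.db_path}"


class MotherDuckWarehouse(DuckCompatibleWarehouse):
    def __init__(self, database: str):
        self.database = database

    def connection_uri(self) -> str:
        # The token stays in the environment and never enters the URI,
        # reducing accidental credential exposure in logs and configs.
        if not os.environ.get("MOTHERDUCK_TOKEN"):
            raise RuntimeError("MOTHERDUCK_TOKEN must be set in the environment")
        return f"md:{self.database}"
```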

From source data to trusted reporting

Data from APIs and files is loaded on a schedule, modelled into a shared reporting layer, and used across dashboards and AI-assisted workflows.

The goal is one trusted model, clear ownership, and straightforward handover.

flowchart LR
    sourceApis["<b>Source APIs</b><br/>Operational systems and service endpoints"] --> dataLoads["<b>Automated loads</b><br/>Scheduled ingestion with governed contracts"]
    fileStreams["<b>File streams</b><br/>Batch extracts and managed file drops"] --> dataLoads
    dataLoads --> sharedModel["<b>Shared model</b><br/>Reusable semantic layer for consistent metrics"]
    sharedModel --> biReports["<b>Dashboards</b><br/>Business reporting for operational decisions"]
    sharedModel --> aiContext["<b>AI context</b><br/>Structured inputs for assistant and agent workflows"]

Implemented scope and declared extensions

Implemented scope includes REST and file-based ingestion paths, DuckDB-compatible warehouse execution, typed settings, and dbt-first transformation orchestration. Declared extension points remain for additional storage backends and warehouses, with explicit not-implemented signalling.

Implemented: REST and file ingestion, Duck-compatible warehouses, and dbt-first transformation.

Phase 1 roadmap: S3, GCS, S3-compatible storage, and BigQuery.

Phase 2 roadmap: GraphQL API, SQL source, vendor REST connectors, PostgreSQL, Snowflake, and Redshift.

Phase 3 roadmap: SharePoint, OneDrive, Fabric, Databricks, and Azure Blob/ADLS.
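The explicit not-implemented signalling mentioned above can be sketched as a factory that routes implemented kinds and fails loudly on declared ones (all names here are hypothetical):

```python
class LocalStorage:
    """Illustrative implemented storage backend."""


IMPLEMENTED = {"local": LocalStorage}
DECLARED = {"s3", "gcs"}  # extension points declared but not yet built


def create_storage(kind: str):
    if kind in IMPLEMENTED:
        return IMPLEMENTED[kind]()
    if kind in DECLARED:
        # Fail explicitly instead of degrading silently.
        raise NotImplementedError(f"storage kind {kind!r} is declared but not implemented")
    raise ValueError(f"unknown storage kind {kind!r}")
```

The distinction matters operationally: a declared kind raising NotImplementedError tells the caller the roadmap covers it, while an unknown kind raising ValueError flags a configuration mistake.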

Trade-offs and Lessons

The current design favours clear boundaries and predictable execution over broad backend coverage. This reduces ambiguity for early adopters, but it also means some integrations remain planned. Semantic and MCP standards are tracked as planned capability work until concrete implementation is published.

Reproducibility Notes

How to reproduce

Assumptions

  • Python 3.10+ is available and dependencies are installed from examples/fake_store.
  • .env.example has been copied to .env with AXIOMATIC_* values set for your chosen warehouse.
  • Network access is available for Fake Store API calls.

Commands

cd examples/fake_store
uv sync
uv run python run_pipeline.py --skip-transforms
uv run python run_pipeline.py --run-transforms
# pip alternative: python -m pip install -r requirements.txt

Limitations

  • Engine-level execution is demonstrated through the Fake Store project runner rather than a standalone engine CLI.
  • Transformation coverage depends on local dbt profile/target settings.

Open Fake Store example project

Evidence and Decision Log

Decision log

  • ADR 001 (accepted) - Keep shared Duck behaviour in a base class while isolating DuckDB and MotherDuck semantics in concrete adapters. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/001-warehouse-adapter-hierarchy.md
  • ADR 002 (accepted) - Keep orchestration in engine, delegate model planning/testing to dbt, and enforce explicit warehouse compatibility checks. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/002-dbt-first-transformation-orchestration.md
  • ADR 003 (accepted) - Treat SourceProtocol and ResourceProtocol as the stable boundary between domain extraction logic and engine orchestration. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/003-source-contract-boundary.md
  • ADR 004 (accepted) - Inject a uniform _axiomatic_extracted_at_utc lineage field in BaseResource before load normalisation. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/004-axiomatic-extraction-metadata-injection.md
  • ADR 005 (accepted) - Gate ingestion on landing outcome with explicit force_reload override for deterministic full reruns. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/005-pipeline-stage-gating-and-force-reload.md
  • ADR 006 (accepted) - Use environment-first typed settings with explicit CLI override precedence for runtime control. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/006-environment-first-configuration-with-cli-overrides.md
  • ADR 007 (accepted) - Resolve dbt project and profiles paths to absolute directories before subprocess invocation. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/007-dbt-project-and-profiles-path-resolution.md
  • ADR 008 (accepted) - Establish a logging-first observability baseline for pipeline stage progress and diagnostics. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/008-observability-baseline-for-pipeline-execution.md
  • ADR 009 (accepted) - Add typed resource load hints and map them in the source bridge with merge primary-key guardrails. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/009-resource-load-hints-contract-and-bridge-mapping.md
  • ADR 010 (accepted) - Adopt hybrid schema evolution modes (auto, strict, discard) with source-bridge mapping to dlt semantics. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/010-hybrid-schema-evolution-policy.md
  • ADR 011 (accepted) - Harden dbt runtime invocation with PATH fallback and MotherDuck token propagation safeguards. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/011-dbt-runtime-invocation-and-motherduck-token-propagation.md
  • ADR 012 (accepted) - Standardise package quality gates with uv groups to protect distribution readiness before release. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/012-package-readiness-and-uv-quality-gate.md
  • ADR 013 (accepted) - Introduce explicit source kinds and factory routing to keep project source definitions typed and extensible. https://github.com/axiomatic-bi/axiomatic-engine/blob/main/docs/adr/013-specific-source-kinds-and-source-factory-routing.md

See all ADRs

Source Narrative

This page anchors engine claims to architectural and ADR evidence, rather than framework-level marketing statements.

Current measurable scope

  • Protocol boundary areas: source, storage, warehouse, transformation.
  • Implemented transformation backend: dbt (dbt-first orchestration path).
  • Implemented warehouse adapters: duckdb and motherduck with shared Duck-compatible base behaviour.
  • Runtime configuration is loaded from AXIOMATIC_* variables with CLI-over-env override support in project runners.
  • Declared-but-unimplemented examples: storage gcs/s3, warehouse bigquery, and additional transformation backends.

Operational behaviour already evidenced

  • Ingestion supports per-resource write_disposition (append, replace, merge) and optional primary_key.
  • Schema evolution policy supports auto, strict, and discard.
  • Rerun semantics are explicit: merge for idempotent upsert behaviour, replace for deterministic snapshots, append for arrival history.
  • dbt execution policy avoids internal dlt.common.* APIs to reduce compatibility risk across upstream changes.

ResourceLoadHints contract

  • write_disposition: append | replace | merge
  • primary_key: optional, used by merge semantics
  • schema_evolution_mode: auto | strict | discard
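A sketch of how such load hints could be modelled, including the merge primary-key guardrail from ADR 009 (field names follow the contract above; the validation logic is an illustrative assumption, not the engine's actual implementation):

```python
from dataclasses import dataclass
from typing import Literal, Optional, Tuple

WriteDisposition = Literal["append", "replace", "merge"]
SchemaEvolutionMode = Literal["auto", "strict", "discard"]


@dataclass(frozen=True)
class ResourceLoadHints:
    write_disposition: WriteDisposition = "append"
    primary_key: Optional[Tuple[str, ...]] = None
    schema_evolution_mode: SchemaEvolutionMode = "auto"

    def __post_init__(self):
        # Guardrail: merge semantics are only meaningful with a primary key,
        # so an upsert without one fails at construction time, not at load time.
        if self.write_disposition == "merge" and not self.primary_key:
            raise ValueError("merge write_disposition requires a primary_key")
```

Failing at hint-construction time keeps rerun semantics explicit: a merge resource can never silently degrade into duplicate-producing appends.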

Reproducibility boundary

Engine orchestration is exercised here through a project runner rather than a standalone engine CLI.
Future semantic and MCP orchestration standards remain explicitly planned until implementation artefacts are published.