`dft schema`¶

Browse the data hierarchy: source → schema → table → column. The deepest non-None argument determines the return tier.

dft schema [OPTIONS]

The schema verb is the single entry point for asking "what data do I have?" — from the configured sources at the top, down to a column's profile statistics at the bottom. Pair it with dft schema-search for keyword and predicate search across descriptions, tags, and roles.

Synopsis¶

Invocation	Returns
`dft schema`	All configured data sources
`dft schema --source SRC`	Schemas/datasets within `SRC`
`dft schema --source SRC --schema SCH`	Tables within `SCH`, with lean summaries (name, kind, row_count, description, tags)
`dft schema --source SRC --schema SCH --table TBL`	Deep profile of `TBL` (columns, types, lineage, partitions, last_modified)
`dft schema --source SRC --schema SCH --table TBL --column COL`	Column profile (type, semantic_type, distribution, distinct_count, null_pct, relationships)

Options¶

Flag	Description
`--source TEXT`	Data source name. Omit to list all sources.
`--schema TEXT`	Schema/dataset/namespace name. Requires `--source`.
`--table TEXT`	Table name. Requires `--source` and `--schema`.
`--column TEXT`	Column name. Requires `--source`, `--schema`, and `--table`.
`--json`	Stable JSON output suitable for agent consumption.
`--project-dir PATH`	Project directory for resolving relative paths.
`--lineage-depth INT`	Hops of upstream/downstream lineage to surface (1–10, default 1).

Wildcards at every level¶

--source, --schema, --table, and --column all accept:

* — everything at that tier
comma-lists — "orders,customers"
fnmatch globs — "stg_*", "*_id"

Wildcards are expanded once at entry and applied uniformly down the tree.

Always quote glob args

Shell expansion can swallow * before it reaches dft. Quote everything: --table 'stg_*' not --table stg_*.

Examples¶

List configured sources¶

dft schema

  warehouse  (bigquery)  fivetran-eng:analytics
  csv_files  (csv)       data/

List schemas in a source¶

dft schema --source warehouse

  raw           (47 tables)
  staging       (132 tables)
  analytics     (28 tables)
  marts         (15 tables)

List tables in a schema¶

dft schema --source warehouse --schema analytics

  [analytics]
    fct_orders             (table) rows=2401829
    dim_customers          (table) rows=18402
    fct_subscriptions      (table) rows=4291
    vw_active_customers    (view)

Profile a single table¶

dft schema --source warehouse --schema analytics --table fct_orders

Table: fct_orders
  rows: 2401829
  description: One row per order, grain = order_id
  ↑ upstream: stg_orders (ref), stg_order_items (ref)
  ↓ downstream: rpt_revenue_by_month, rpt_orders_dashboard
  partitions: day on order_date (90 entries)
  last_modified: 2026-05-11T03:14:22Z
  order_id          STRING        primary_key
  customer_id       STRING        foreign_key  → dim_customers.customer_id
  order_date        DATE          date
  revenue           NUMERIC(18,2)
  status            STRING        categorical
  ← from_table.from_column   # any foreign-key references back into this table

Profile a single column¶

dft schema --source warehouse --schema analytics --table fct_orders --column status

Column: fct_orders.status
  type: STRING
  semantic_type: categorical
  distribution: {completed: 0.84, pending: 0.09, cancelled: 0.07}
  distinct_count: 3
  null_pct: 0.0%

Glob across many tables¶

dft schema --source warehouse --schema analytics --table 'fct_*' --column '*_id'

Returns every _id column on every fact table, in one call.

JSON output and `jq` piping¶

--json emits a stable named-dict tree rooted at sources.<src>.schemas.<schema>.tables. This is the escape hatch for ad-hoc cross-cutting queries the curated verbs don't anticipate.

Tables above a row-count threshold¶

dft schema --source warehouse --schema analytics --table '*' --json \
  | jq '.sources.warehouse.schemas.analytics.tables
        | to_entries[]
        | select(.value.row_count > 1000000)
        | .key'

Every column whose name ends in `_at`¶

dft schema --source warehouse --schema analytics --table '*' --column '*' --json \
  | jq '[.sources.warehouse.schemas.analytics.tables
         | to_entries[]
         | {table: .key,
            cols: [.value.columns
                   | to_entries[]
                   | select(.key | test("_at$"))
                   | .key]}]
        | map(select(.cols | length > 0))'

Same column across all tables¶

dft schema --source warehouse --schema analytics --table '*' --column user_id --json \
  | jq '[.sources.warehouse.schemas.analytics.tables
         | to_entries[]
         | select(.value.columns.user_id)
         | .key]'

For keyword and predicate search (role:primary_key, --missing description, fk_to, etc.) prefer the structured verb instead — see dft schema-search.

How profiling works¶

The schema verb reads from a layered resolver:

Super Schema cache (target/super_schema.json) — if a recent profile exists, it's returned immediately.
dbt adapter + manifest — falls back to live warehouse introspection plus parsed manifest.json for metadata.

Row counts, distributions, and null percentages come from the cached profile. To force a fresh profile of a specific table, see the Inspector docs.

Empty results don't mean an empty project

Some projects query files directly (e.g. via read_csv(...) in DuckDB) instead of persistent database tables. A zero-table response means no persistent tables, not no data.

Lineage¶

--lineage-depth N (1 ≤ N ≤ 10) controls how many hops of upstream and downstream lineage are surfaced on table profiles.

--lineage-depth 1 (default): direct parents and children only
--lineage-depth 3: walks the DAG three hops in each direction

Lineage is derived from dbt ref() and source() calls in the manifest. Column-level lineage isn't yet exposed via this verb — table-level only.

Output contract¶

The JSON envelope is always wrapped in sources, with a _meta footer:

{
  "sources": {
    "warehouse": {
      "type": "bigquery",
      "schemas": {
        "analytics": {
          "tables": { ... }
        }
      }
    }
  },
  "_meta": {
    "sources_consulted": ["super_schema"],
    "generated_at": "2026-05-11T18:42:17Z"
  }
}

sources_consulted reports which layers contributed: ["super_schema"] on cache hit, ["dbt_adapter", "dbt_manifest"] on live introspection.

dft schema-search — keyword / predicate search across the same corpus
dft search — search dashboards (faces) by keyword
dft context — question-aware schema retrieval for AI workflows
Inspector — how the underlying profile artifact is built
Inspector Context Stack — full layer-by-layer breakdown of where schema data comes from

dft schema¶