Skip to content

dft schema

Browse the data hierarchy: source → schema → table → column. The deepest non-None argument determines the return tier.

dft schema [OPTIONS]

The schema verb is the single entry point for asking "what data do I have?" — from the configured sources at the top, down to a column's profile statistics at the bottom. Pair it with dft schema-search for keyword and predicate search across descriptions, tags, and roles.

Synopsis

Invocation Returns
dft schema All configured data sources
dft schema --source SRC Schemas/datasets within SRC
dft schema --source SRC --schema SCH Tables within SCH, with lean summaries (name, kind, row_count, description, tags)
dft schema --source SRC --schema SCH --table TBL Deep profile of TBL (columns, types, lineage, partitions, last_modified)
dft schema --source SRC --schema SCH --table TBL --column COL Column profile (type, semantic_type, distribution, distinct_count, null_pct, relationships)

Options

Flag Description
--source TEXT Data source name. Omit to list all sources.
--schema TEXT Schema/dataset/namespace name. Requires --source.
--table TEXT Table name. Requires --source and --schema.
--column TEXT Column name. Requires --source, --schema, and --table.
--json Stable JSON output suitable for agent consumption.
--project-dir PATH Project directory for resolving relative paths.
--lineage-depth INT Hops of upstream/downstream lineage to surface (1–10, default 1).

Wildcards at every level

--source, --schema, --table, and --column all accept:

  • * — everything at that tier
  • comma-lists — "orders,customers"
  • fnmatch globs — "stg_*", "*_id"

Wildcards are expanded once at entry and applied uniformly down the tree.

Always quote glob args

Shell expansion can swallow * before it reaches dft. Quote everything: --table 'stg_*' not --table stg_*.

Examples

List configured sources

dft schema
  warehouse  (bigquery)  fivetran-eng:analytics
  csv_files  (csv)       data/

List schemas in a source

dft schema --source warehouse
  raw           (47 tables)
  staging       (132 tables)
  analytics     (28 tables)
  marts         (15 tables)

List tables in a schema

dft schema --source warehouse --schema analytics
  [analytics]
    fct_orders             (table) rows=2401829
    dim_customers          (table) rows=18402
    fct_subscriptions      (table) rows=4291
    vw_active_customers    (view)

Profile a single table

dft schema --source warehouse --schema analytics --table fct_orders
Table: fct_orders
  rows: 2401829
  description: One row per order, grain = order_id
  ↑ upstream: stg_orders (ref), stg_order_items (ref)
  ↓ downstream: rpt_revenue_by_month, rpt_orders_dashboard
  partitions: day on order_date (90 entries)
  last_modified: 2026-05-11T03:14:22Z
  order_id          STRING        primary_key
  customer_id       STRING        foreign_key  → dim_customers.customer_id
  order_date        DATE          date
  revenue           NUMERIC(18,2)
  status            STRING        categorical
  ← from_table.from_column   # any foreign-key references back into this table

Profile a single column

dft schema --source warehouse --schema analytics --table fct_orders --column status
Column: fct_orders.status
  type: STRING
  semantic_type: categorical
  distribution: {completed: 0.84, pending: 0.09, cancelled: 0.07}
  distinct_count: 3
  null_pct: 0.0%

Glob across many tables

dft schema --source warehouse --schema analytics --table 'fct_*' --column '*_id'

Returns every _id column on every fact table, in one call.

JSON output and jq piping

--json emits a stable named-dict tree rooted at sources.<src>.schemas.<schema>.tables. This is the escape hatch for ad-hoc cross-cutting queries the curated verbs don't anticipate.

Tables above a row-count threshold

dft schema --source warehouse --schema analytics --table '*' --json \
  | jq '.sources.warehouse.schemas.analytics.tables
        | to_entries[]
        | select(.value.row_count > 1000000)
        | .key'

Every column whose name ends in _at

dft schema --source warehouse --schema analytics --table '*' --column '*' --json \
  | jq '[.sources.warehouse.schemas.analytics.tables
         | to_entries[]
         | {table: .key,
            cols: [.value.columns
                   | to_entries[]
                   | select(.key | test("_at$"))
                   | .key]}]
        | map(select(.cols | length > 0))'

Same column across all tables

dft schema --source warehouse --schema analytics --table '*' --column user_id --json \
  | jq '[.sources.warehouse.schemas.analytics.tables
         | to_entries[]
         | select(.value.columns.user_id)
         | .key]'

For keyword and predicate search (role:primary_key, --missing description, fk_to, etc.) prefer the structured verb instead — see dft schema-search.

How profiling works

The schema verb reads from a layered resolver:

  1. Super Schema cache (target/super_schema.json) — if a recent profile exists, it's returned immediately.
  2. dbt adapter + manifest — falls back to live warehouse introspection plus parsed manifest.json for metadata.

Row counts, distributions, and null percentages come from the cached profile. To force a fresh profile of a specific table, see the Inspector docs.

Empty results don't mean an empty project

Some projects query files directly (e.g. via read_csv(...) in DuckDB) instead of persistent database tables. A zero-table response means no persistent tables, not no data.

Lineage

--lineage-depth N (1 ≤ N ≤ 10) controls how many hops of upstream and downstream lineage are surfaced on table profiles.

  • --lineage-depth 1 (default): direct parents and children only
  • --lineage-depth 3: walks the DAG three hops in each direction

Lineage is derived from dbt ref() and source() calls in the manifest. Column-level lineage isn't yet exposed via this verb — table-level only.

Output contract

The JSON envelope is always wrapped in sources, with a _meta footer:

{
  "sources": {
    "warehouse": {
      "type": "bigquery",
      "schemas": {
        "analytics": {
          "tables": { ... }
        }
      }
    }
  },
  "_meta": {
    "sources_consulted": ["super_schema"],
    "generated_at": "2026-05-11T18:42:17Z"
  }
}

sources_consulted reports which layers contributed: ["super_schema"] on cache hit, ["dbt_adapter", "dbt_manifest"] on live introspection.