dft schema¶
Browse the data hierarchy: source → schema → table → column. The deepest non-None argument determines the return tier.
The schema verb is the single entry point for asking "what data do I have?" — from the configured sources at the top, down to a column's profile statistics at the bottom. Pair it with dft schema-search for keyword and predicate search across descriptions, tags, and roles.
Synopsis¶
| Invocation | Returns |
|---|---|
dft schema |
All configured data sources |
dft schema --source SRC |
Schemas/datasets within SRC |
dft schema --source SRC --schema SCH |
Tables within SCH, with lean summaries (name, kind, row_count, description, tags) |
dft schema --source SRC --schema SCH --table TBL |
Deep profile of TBL (columns, types, lineage, partitions, last_modified) |
dft schema --source SRC --schema SCH --table TBL --column COL |
Column profile (type, semantic_type, distribution, distinct_count, null_pct, relationships) |
Options¶
| Flag | Description |
|---|---|
--source TEXT |
Data source name. Omit to list all sources. |
--schema TEXT |
Schema/dataset/namespace name. Requires --source. |
--table TEXT |
Table name. Requires --source and --schema. |
--column TEXT |
Column name. Requires --source, --schema, and --table. |
--json |
Stable JSON output suitable for agent consumption. |
--project-dir PATH |
Project directory for resolving relative paths. |
--lineage-depth INT |
Hops of upstream/downstream lineage to surface (1–10, default 1). |
Wildcards at every level¶
--source, --schema, --table, and --column all accept:
*— everything at that tier- comma-lists —
"orders,customers" - fnmatch globs —
"stg_*","*_id"
Wildcards are expanded once at entry and applied uniformly down the tree.
Always quote glob args
Shell expansion can swallow * before it reaches dft. Quote everything:
--table 'stg_*' not --table stg_*.
Examples¶
List configured sources¶
List schemas in a source¶
List tables in a schema¶
[analytics]
fct_orders (table) rows=2401829
dim_customers (table) rows=18402
fct_subscriptions (table) rows=4291
vw_active_customers (view)
Profile a single table¶
Table: fct_orders
rows: 2401829
description: One row per order, grain = order_id
↑ upstream: stg_orders (ref), stg_order_items (ref)
↓ downstream: rpt_revenue_by_month, rpt_orders_dashboard
partitions: day on order_date (90 entries)
last_modified: 2026-05-11T03:14:22Z
order_id STRING primary_key
customer_id STRING foreign_key → dim_customers.customer_id
order_date DATE date
revenue NUMERIC(18,2)
status STRING categorical
← from_table.from_column # any foreign-key references back into this table
Profile a single column¶
Column: fct_orders.status
type: STRING
semantic_type: categorical
distribution: {completed: 0.84, pending: 0.09, cancelled: 0.07}
distinct_count: 3
null_pct: 0.0%
Glob across many tables¶
Returns every _id column on every fact table, in one call.
JSON output and jq piping¶
--json emits a stable named-dict tree rooted at sources.<src>.schemas.<schema>.tables. This is the escape hatch for ad-hoc cross-cutting queries the curated verbs don't anticipate.
Tables above a row-count threshold¶
dft schema --source warehouse --schema analytics --table '*' --json \
| jq '.sources.warehouse.schemas.analytics.tables
| to_entries[]
| select(.value.row_count > 1000000)
| .key'
Every column whose name ends in _at¶
dft schema --source warehouse --schema analytics --table '*' --column '*' --json \
| jq '[.sources.warehouse.schemas.analytics.tables
| to_entries[]
| {table: .key,
cols: [.value.columns
| to_entries[]
| select(.key | test("_at$"))
| .key]}]
| map(select(.cols | length > 0))'
Same column across all tables¶
dft schema --source warehouse --schema analytics --table '*' --column user_id --json \
| jq '[.sources.warehouse.schemas.analytics.tables
| to_entries[]
| select(.value.columns.user_id)
| .key]'
For keyword and predicate search (role:primary_key, --missing description, fk_to, etc.) prefer the structured verb instead — see dft schema-search.
How profiling works¶
The schema verb reads from a layered resolver:
- Super Schema cache (
target/super_schema.json) — if a recent profile exists, it's returned immediately. - dbt adapter + manifest — falls back to live warehouse introspection plus parsed
manifest.jsonfor metadata.
Row counts, distributions, and null percentages come from the cached profile. To force a fresh profile of a specific table, see the Inspector docs.
Empty results don't mean an empty project
Some projects query files directly (e.g. via read_csv(...) in DuckDB) instead of persistent database tables. A zero-table response means no persistent tables, not no data.
Lineage¶
--lineage-depth N (1 ≤ N ≤ 10) controls how many hops of upstream and downstream lineage are surfaced on table profiles.
--lineage-depth 1(default): direct parents and children only--lineage-depth 3: walks the DAG three hops in each direction
Lineage is derived from dbt ref() and source() calls in the manifest. Column-level lineage isn't yet exposed via this verb — table-level only.
Output contract¶
The JSON envelope is always wrapped in sources, with a _meta footer:
{
"sources": {
"warehouse": {
"type": "bigquery",
"schemas": {
"analytics": {
"tables": { ... }
}
}
}
},
"_meta": {
"sources_consulted": ["super_schema"],
"generated_at": "2026-05-11T18:42:17Z"
}
}
sources_consulted reports which layers contributed: ["super_schema"] on cache hit, ["dbt_adapter", "dbt_manifest"] on live introspection.
Related¶
dft schema-search— keyword / predicate search across the same corpusdft search— search dashboards (faces) by keyworddft context— question-aware schema retrieval for AI workflows- Inspector — how the underlying profile artifact is built
- Inspector Context Stack — full layer-by-layer breakdown of where schema data comes from