YAML Language Design Guide

Principles for evolving the Dataface YAML dashboard language. This guide is for contributors designing new syntax — fields, chart config, layout types, variable inputs, query types, or style properties. It captures the patterns that make the existing syntax coherent so new additions stay consistent.

For how to write face YAML files, see the YAML Style Guide. For the full field catalog, see the Field Reference.

Related: dataface/core/render/chart/DESIGN.md for chart rendering philosophy and the Vega-Lite wrapper contract.


1. Structure Is the Type Declaration

The presence of a key determines the type. Don't add explicit type: fields when structure alone is unambiguous.

# Layout type is inferred from the key:
rows: [...]      # row layout
cols: [...]      # column layout
grid: {...}      # grid layout
tabs: {...}      # tab layout

# Query type is inferred from its fields:
sql: |           # → SQL query
  SELECT ...
metrics: [...]   # → MetricFlow query
model: ref(...)  # → dbt model query
rows: [...]      # → values query (inline data)

When designing new syntax: if you're adding a new variant of something (new query type, new variable input, new layout), first ask whether the new fields are distinctive enough to infer the type. Only add an explicit type: field when structural inference would be ambiguous.

The order of checking matters, so document it. For queries, the existing precedence is: metrics → model → sql → string (default SQL).
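
As a rough illustration of why the checking order needs documenting, the sketch below infers a query type from structure using the precedence stated above (metrics, then model, then sql, with a bare string treated as SQL). `infer_query_type` is a hypothetical helper, not Dataface's actual normalizer, and the position of the `rows:` values query in the precedence is not stated in this guide, so the sketch leaves it out.

```python
from typing import Any, Mapping, Union

# Keys checked in the documented order; the first match wins.
QUERY_PRECEDENCE = ("metrics", "model", "sql")


def infer_query_type(query: Union[str, Mapping[str, Any]]) -> str:
    """Infer a query type from structure (hypothetical helper)."""
    if isinstance(query, str):
        return "sql"  # a bare string defaults to SQL
    for key in QUERY_PRECEDENCE:
        if key in query:
            return key
    raise ValueError(
        f"cannot infer query type from keys {sorted(query)}; "
        f"expected one of {list(QUERY_PRECEDENCE)}"
    )
```

A query that carries both `metrics` and `sql` resolves to `metrics` under this ordering, which is exactly the kind of tie a documented precedence settles.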


2. Data-Binding Key: column, Not field

Every place Dataface YAML binds a data position uses column: — the channel grammar (color.column, background.column, etc.), table column configs (style.columns[*].column), and variable options (variables.x.options.column). The corresponding Pydantic attributes are named column too.

Historical note. v1.x used the Vega-Lite term field: for the channel grammar and table columns. It was renamed to column: pre-launch (tasks/workstreams/dft-core/tasks/rename-field-to-column-across-all-dataface-yaml-and-types.md) because every Dataface data source is a table and every binding target is a column; the Vega-Lite term was a misnomer. Old field: keys now raise ValidationError.

The Vega-Lite output boundary still uses field: inside the rendered VL spec — that's VL's own key, not Dataface's. The rename only covers the authored surface and Python-type attributes; render-layer output dicts that flow into VL retain "field".
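
The boundary described above can be pictured with a small sketch: authored channels carry `column`, and only the dict handed to Vega-Lite carries `field`. `channel_to_vl` is a hypothetical function written for illustration, and `ValueError` stands in for the `ValidationError` the guide mentions.

```python
from typing import Any, Dict, Mapping


def channel_to_vl(channel: Mapping[str, Any]) -> Dict[str, Any]:
    """Translate an authored channel dict into a Vega-Lite encoding entry."""
    if "field" in channel:
        # ValueError is a stand-in for the real validation error type.
        raise ValueError("'field' is not accepted in authored YAML; use 'column'")
    # Copy every authored key except 'column' unchanged.
    vl = {key: value for key, value in channel.items() if key != "column"}
    if "column" in channel:
        vl["field"] = channel["column"]  # VL's own key at the output boundary
    return vl
```

The rename happens in exactly one place, so nothing upstream of the render layer ever needs to know the Vega-Lite word.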


3. Naming: snake_case Only — No camelCase Ever

All Dataface YAML property names use snake_case. No exceptions. Never use camelCase, even when porting directly from Vega-Lite or JavaScript sources. If a property arrives as camelCase from an upstream spec, convert it to snake_case at the Dataface boundary.

When wrapping a Vega-Lite concept, use VL's own name converted to snake_case:

| Vega-Lite | Dataface |
|---|---|
| cornerRadius | corner_radius |
| labelFontSize | label_font_size |
| strokeWidth | stroke_width |

Do NOT invent parallel names unless there is a deliberate, documented reason. If Vega-Lite calls it encoding.y.axis, don't create settings.y_axis or y_axis_config. Map it to encoding.y.axis (or a shorthand that resolves there). When you do diverge from a VL name, record it in the Known Divergences table below so it doesn't look like an accident.

When adding a concept that has no Vega-Lite equivalent (KPI, table, spark, interactions), name it descriptively in snake_case and follow the naming style of surrounding fields.
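
The conversion at the Dataface boundary is mechanical. The following is a minimal sketch using only the standard library; the helper names are illustrative and are not taken from the Dataface codebase. It converts keys only, so camelCase values such as a projection name are left untouched.

```python
import re
from typing import Any

# Zero-width match before every uppercase letter except at the start.
_UPPER = re.compile(r"(?<!^)(?=[A-Z])")


def camel_to_snake(name: str) -> str:
    """cornerRadius -> corner_radius."""
    return _UPPER.sub("_", name).lower()


def snake_to_camel(name: str) -> str:
    """corner_radius -> cornerRadius (the direction a VL mapper would need)."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def keys_to_snake(value: Any) -> Any:
    """Recursively convert dict keys to snake_case; values are not renamed."""
    if isinstance(value, dict):
        return {camel_to_snake(k): keys_to_snake(v) for k, v in value.items()}
    if isinstance(value, list):
        return [keys_to_snake(item) for item in value]
    return value
```

One caveat of this simple regex: runs of capitals (acronyms) are split letter by letter, so names containing them would need a special case.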

Namespace Repeated Prefixes

When three or more keys share a prefix (e.g. font_size, font_weight, font_family), that prefix should be a nested namespace instead of a flat prefix:

# BAD — flat prefix repeated 3+ times
title_font_family: "Source Serif 4"
title_font_weight: 600
title_overflow: wrap-two
title_height: 40

# GOOD — namespace the shared prefix
title:
  font:
    family: "Source Serif 4"
    weight: 600
  overflow: wrap-two
  min_height: 40

This reduces repetition, makes the hierarchy self-documenting, and mirrors how CSS groups properties (the font shorthand, border shorthand, etc.). When you see a flat prefix_* pattern accumulating, refactor it into prefix: with nested keys before it spreads further.
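
A review aid for spotting this pattern might look like the sketch below. `repeated_prefixes` is a hypothetical lint helper, not an existing Dataface check, and it only groups keys by their first snake_case segment.

```python
from collections import Counter
from typing import Dict, Iterable


def repeated_prefixes(keys: Iterable[str], threshold: int = 3) -> Dict[str, int]:
    """Report first-segment prefixes shared by at least `threshold` flat keys."""
    counts = Counter(key.split("_", 1)[0] for key in keys if "_" in key)
    return {prefix: n for prefix, n in counts.items() if n >= threshold}
```

Run against the flat keys in the BAD example, it reports `title` as a candidate namespace. Deeper shared prefixes (such as `title_font_*`) would need a second pass over the remaining segments.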


4. Shorthand + Full Form

When a field has a common single-value case but also supports richer configuration, it should support two forms:

  1. Shorthand — a string or scalar for the common case
  2. Full form — a dict for the complete configuration

Not every field needs this. Only apply the pattern when there's a clear 80% case that benefits from being a single value. Fields that are always structured (like grid:) don't need a shorthand.

# Shorthand
source: my_database
format: "$,.0f"
spark: line
projection: albersUsa

# Full form
sources:
  default: my_database
  other_db:
    type: postgres
    connection_string: ...

format:
  spec: ",.0f"
  prefix: "$"
  suffix: " USD"

spark:
  type: line
  color: "#3b82f6"
  show_last: true

# Progress spark — max is optional; omitting it auto-scales to column max
spark:
  type: progress
  # max: 50000   ← explicit cap; omit to auto-scale to the column's observed max

projection:
  type: albersUsa
  center: [-98, 38]
  scale: 1000

When designing new syntax: always ask "what's the 80% case?" and make that expressible as a single string or scalar. The full-form dict handles the remaining 20%.

The normalizer is responsible for expanding shorthand to full form. Downstream code should only see the full form.
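
For fields whose shorthand is simply the `type` of the full form (as with spark and projection above), the expansion step can be sketched as follows. `expand_shorthand` is a hypothetical helper; the real normalizer is not shown in this guide, and a field like format would need its own rule because the mapping from its shorthand string to spec, prefix, and suffix is not specified here.

```python
from collections.abc import Mapping
from typing import Any, Dict, Union


def expand_shorthand(
    value: Union[str, Mapping[str, Any]], key: str = "type"
) -> Dict[str, Any]:
    """Expand a scalar shorthand into its full-form dict."""
    if isinstance(value, Mapping):
        return dict(value)  # already full form; return a shallow copy
    return {key: value}  # scalar shorthand becomes {key: scalar}
```

Because both inputs come out as a dict, downstream code can read `spark["type"]` without checking which form the author wrote.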


5. Relationship to Vega-Lite

Dataface is deeply influenced by Vega-Lite's design: its declarative grammar, its encoding channel model, and its layered config/style inheritance. Today, Vega-Lite is the rendering backend for most chart types. But Dataface is not a Vega-Lite wrapper — it's a chart language that happens to target Vega-Lite for many chart types, and renders KPI, table, spark, and map charts through its own SVG renderers.

The goal is one cohesive chart library. A user should not perceive a seam between "Vega-Lite charts" and "Dataface-native charts." All chart types should feel like they belong to the same language — same field names, same style system, same shorthand patterns, same config inheritance. If Dataface eventually replaces the VL backend, the authored YAML should not need to change.

Thin Ergonomic Layer

For concepts Vega-Lite supports natively, the Dataface YAML should be a thin ergonomic layer — not a parallel chart language.

  • x, y, color, size, shape, theta map directly to VL encoding channels
  • type: bar means VL mark: bar
  • Axis, scale, legend config passes through to VL's native properties

Adding Dataface shorthand is appropriate when it significantly simplifies a common pattern:

# Dataface shorthand — common case is one field name
x: date
y: revenue

# What this resolves to in Vega-Lite
encoding:
  x: { field: date, type: temporal }
  y: { field: revenue, type: quantitative }

Adding Dataface shorthand is NOT appropriate when it just renames a Vega-Lite property:

# BAD — don't do this
y_axis_side: right        # Just renaming VL's axis.orient

# GOOD — pass through the VL property
style:
  axis_y:
    orient: right         # Maps directly to VL axisY.orient

Test: Would a Vega-Lite user recognize the field? If not, it needs a very good reason to exist.

Known Divergences from Vega-Lite

These are intentional naming or structural differences. They exist for good reasons — don't "fix" them back to VL names, but don't add new divergences without equally good reasons.

| Dataface | Vega-Lite | Why |
|---|---|---|
| style (board/chart) | config | Dataface splits VL's config into style presets (scaffold) and themes (painting). style is the user-facing name because it better describes what the user is doing: styling their chart. config is reserved for project-level settings (dataface.yml). |
| style.charts.* sub-models use snake_case | VL config.* uses camelCase | Dataface is a Python/YAML ecosystem, where snake_case is idiomatic. The style_to_vega_lite() mapper handles translation. |
| style.charts.table, style.charts.kpi, style.charts.inference | (no VL equivalent) | Non-VL sections that live alongside VL-mapped sections under style.charts. Consumed directly by Dataface renderers. |
| Style presets + themes (two layers) | Single config object | Dataface separates scaffold (HTML-like: what exists, where it sits) from painting (CSS-like: fonts, colors, strokes). See chart DESIGN.md. |

When porting a VL property, check this table first. If the Dataface location differs from where VL puts it, there's likely a reason. Follow the existing pattern, don't create a third convention.

Stylistic Influence

Even where Dataface diverges from VL's names, it draws heavily from VL's design philosophy:

  • Declarative grammar — the chart is described, not imperatively drawn
  • Layered config inheritance — base defaults → style preset → theme → board-level style → chart-level style, each layer overriding the one below
  • Encoding channels: x, y, color, size, shape as the primary way to map data fields to visual properties
  • Mark types as the fundamental chart taxonomy
  • Config/style composition — a standalone chart should look good with zero style overrides; each layer adds specificity

New features should feel like they belong in this grammar. If something feels imperative or procedural ("first do X, then apply Y"), rethink it as a declarative property.

Layered Charts (Mixed-Mark Composition)

When a chart needs multiple mark types (e.g. bars + lines), use type: layered with a layers list. Parent-level channels supply shared defaults; each layer specifies its own type and can override y, color, size, shape.

charts:
  revenue_vs_target:
    query: monthly_data
    type: layered
    x: month
    layers:
      - type: bar
        y: revenue
      - type: line
        y: target

This maps directly to Vega-Lite's layer composition. The translation is mechanical: parent-level channels become shared encoding, each layer becomes a VL layer entry with its own mark and encoding overrides.

Design decisions:

  • type stays as the primary authored field — type: layered rather than introducing a mark: field or composition: concept
  • layers is a list (ordered) because layer order affects rendering (later layers draw on top)
  • Each layer's type must be a primitive mark (bar, line, area, point, etc.), not layered (no nesting) or auto
  • Pie/donut stay outside the layered path unless explicitly layered — they use theta encoding, not x/y
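
The mechanical translation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the actual Dataface profile layer: `layered_to_vl` is a hypothetical name, channels are assumed to be in string shorthand, and encoding types are omitted because they depend on the query's column types.

```python
from typing import Any, Dict, List, Mapping

CHANNELS = ("x", "y", "color", "size", "shape")
NON_PRIMITIVE = frozenset({"layered", "auto"})  # not allowed as a layer type


def _encoding(node: Mapping[str, Any]) -> Dict[str, Any]:
    """Collect shorthand channels from a chart or layer node."""
    return {ch: {"field": node[ch]} for ch in CHANNELS if ch in node}


def layered_to_vl(chart: Mapping[str, Any]) -> Dict[str, Any]:
    """Map a type: layered chart to a Vega-Lite layer spec fragment."""
    layers: List[Dict[str, Any]] = []
    for index, layer in enumerate(chart["layers"]):
        if layer["type"] in NON_PRIMITIVE:
            raise ValueError(
                f"layers[{index}].type must be a primitive mark, "
                f"got {layer['type']!r}"
            )
        # List order is preserved, so later layers draw on top.
        layers.append({"mark": layer["type"], "encoding": _encoding(layer)})
    return {"encoding": _encoding(chart), "layer": layers}
```

Applied to the revenue_vs_target example, the parent `x: month` becomes the shared encoding and each layer contributes its own mark and `y` override.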

6. Data Transformation Belongs to Queries, Not Charts

The query layer owns data meaning: grain, aggregation, filtering, ordering. The chart layer owns visual encoding: marks, axes, colors, layout.

New chart fields should NEVER:

  • Aggregate, regroup, or bucket data
  • Derive new semantic columns
  • Perform analytical reordering that changes meaning
  • Silently fix wrong-shaped data

If the chart needs different data, the right answer is "change the query." The YAML should make this obvious — chart config touches presentation, query config touches data.


7. Progressive Disclosure

The simplest useful form should require the fewest fields. Complexity is opt-in.

# Level 1 — bare minimum
rows:
  - query: sales
    type: bar
    x: month
    y: revenue

# Level 2 — add polish
rows:
  - query: sales
    type: bar
    x: month
    y: revenue
    title: "Monthly Revenue"
    color: region
    sort:
      by: revenue
      order: desc

# Level 3 — full control
rows:
  - query: sales
    type: bar
    x: month
    y: revenue
    title: "Monthly Revenue"
    color: region
    sort:
      by: revenue
      order: desc
    style:
      orientation: horizontal
      bar:
        corner_radius: 4
      legend:
        orient: bottom
    encoding:
      y:
        axis:
          format: "$,.0f"

When adding new features: the zero-config version should do the right thing. Advanced configuration goes in nested keys that users can ignore until they need them.


8. Defaults Live in Config YAML, Not Code

All default values live in YAML config files:

| What | Where |
|---|---|
| Chart dimensions, axis limits | defaults/default_config.yml |
| Default theme name | defaults/default_config.yml > vega.default_theme |
| Visual styling, axis/legend/mark colors, scaffold defaults | defaults/themes/*.yaml (compiled to VL via style_to_vega_lite) |
| Color palettes (categorical, sequential, diverging, semantic, scaffold) | defaults/palettes/<family>/*.yml |
| Chart-type-specific settings | style.charts.<type> in theme YAML |

When introducing a new field:

  • Don't hardcode its default in Python
  • Do add the default to the appropriate YAML config file
  • Do read it via get_config() at runtime

This ensures users can override any default via dataface.yml or board-level style: without touching code.
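
The override behavior relies on layered merging: shipped defaults at the bottom, user layers on top. The sketch below shows one common way to do that with a recursive merge. It makes no claim about how get_config() is implemented; `deep_merge` and `resolve` are hypothetical names, and the sample values in the usage note are illustrative.

```python
from collections.abc import Mapping
from typing import Any, Dict


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> Dict[str, Any]:
    """Return base with override applied; nested mappings merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # scalars and lists replace wholesale
    return merged


def resolve(*layers: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge layers in order, lowest priority first."""
    result: Dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result
```

For example, `resolve(defaults, project, board_style)` lets a board set only `style.bar.corner_radius` while every sibling default stays intact. A default hardcoded in Python would sit outside this chain and could not be overridden this way.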

The default config is the field catalog

default_config.yml should be readable as a complete reference of what properties exist and how they relate to each other. Two conventions make this work:

1. Comment inherited/cascaded properties at each level. When a section inherits properties from a parent, list them as commented-out placeholders so readers can see the full surface area:

style:
  title:
    font_family: "'Source Serif 4', Georgia, serif"
    font_weight: 600
    overflow: wrap-two
    min_height: 40

  charts:
    title:
      # inherits from style.title: font_family, font_weight, min_height
      font_size: 18        # overrides parent (board titles are larger)
      overflow: wrap-two    # same as parent, explicit for clarity

When a section adds no overrides, leave a commented-out block showing what it inherits:

    kpi:
      # title:
      #   inherits all from style.charts.title
      #   (font_family, font_size, font_weight, overflow)
      value_font_weight: "bold"

2. Add explanation comments for non-obvious properties. Any property whose purpose isn't clear from its name should have a brief comment explaining what it controls:

  # Pixel gap between cards when card_gap is enabled on a board.
  # Only takes effect when a face sets card_gap: true.
  card_gap: 24.0

  # Minimum floor for face title height in layout sizing.
  # The actual title height may be larger based on text wrapping.
  title_height: 40.0

These conventions make default_config.yml the single place to understand what's available, what cascades, and what each property does — without reading source code.


9. Recursive Composition

Boards nest arbitrarily. Any layout item can be a full board with its own variables, queries, charts, and layout.

New features should respect this. If a feature makes sense at the top-level board, ask whether it should also work at nested board scope. Usually the answer is yes — scoped variables, scoped queries, and scoped charts all follow this pattern.


9a. Property Inheritance (What Cascades)

Because boards nest arbitrarily (§9), every property needs a clear answer: does a child board inherit this value from its parent?

The Rule

"Would a child board typically want the same value as its parent?"

If yes, the property should cascade (child inherits unless it overrides). If no, each board gets its own value independently.

This is the same rule CSS uses, and it maps to a clean split:

  • Text/content properties cascade. You set font_family on a parent board and expect all nested boards to use the same font. You set a color palette and expect nested charts to share it.
  • Box/layout properties do NOT cascade. You set padding: 24px on a parent board and do NOT expect child boards to also have 24px padding. Each board has its own geometry.
  • Context properties cascade. Theme name, data source, and variables establish a context that children operate within.

What cascades and what doesn't

| Property category | Cascades? | Examples | Why |
|---|---|---|---|
| Typography (font, size, weight) | Yes | style.font.family, style.title.font.family | Text should be consistent |
| Colors (palette, scheme) | Yes | color, chart palette | Visual coherence |
| Chart styling (axis, legend, marks) | Yes | style.charts.* | Charts in nested boards should match |
| Theme / style preset name | Yes | theme, style_preset | Establishes visual context |
| Data source | Yes | source | Children query the same database |
| Variables | Yes | variables | Template context flows down |
| Width, height, min_height | No | face.width, face.min_height | Box geometry (per-board) |
| Padding, margin | No | style.board.margin, style.board.card_padding | Box spacing (per-board) |
| Background | No | style.background | Each board has its own canvas |
| Border, border_radius | No | style.border | Box chrome (per-board) |
| Gap, card_gap | No | face.card_gap | Layout spacing (per-board) |

When designing new properties

Before adding a property, decide: does it cascade? Apply the rule:

  1. Text/content property (font, color, chart config) → put it in StyleCompiled so it cascades via _propagate_style_compiled()
  2. Box/layout property (width, height, padding, border, gap) → put it in FaceStyle or face config. No cascade.
  3. Context property (source, theme, variables) → cascade via parent_context in the normalizer

Document the cascade behavior in the field reference when adding the field.
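
To make the split concrete, here is a deliberately simplified sketch of cascade resolution over flat property dicts. It does not reproduce StyleCompiled, FaceStyle, or _propagate_style_compiled(); `CASCADING` and `effective_properties` are hypothetical, and the flat property names are chosen only for illustration.

```python
from typing import Any, Dict, Mapping

# Assumed cascading set for this sketch: text/content and context properties.
CASCADING = frozenset({"font_family", "palette", "theme", "source"})


def effective_properties(
    parent: Mapping[str, Any], child: Mapping[str, Any]
) -> Dict[str, Any]:
    """Child sees parent's cascading values unless it overrides them."""
    inherited = {key: value for key, value in parent.items() if key in CASCADING}
    # Box/layout keys (padding, width, ...) are never copied from the parent.
    return {**inherited, **child}
```

A nested board under a parent with `font_family` and `padding` set ends up with the parent's font but no padding of its own, which matches the CSS-like rule above.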


10. Lists for Ordered, Maps for Named

  • Lists ([...]) for things where order matters: layout items, grid items, table columns, transform steps
  • Maps ({...}) for things identified by name: queries, charts, variables, sources

Don't use a list of {name: ..., ...} objects when a map of name: {...} would work. Maps are more readable and enable direct reference by key.

# YES — map for named queries
queries:
  sales: { sql: "SELECT ..." }
  products: { sql: "SELECT ..." }

# NO — list with explicit name fields
queries:
  - name: sales
    sql: "SELECT ..."
  - name: products
    sql: "SELECT ..."

11. Cross-File References Use Dot Syntax

External references use filename.resource_id:

query: _shared_queries.sales

The filename (without extension) is the namespace. Partials use _ prefix convention.

New features that reference named things should use the same dot syntax. Don't introduce new reference mechanisms ($ref, imports, include directives, etc.).
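
Resolving a reference in this syntax comes down to splitting on the dot. The sketch below is a hypothetical parser, not Dataface's resolver; it assumes a reference without a dot is local to the current file and splits on the first dot, since this guide does not say whether resource ids may themselves contain dots.

```python
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    file: Optional[str]  # filename without extension, or None for a local ref
    resource_id: str


def parse_ref(ref: str) -> Ref:
    """Parse 'filename.resource_id' or a bare local 'resource_id'."""
    file, dot, resource_id = ref.partition(".")
    if not dot:
        return Ref(None, ref)
    if not file or not resource_id:
        raise ValueError(
            f"malformed reference {ref!r}; expected filename.resource_id"
        )
    return Ref(file, resource_id)
```

Keeping one parser for queries, charts, and any future named resource is what makes a single reference mechanism cheap to maintain.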


12. Porting Config from Vega-Lite

When moving Vega-Lite configuration options into Dataface YAML:

Do

  • Always convert camelCase → snake_case: cornerRadius → corner_radius. No camelCase property should ever appear in authored YAML. This is the most common source of inconsistency, so catch it in review.
  • Keep the same semantic grouping: if VL groups it under axis, keep it under axis (or axis_y, axis_x)
  • Provide shorthand only for truly common patterns: if 80% of users will set the same thing, make it a top-level field
  • Document what VL property it maps to: in the field reference and in code comments at the mapping site

Don't

  • Rename for the sake of renaming: orient should stay orient, not become side or position
  • Flatten VL's nesting without reason: VL groups axis.labelFontSize and axis.labelColor together because they're related. Keep them grouped.
  • Create wrapper fields: don't add show_grid: true when the typed style surface already has axis.grid.hidden
  • Bundle multiple VL properties into one Dataface field: don't create axis_style: "minimal" that secretly sets grid + domain + ticks. Use style presets for that.

Checklist for each new field ported from VL

  1. What is the Vega-Lite property name?
  2. What is the snake_case Dataface name? (should be obvious conversion)
  3. Where does it go in the Dataface YAML hierarchy?
  4. Does it need a shorthand form?
  5. What is the default? (goes in config YAML, not code)
  6. Is there an existing Dataface field that already covers this? (don't duplicate)

13. Adding New Chart Types

The user should not perceive a seam between VL-backed chart types and Dataface-native ones. All chart types live in one flat namespace, share the same top-level fields, and use the same style system.

When adding a chart type that Vega-Lite doesn't support natively (like KPI, table, spark_bar):

  1. Use the same top-level chart fields where they apply (query, title, type, style)
  2. Add type-specific fields at the chart level, not buried in nested config: value for KPI, style.columns for table
  3. Follow the same shorthand + full form pattern
  4. Participate in style/theme inheritance — chart-level style: should work the same way it does for VL chart types
  5. Document in the field reference alongside VL chart types — not in a separate "custom charts" section

When adding an alias for a VL chart type (like scatter → point, heatmap → rect):

  1. Add it to the ChartType enum
  2. Map it mechanically in the profile layer
  3. Document it as an alias, not a new type
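
The alias steps above amount to an enum member plus a lookup table. The sketch below uses a stand-in `ChartType` with only a handful of members (the real enum's contents are not shown in this guide), and `resolve_mark` is a hypothetical name for the mechanical mapping.

```python
from enum import Enum


class ChartType(str, Enum):
    """Stand-in enum; the real ChartType has more members."""
    BAR = "bar"
    POINT = "point"
    RECT = "rect"
    SCATTER = "scatter"  # alias
    HEATMAP = "heatmap"  # alias


# Aliases resolve to a primitive mark; everything else maps to itself.
ALIASES = {
    ChartType.SCATTER: ChartType.POINT,
    ChartType.HEATMAP: ChartType.RECT,
}


def resolve_mark(chart_type: str) -> str:
    member = ChartType(chart_type)  # raises ValueError for unknown types
    return ALIASES.get(member, member).value
```

Because the alias carries no behavior of its own, documenting it as an alias rather than a new type stays truthful.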

14. Adding New Variable Input Types

When adding a new input type:

  1. Prefer type-inference-by-structure if the new input has distinctive fields
  2. If not distinctive enough, use input: new_type
  3. Type-specific fields go under the variable definition, not in a separate config block
  4. Support sensible defaults: default: should work the same way it does for existing input types
  5. Consider whether it needs query-driven options (like select does)

15. Error Messages Over Silent Behavior

When the YAML is wrong, the system should produce a clear error, not silently do something unexpected.

This applies to language design too: if a new field combination doesn't make sense (e.g., theta on a bar chart), validate and error at compile time. Don't silently ignore it.
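
A compile-time check for the theta example might look like the following sketch. `validate_chart` is hypothetical, `ValueError` stands in for whatever error type the compiler raises, and the set of chart types that accept theta is an assumption based on the pie/donut note in §5.

```python
from typing import Any, Mapping

# Assumption for this sketch: only these types accept a theta channel.
THETA_TYPES = frozenset({"pie", "donut"})


def validate_chart(name: str, chart: Mapping[str, Any]) -> None:
    """Raise a clear error for channel/type combinations that make no sense."""
    chart_type = chart.get("type")
    if "theta" in chart and chart_type not in THETA_TYPES:
        raise ValueError(
            f"chart '{name}': 'theta' is not valid for type '{chart_type}'; "
            f"use one of {sorted(THETA_TYPES)} or remove 'theta'"
        )
```

The message names the chart, the offending field, and the fix, which is what separates a clear error from a merely loud one.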


16. Diagnostic Suppression

Dataface runs structural diagnostics on SQL queries at compile time (fanout_risk, reaggregation, missing_join_predicate). When a flagged pattern is intentional, authors can suppress the diagnostic at three levels. Suppression is the union of all layers — any layer can silence a code.

Layer 1 — SQL-inline (-- dft:ignore)

queries:
  sales_by_customer:
    sql: |
      -- dft:ignore fanout_risk
      SELECT customer_id, SUM(o.amount), SUM(li.quantity)
      FROM orders o
      JOIN line_items li ON o.id = li.order_id
      GROUP BY customer_id

Multiple codes on one line: -- dft:ignore fanout_risk reaggregation. Blanket suppress all (except parse_error): -- dft:ignore. Follows the sqlfluff -- noqa: convention but uses dft:ignore to avoid collision.

Layer 2 — YAML ignore property

queries:
  sales_by_customer:
    sql: SELECT ...
    ignore: [fanout_risk]

First-class field on the Query model. Discoverable, IDE-completable.

Layer 3 — Project-wide via meta.yaml

# meta.yaml
lint:
  ignore:
    - fanout_risk
  ignore_queries:
    sales_by_customer:
      - reaggregation

lint.ignore suppresses everywhere. lint.ignore_queries suppresses per query name. Cascades through the meta chain like other meta.yaml settings.

Design rules

  • parse_error is never suppressible — if SQL can't be parsed, that's always an error.
  • Suppressed diagnostics are recorded for audit (--show-suppressed in the CLI, suppressed_warnings on CompileResult). They are not silently discarded.
  • When adding a new diagnostic code, decide whether it belongs in _UNSUPPRESSIBLE_CODES. Most codes should be suppressible.
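
The union-of-layers rule can be sketched as below. This is an illustration, not the Dataface implementation: `inline_ignores` and `is_suppressed` are hypothetical names, `UNSUPPRESSIBLE` mirrors the role of _UNSUPPRESSIBLE_CODES with only the one code this guide names, and the `lint` argument is assumed to be the already-resolved lint section from the meta chain.

```python
from __future__ import annotations

import re
from typing import Any, Iterable, Mapping

UNSUPPRESSIBLE = frozenset({"parse_error"})
_INLINE = re.compile(r"--\s*dft:ignore\b([^\n]*)")


def inline_ignores(sql: str) -> tuple[set[str], bool]:
    """Return (codes, blanket) parsed from -- dft:ignore comments."""
    codes: set[str] = set()
    blanket = False
    for match in _INLINE.finditer(sql):
        listed = match.group(1).split()
        if listed:
            codes.update(listed)
        else:
            blanket = True  # bare "-- dft:ignore" suppresses everything allowed
    return codes, blanket


def is_suppressed(
    code: str,
    *,
    query_name: str,
    sql: str,
    query_ignore: Iterable[str] = (),
    lint: Mapping[str, Any] | None = None,
) -> bool:
    """True if any layer silences `code`; unsuppressible codes never are."""
    if code in UNSUPPRESSIBLE:
        return False
    lint = lint or {}
    codes, blanket = inline_ignores(sql)
    codes |= set(query_ignore)                                    # Layer 2
    codes |= set(lint.get("ignore", ()))                          # Layer 3, global
    codes |= set(lint.get("ignore_queries", {}).get(query_name, ()))  # Layer 3, per query
    return blanket or code in codes
```

A caller would still record each suppressed diagnostic for audit rather than dropping it, in line with the second design rule.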

17. Visibility Toggles

Three mechanisms exist for hiding things. Use the right one:

| Mechanism | When to use | Example |
|---|---|---|
| hidden: true | Dataface-authored field: renders nothing but the slot still exists | variables.hidden: true, style.spark_bar.labels_hidden: true |
| null / omit | Remove a property from the compiled spec entirely | title: null, legend: null |
| disable: true | Vega-Lite passthrough only; matches VL's own naming | axis.disable: true |

Rule: Any new visibility toggle on a Dataface-authored field uses hidden: bool = false (positive-sense: false = visible). Never use show_* prefix or enabled for visibility.

pagination.enabled is not a visibility toggle — it activates a feature. It follows the enabled pattern because it controls whether the feature exists, not whether it renders.

Checklist:

  • [ ] New toggle hides something → hidden: false default, not show_*
  • [ ] VL passthrough toggle → use VL's disable: name
  • [ ] Removing from spec entirely → null, not hidden


Summary Checklist for New Syntax

Before adding or approving any new YAML field/feature:

  • [ ] Uses snake_case naming
  • [ ] Uses VL's name (converted to snake_case) if wrapping a VL concept, or divergence is documented
  • [ ] Doesn't invent a parallel name for something VL already names (without good reason)
  • [ ] No flat prefix_* repeated 3+ times — namespace them under prefix: instead
  • [ ] Supports shorthand + full form where a clear 80% single-value case exists
  • [ ] Type is inferred from structure where possible (no unnecessary type: fields)
  • [ ] Default value is in YAML config, not hardcoded in Python
  • [ ] Inherited properties are documented at each config level (commented-out placeholders or # inherits: notes)
  • [ ] Works at nested board scope if it works at top level
  • [ ] Cascade behavior is explicit: text/content properties cascade, box/layout properties don't (§9a)
  • [ ] Chart fields are pure presentation — no data transformation
  • [ ] Simplest form requires fewest fields (progressive disclosure)
  • [ ] Invalid combinations produce clear errors, not silent behavior
  • [ ] Documented in the field reference
  • [ ] Follows existing patterns in the same section of the YAML
  • [ ] Visibility toggle uses hidden: false, not show_* or enabled (§17)