Queries¶
Queries define what data to fetch from your data sources. Dataface supports five types of queries. Most projects start with SQL. MetricFlow semantic layer queries and dbt model queries are optional enhancements that play nicely with a dbt setup—bonus points if you already leverage dbt!
- SQL queries - Direct SQL against database tables
- Values queries - Inline data embedded directly in your YAML
- HTTP / API queries - Fetch JSON data from REST endpoints
- Semantic layer queries - Use dbt MetricFlow metrics and dimensions (optional, advanced)
- dbt model queries - Query dbt models directly (optional, no MetricFlow required)
Basic Query Structure¶
Every query has a unique name and defines what data to fetch. Query names must be unique within a file and across all imported files.
The simplest way to write a SQL query is a bare string — just the SQL itself. Set source: once at the face level and every string query uses it automatically:
{% raw %}
source: my_postgres queries: sales: | SELECT DATE_TRUNC('month', created_at) AS month, region, SUM(revenue) AS total_revenue, COUNT(*) AS order_count FROM orders WHERE status = 'completed' AND {{ filter('region', region) }} AND {{ filter_date_range('created_at', date_range) }} GROUP BY 1, 2
When you need to override the source, use description for AI search metadata, or mix query types on the same face, use the dict form instead:
{% raw %}
queries: sales: description: "Monthly completed-order revenue by region" sql: | SELECT ... source: my_postgres
Query Types¶
1. SQL Queries (Baseline)¶
String shorthand — set source: once at the face level and write queries as bare SQL strings:
{% raw %}
source: my_postgres queries: sales: | SELECT DATE_TRUNC('month', created_at) as month, region, SUM(revenue) as total_revenue, COUNT(*) as order_count FROM orders WHERE status = 'completed' AND {{ filter('region', region) }} AND {{ filter_date_range('created_at', date_range) }} GROUP BY 1, 2 ORDER BY 1 DESC
Dict form — use when you need to override the source per query, set a target, or add a description:
{% raw %}
queries: sales: sql: | SELECT ... source: my_postgres # Required when no face-level source target: prod # Optional: dbt target name (defaults to 'dev')
Use the pipe (|) after sql: (or directly after the query name for string shorthand) to start a YAML literal block. Everything you indent under it becomes one multiline string, so you can paste your SQL verbatim.
Database Connection via Source Configuration¶
Dataface uses source configuration for SQL database connections. A source name can refer to a dbt profile defined in profiles.yml. This means:
- ✅ Works with dbt profiles - Dataface reads your existing
profiles.ymlfile - ✅ Works with all dbt-supported databases - Postgres, Snowflake, BigQuery, Redshift, DuckDB, etc.
- ✅ Leverages dbt's adapter system - Handles database-specific quirks automatically
- ✅ Same credentials as dbt - Uses your existing database connections
Setting up sources:
- Create or edit
~/.dbt/profiles.yml(orprofiles.ymlin your project directory) - Define your database connection using dbt's profile format:
{% raw %}
# ~/.dbt/profiles.yml my_postgres: outputs: dev: type: postgres host: localhost port: 5432 user: myuser password: mypass dbname: mydb schema: public prod: type: postgres host: prod.example.com port: 5432 user: prod_user password: "{{ env_var('DB_PASSWORD') }}" dbname: analytics schema: public target: dev # Default target
- Reference the source in your queries using the
sourcefield (and optionallytarget):
queries: sales: sql: SELECT * FROM orders source: my_postgres # Uses 'my_postgres' source (matches dbt profile name) target: prod # Uses 'prod' target (defaults to 'dev' if not specified)
For more information: - See dbt's profile documentation for complete profile configuration options - See dbt adapter documentation for database-specific connection parameters
Dataface also has direct support for dbt MetricFlow semantic layer queries and dbt model queries as optional extras—use them when your dbt project already exposes metrics or models and treat it as a little reward for being a great dbt user.
Type Inference: Query types are automatically inferred from the keys present:
- sql: → SQL Query (requires source field)
- rows: or values: → Values Query (inline data)
- url: → HTTP Query
- metrics: and/or dimensions: → Semantic Layer Query
- model: and columns: → dbt Model Query
Use cases:
- Most common use case - direct SQL queries
- Full control over query logic
- Works with any dbt-supported database
- Requires source configuration (minimal setup - just profiles.yml)
2. Values Queries (Inline Data)¶
Values queries embed data directly in your YAML — no database or file needed. Perfect for examples, documentation, small reference datasets, and prototyping.
Dict rows syntax — each row is a key-value mapping:
queries: products: rows: - { product: "Widget A", revenue: 100, category: "Electronics" } - { product: "Widget B", revenue: 140, category: "Electronics" } - { product: "Gadget X", revenue: 180, category: "Accessories" }
Columns + values syntax — compact, SQL-style:
queries: products: columns: [product, revenue, category] values: - ["Widget A", 100, "Electronics"] - ["Widget B", 140, "Electronics"] - ["Gadget X", 180, "Accessories"]
Both syntaxes produce identical results. Use whichever reads better for your data.
Type Inference: Detected by the presence of rows: or values: — no explicit type: needed.
Use cases:
- Examples and documentation that need to be self-contained
- Small lookup/reference tables (categories, labels, thresholds)
- Prototyping charts before connecting a real data source
- Test fixtures and demo dashboards
3. HTTP / API Queries¶
Fetch data from any JSON API. This is useful for integrating external data sources (weather, stock prices) or calling your own services (ML models, Jupyter notebooks exposed as endpoints).
{% raw %}
queries: prediction: type: http url: "https://api.model-service.com/predict" method: POST headers: Authorization: "Bearer {{ env.MODEL_API_KEY }}" body: features: region: "{{ region }}" date: "{{ date_range.start }}"
Type Inference: Detected by the presence of url: or explicit type: http.
Use cases: - Calling ML inference endpoints - Fetching data from 3rd party APIs - Triggering server-side workflows (if read-only)
4. Semantic Layer Queries (MetricFlow) - Optional¶
Use dbt MetricFlow metrics and dimensions (requires dbt Semantic Layer setup):
queries: semantic_sales: metrics: [total_revenue, order_count] # Semantic layer metrics dimensions: [month, region] # Semantic layer dimensions filters: region: region order_date: "{{ date_range }}"
Use cases: - Teams already using dbt's semantic layer - Consistent metric definitions across dashboards - Automatic metric calculation logic - Leveraging existing MetricFlow setup
Note: - Requires dbt Semantic Layer (MetricFlow) setup - All metric names must exist in your dbt Semantic Layer - See MetricFlow Guide for details on how it works
5. dbt Model Queries - Optional¶
Query dbt models directly (requires dbt, but no MetricFlow):
queries: model_sales: model: "ref('fct_orders')" # Reference dbt model columns: [month, region, revenue, order_count] filters: region: region created_at: "{{ date_range }}"
Use cases: - Teams using dbt models but not MetricFlow semantic layer - Direct column access without metric definitions - Simpler queries that don't need semantic layer abstraction
Note: Requires dbt setup, but not MetricFlow
Field Reference¶
Values Queries¶
queries: <query_name>: # Option 1: Dict rows rows: # List of {key: value} dicts - { col1: val1, col2: val2 } # Option 2: Columns + values (compact) columns: [string] # Column names values: # List of value arrays - [val1, val2] limit: number # Optional: max rows
HTTP / API Queries¶
queries: <query_name>: type: http # Required url: string # Required: URL (supports Jinja) method: GET | POST # Optional: Default GET headers: Record<string, string> # Optional: HTTP headers params: Record<string, string> # Optional: Query parameters body: any # Optional: JSON body (for POST)
Semantic Layer Queries¶
queries: <query_name>: # Unique identifier metrics: [string] # Required dimensions: [string] # Optional: for grouping filters: # Optional <field>: <expression> time_grain: day | week | month | quarter | year limit: number # Optional: max rows
dbt Model Queries¶
queries: <query_name>: model: string # Required: e.g., "ref('orders')" columns: [string] # Required filters: # Optional <field>: <expression> limit: number # Optional: max rows
Raw SQL Queries¶
String shorthand (recommended when all queries share one source):
source: my_postgres # face-level default queries: <query_name>: string # bare SQL — inherits face source
Dict form (use when overriding source, setting target, or adding description):
queries: <query_name>: sql: string # Required: SQL query (supports Jinja) source: string # Required if no face-level source target: string # Optional: dbt target name (defaults to 'dev') description: string # Optional: metadata for AI and UI
Example:
source: my_postgres queries: sales: | SELECT * FROM orders WHERE status = 'completed' LIMIT 100
Database Connection via Source Configuration:
Dataface uses source configuration for SQL database connections. The source field must match a profile name defined in your dbt profiles.yml file (typically located at ~/.dbt/profiles.yml or in your project directory).
Source Configuration:
- Create or edit
~/.dbt/profiles.ymlwith your database connection:
{% raw %}
my_postgres: outputs: dev: type: postgres host: localhost port: 5432 user: myuser password: mypass dbname: mydb schema: public prod: type: postgres host: prod.example.com port: 5432 user: prod_user password: "{{ env_var('DB_PASSWORD') }}" dbname: analytics schema: public target: dev # Default target
- Reference the source in your queries:
queries: sales: sql: SELECT * FROM orders source: my_postgres # Uses 'my_postgres' source target: prod # Uses 'prod' target (defaults to 'dev' if not specified)
For more information: - See dbt's profile documentation for complete profile configuration options - See dbt adapter documentation for database-specific connection parameters (Postgres, Snowflake, BigQuery, Redshift, etc.)
Metrics (Semantic Layer Only)¶
Metrics are the measures you want to calculate. They come from your dbt Semantic Layer:
queries: revenue: metrics: [total_revenue] # Single metric multiple: metrics: [total_revenue, order_count, avg_order_value] # Multiple metrics
Note: Metrics only work with semantic layer queries. All metric names must exist in your dbt Semantic Layer (MetricFlow).
Dimensions¶
Dimensions are the categories you want to group by:
queries: sales_by_region: metrics: [total_revenue] dimensions: [region] # Group by region sales_by_time_and_region: metrics: [total_revenue] dimensions: [month, region] # Group by month AND region
Filters¶
Filters limit which data is included. You can reference variables or use literal values:
queries: filtered_sales: metrics: [total_revenue] dimensions: [month] filters: # Reference a variable directly region: region # Use a variable with Jinja expression order_date: "{{ date_range }}" # Literal value status: "completed" # Complex expression revenue: "{{ min_revenue }}"
See the Expressions guide for more on using variables in filters.
Time Grain¶
Time grain automatically groups time-based dimensions:
queries: monthly: metrics: [total_revenue] dimensions: [order_date] time_grain: month # Groups by month automatically
Available time grains: day, week, month, quarter, year
Pivot (Cross-Tab Tables)¶
Long-form SQL results can be pivoted into wide-form at render time by
adding a pivot: block to the query. Pivot is a rendering hint consumed
only by the table renderer — it does not change the SQL or the rows
returned by the database.
queries: sales_long: sql: | SELECT region, quarter, SUM(revenue) AS revenue FROM orders GROUP BY region, quarter source: warehouse pivot: column: quarter # Distinct values become column headers value: revenue # Measure that fills each cell charts: sales_crosstab: type: table query: queries.sales_long
The table will have one row per region and one column per distinct
quarter, with revenue filling the cells. Every SELECT column that is
neither column nor value is treated as a row dimension automatically.
Chart consumers other than table receive the original long-form rows
unchanged — pivot: does not affect bar, line, or other chart types.
Query Examples¶
Simple Revenue Query¶
queries: revenue: metrics: [total_revenue] dimensions: [month]
Filtered by Variable¶
queries: regional_sales: metrics: [total_revenue, order_count] dimensions: [month, region] filters: region: region # Uses 'region' variable order_date: "{{ date_range }}"
With Time Grain¶
queries: quarterly: metrics: [total_revenue] dimensions: [order_date] time_grain: quarter filters: region: region
Multiple Metrics and Dimensions¶
queries: comprehensive: metrics: [total_revenue, order_count, avg_order_value] dimensions: [month, region, product_category] filters: order_date: "{{ date_range }}" region: region
Best Practices¶
Reuse Queries Across Charts¶
Multiple charts can reference the same query, which is more efficient:
queries: sales: metrics: [total_revenue] dimensions: [month, region] rows: - cols: - chart1: query: queries.sales # Same query type: bar x: month y: total_revenue - chart2: query: queries.sales # Same query type: line x: month y: total_revenue color: region
Use Query References for DRY SQL¶
Instead of repeating SQL, reference other queries:
{% raw %}
# ❌ Bad: Repetitive SQL queries: north_sales: sql: | SELECT DATE_TRUNC('month', created_at) as month, SUM(revenue) as total_revenue FROM orders WHERE region = 'North' GROUP BY 1 source: my_postgres south_sales: sql: | SELECT DATE_TRUNC('month', created_at) as month, # Repeated SUM(revenue) as total_revenue # Repeated FROM orders WHERE region = 'South' # Only difference GROUP BY 1 source: my_postgres # ✅ Good: DRY with query references queries: base_sales: sql: | SELECT DATE_TRUNC('month', created_at) as month, SUM(revenue) as total_revenue FROM orders GROUP BY 1 source: my_postgres north_sales: sql: | SELECT * FROM {{ queries.base_sales }} WHERE region = 'North' source: my_postgres south_sales: sql: | SELECT * FROM {{ queries.base_sales }} WHERE region = 'South' source: my_postgres
Use Semantic Layer When Possible¶
Semantic layer queries provide: - Consistent metric definitions - Automatic calculation logic - Better maintainability - Reusability across dashboards
Performance Considerations¶
- Use
limitwhen you don't need all rows - Filter early in queries (use
filtersfield) - Reuse queries across multiple charts
- Use
time_grainfor automatic time grouping
Naming Conventions¶
- Use descriptive names (e.g.,
sales,products) and reference with explicit namespacing (queries.sales) - Use descriptive names that indicate what data the query fetches
- Keep names lowercase with underscores
Testing and Validation¶
Validate Your Dashboard¶
After writing queries, validate your dashboard to catch errors early:
See the CLI Reference for validation options.
Render to Inspect Resolved Queries¶
Render your dashboard to JSON to see the resolved query structure with executed results:
See the CLI Reference for rendering options.
Query References¶
You can reference other queries within your SQL using Jinja template syntax. This enables CTE-style query composition where complex queries can be built from simpler building blocks.
{% raw %}
queries: # Base query - sales by month sales_by_month: sql: | SELECT DATE_TRUNC('month', order_date) as month, SUM(amount) as sales FROM orders WHERE status = 'completed' GROUP BY 1 source: my_postgres # Reference the base query current_month_sales: sql: | SELECT sales as current_sales FROM {{ queries.sales_by_month }} ORDER BY month DESC LIMIT 1 source: my_postgres
How It Works¶
When you use {% raw %}{{ queries.query_name }}{% endraw %}, Dataface:
1. Wraps the referenced query's SQL in parentheses
2. Substitutes it inline as a subquery
3. Detects circular dependencies
The example above becomes:
SELECT sales as current_sales
FROM (
SELECT
DATE_TRUNC('month', order_date) as month,
SUM(amount) as sales
FROM orders
WHERE status = 'completed'
GROUP BY 1
)
ORDER BY month DESC
LIMIT 1
Chained References¶
You can chain multiple query references:
{% raw %}
queries: raw_orders: sql: | SELECT * FROM orders WHERE status = 'completed' source: my_postgres monthly_sales: sql: | SELECT DATE_TRUNC('month', order_date) as month, SUM(amount) as sales FROM {{ queries.raw_orders }} GROUP BY 1 source: my_postgres current_month: sql: | SELECT sales FROM {{ queries.monthly_sales }} ORDER BY month DESC LIMIT 1 source: my_postgres
Subquery Aliases¶
Use SQL aliases when referencing queries as subqueries:
{% raw %}
queries: base_sales: sql: | SELECT product_id, amount FROM orders source: my_postgres top_products: sql: | SELECT product_id, SUM(amount) as total FROM {{ queries.base_sales }} AS sales GROUP BY product_id HAVING SUM(amount) > 1000 source: my_postgres
Limitations & Notes¶
Query references are powerful - they let you build complex analyses by composing simpler queries. Just two simple limitations:
1. Top-Level Only
All query references (`{% raw %}{{ queries.* }}{% endraw %}`) MUST reference top-level queries in the current dashboard's queries: section:
{% raw %}
queries: base_query: # ✅ Top-level - can be referenced sql: SELECT * FROM orders source: my_postgres # Import from other files and add to top level: shared_sales: _shared_queries.queries.base_sales # ✅ Top-level - can be referenced my_analysis: # ✅ Top-level - can be referenced sql: | SELECT * FROM {{ queries.shared_sales }} WHERE amount > 100 UNION ALL SELECT * FROM {{ queries.base_query }} source: my_postgres rows: - queries: # ❌ NESTED - cannot be referenced anywhere! nested_query: sql: SELECT * FROM products source: my_postgres
Important: Even if you import a query from another file that has internal references (like {% raw %}{{ queries.raw_data }}{% endraw %}), you must import ALL referenced queries to the top level of your current dashboard for them to work.
2. SQL Only
Only SQL queries can reference other queries. Values, HTTP, CSV, and other query types cannot use {% raw %}{{ queries.* }}{% endraw %}.
Importing Queries from Other Files¶
You can reuse queries across multiple dashboards by referencing queries from other files directly in your queries: section.
Import Syntax¶
Use the format: file_name.queries.query_name
{% raw %}
queries: # Import queries from other files base_sales: _shared_queries.queries.base_sales customer_data: _shared_queries.queries.customer_data # Use imported queries like any other query my_analysis: sql: | SELECT * FROM {{ queries.base_sales }} WHERE revenue > 1000 source: my_postgres charts: sales_chart: query: my_analysis type: bar x: month y: revenue
Import Examples¶
Import from same directory:
queries: # _shared_queries.yml is in the same directory (underscore prefix means not rendered) orders: _shared_queries.queries.base_orders customers: _shared_queries.queries.base_customers
Import from subdirectory:
queries: # analytics/metrics.yml revenue: analytics/metrics.queries.total_revenue # ml/predictions.yml top_customers: ml/predictions.queries.predict_top_customers
Use imported queries:
{% raw %}
queries: base_sales: _shared.queries.sales filtered_sales: sql: | SELECT * FROM {{ queries.base_sales }} WHERE region = '{{ filter("region", "US") }}' source: my_postgres summary: sql: | SELECT COUNT(*) as total_orders, SUM(amount) as total_revenue FROM {{ queries.filtered_sales }} source: my_postgres charts: summary_chart: query: summary type: number value: total_revenue
Best Practices¶
- Use imports for shared queries: Create
_shared_queries.ymlfiles (underscore prefix means not rendered as dashboards) for queries used across multiple dashboards - Don't overdo it: If you find yourself creating many shared query files, consider whether you need better data modeling in dbt or a proper data mart instead. Query references are for composition, not data transformation pipelines.
- Use for composition: Build complex queries from simple, reusable building blocks
- Consider CTEs: For simple cases or one-off queries, standard SQL CTEs might be clearer
- Add aliases: Always alias your subqueries for clarity (
FROM {% raw %}{{ queries.base }}{% endraw %} AS base) - Test separately: Ensure referenced queries work independently before chaining them
- Keep it simple: Avoid deep chains (A → B → C → D). Two levels is usually enough
Security: Parameterized Queries¶
Dataface uses parameterized queries to prevent SQL injection attacks. When you use variables in your SQL queries, they are automatically passed as parameters to the database driver rather than being interpolated directly into the SQL string.
How It Works¶
When you write:
{% raw %}
queries: sales: sql: | SELECT * FROM orders WHERE region = '{{ region }}' AND {{ filter('status', status) }} source: my_postgres
Dataface converts this to a parameterized query:
-- SQL sent to database:
SELECT * FROM orders
WHERE region = $1
AND status = $2
-- Parameters sent separately:
params = ['North', 'active']
This separation ensures that:
- Malicious input cannot alter query logic - Even if a user enters '; DROP TABLE orders; -- as input, it's treated as a literal string value, not SQL code
- Database can cache query plans - The same parameterized SQL structure enables query plan reuse
- No escaping needed - The database driver handles type conversion and escaping automatically
Filter Helper Behavior¶
The filter() and filter_date_range() helpers return 1=1 (always true) when the variable is None, undefined, or empty. This means:
- Filter applied:
{{ filter('region', 'North') }}→region = $1(with 'North' as parameter) - Filter skipped:
{{ filter('region', None) }}→1=1(all rows match, no filtering)
This pattern allows optional filtering where unset variables don't restrict results.
Deny-on-null fallback¶
For mandatory-scope dashboards (audit views, customer-specific dashboards, anything where the unfiltered superset is the wrong default), pass none='deny' to flip the fallback from 1=1 to 1=0 (zero rows):
{{ filter('region', None, none='deny') }}→1=0(no rows when unset){{ filter('region', 'North', none='deny') }}→region = $1(normal filter when set)
none is keyword-only; valid values are 'allow' (default) and 'deny'. Anything else raises ValueError.
Security Validations¶
The parameterized filter helpers include additional security measures:
- Operator validation: Only valid SQL operators (=, !=, >, <, >=, <=, LIKE, IN, etc.) are allowed. Invalid operators raise an error.
- Column name validation: Column names must contain only letters, numbers, underscores, and optionally one dot (for
table.columnformat).
Best Practices¶
- Use
filter()for user-controlled values - Always use the filter helpers for variables that come from user input - Don't construct SQL from user strings - Avoid patterns like
WHERE {{ user_column }} = ...where column names come from user input - Validate at the variable level - Use variable validation (data types, options) to restrict allowed values
Related¶
- Variables - Using variables in filters
- Expressions - Variable references and Jinja expressions
- Charts - Visualizing query data
- CLI Reference - Validate, render, serve, and more
- Field Reference - Complete field reference
- Troubleshooting Guide - Common query issues