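dbt 0.13 adds a new feature: sources.
We can use it to do the following:
- Select from source tables in base models
- Test assumptions about source data
- Calculate the freshness of source data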
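As a rough illustration of the last two points, a source definition can carry both column tests and a freshness check. This is only a sketch: the column names `id` and `_loaded_at` and the thresholds are assumptions made for the example, not taken from a real project.

```yaml
version: 2

sources:
  - name: source_1
    tables:
      - name: table_1
        # Hypothetical timestamp column recording when each row was loaded
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 12, period: hour}    # warn if newest row is older than 12 hours
          error_after: {count: 24, period: hour}   # error if newest row is older than 24 hours
        columns:
          - name: id          # hypothetical column
            tests:
              - unique
              - not_null
```

With a definition like this, `dbt test` runs the column tests, and in the 0.13 series the freshness check is run with `dbt source snapshot-freshness`.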
Working with sources
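- Defining a source: template format
Note: for databases such as Postgres, if the tables live in a particular schema you may need to set extra parameters (for example `schema`), or rely on the convention that the source name is used as the schema name.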
# This example defines two sources: `source_1` with the tables `table_1`
# and `table_2`, and `source_2` with the table `table_1`.
# This is a minimal example of a source definition.
version: 2

sources:
  - name: source_1
    tables:
      - name: table_1
      - name: table_2

  - name: source_2
    tables:
      - name: table_1
- Source format with database and schema configuration
# This source entry describes the table:
# "raw"."public"."Orders_"
#
# It can be referenced with:
# {{ source('ecommerce', 'orders') }}
version: 2

sources:
  - name: ecommerce
    database: raw    # Tell dbt to look for the source in the "raw" database
    schema: public   # You wouldn't put your source data in public, would you?
    tables:
      - name: orders
        identifier: Orders_   # To alias table names to account for strange casing or naming of tables
A simple example
I configured the sources directly in the models folder; see https://github.com/rongfengliang/dbt-source-demo.
The table structure can also be found in that project.
- Environment setup (managed with a Python venv)
python3 -m venv venv
source venv/bin/activate
pip install dbt
- Test database setup (using docker-compose)
version: '3.6'
services:
  postgres:
    image: postgres:9.6.11
    ports:
      - "5432:5432"
    environment:
      # List-style entries must use KEY=VALUE, not KEY:VALUE
      - "POSTGRES_PASSWORD=dalong"
  graphql-engine:
    image: hasura/graphql-engine:v1.0.0-beta.2
    ports:
      - "8080:8080"
    depends_on:
      - "postgres"
    environment:
      - "HASURA_GRAPHQL_DATABASE_URL=postgres://postgres:dalong@postgres:5432/postgres"
      - "HASURA_GRAPHQL_ENABLE_CONSOLE=true"
      - "HASURA_GRAPHQL_ENABLE_ALLOWLIST=true"
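For dbt to reach this database, a connection profile along the following lines would be needed in `~/.dbt/profiles.yml`. This is a sketch rather than the project's actual file: the profile name `dbt_source_demo` is hypothetical and has to match the `profile:` entry in `dbt_project.yml`, while the credentials, the `dev` target, the `public` schema and the 3 threads are taken from the compose file above and the run output shown later.

```yaml
# ~/.dbt/profiles.yml
dbt_source_demo:          # hypothetical profile name
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost     # port 5432 is published by the compose file
      port: 5432
      user: postgres
      pass: dalong        # matches POSTGRES_PASSWORD
      dbname: postgres
      schema: public      # schema dbt builds models into
      threads: 3
```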
- Model and source layout
models
├── apps
│   ├── app_summary.sql
│   └── sources.yml
└── users
    ├── sources.yml
    ├── user_summary.sql
    └── user_summary2.sql
- Source contents
The content is simple: it only declares the tables.
version: 2

sources:
  - name: apps
    schema: public
    tables:
      - name: apps
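The file above is the one under `models/apps`. The `sources.yml` under `models/users` is not shown here; assuming it follows the same pattern with a source and table both named `users` (an assumption, check the linked repository for the real names), it would look roughly like this:

```yaml
version: 2

sources:
  - name: users        # assumed source name
    schema: public
    tables:
      - name: users    # assumed table name
```

A model then selects from a declared table with the `source` function, for example `{{ source('apps', 'apps') }}` for the source defined above.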
- Running it
dbt run
Output
Running with dbt=0.13.1
Found 3 models, 0 tests, 0 archives, 0 analyses, 94 macros, 0 operations, 0 seed files, 2 sources
17:43:42 | Concurrency: 3 threads (target='dev')
17:43:42 |
17:43:42 | 1 of 3 START view model public.app_summary........................... [RUN]
17:43:42 | 2 of 3 START view model public.user_summary.......................... [RUN]
17:43:42 | 3 of 3 START table model public.user_summary2........................ [RUN]
17:43:44 | 2 of 3 OK created view model public.user_summary..................... [CREATE VIEW in 0.26s]
17:43:45 | 1 of 3 OK created view model public.app_summary...................... [CREATE VIEW in 0.27s]
17:43:46 | 3 of 3 OK created table model public.user_summary2................... [SELECT 2 in 0.27s]
17:43:46 |
17:43:46 | Finished running 2 view models, 1 table models in 4.46s.
Completed successfully
Done. PASS=3 ERROR=0 SKIP=0 TOTAL=3