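Trying out sources, a new feature in dbt 0.13.0

dbt 0.13 adds a new feature called sources, which can be used to:

  • select from source tables in base models
  • test assumptions about source data
  • calculate the freshness of source data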

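Working with sources

  • Source definition template

    Note: for databases such as PostgreSQL, where tables live inside a schema, you may need to set additional parameters (for example schema), or rely on the schema convention (by default dbt uses the source name as the schema)

# This example defines two sources: `source_1`, containing the tables
# `table_1` and `table_2`, and `source_2`, containing `table_1`.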
version: 2
sources:
  - name: source_1
    tables:
      - name: table_1
      - name: table_2
  - name: source_2
    tables:
      - name: table_1
 
 
  • Source definition with database and schema configured
# This source entry describes the table:
# "raw"."public"."Orders_"
#
# It can be referenced with:
# {{ source('ecommerce', 'orders') }}
version: 2
sources:
  - name: ecommerce
    database: raw # Tell dbt to look for the source in the "raw" database
    schema: public # You wouldn't put your source data in public, would you?
    tables:
      - name: orders
        identifier: Orders_ # To alias table names to account for strange casing or naming of tables
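
Building on the entry above, the testing and freshness uses of sources can be sketched as follows. This is a minimal sketch, not taken from the demo project: the column names `id` and `_loaded_at` and the threshold values are assumptions chosen for illustration, so adjust them to the real table.

```yaml
version: 2
sources:
  - name: ecommerce
    database: raw
    schema: public
    # Assumed timestamp column recording when each row was loaded (hypothetical name)
    loaded_at_field: _loaded_at
    freshness:
      # Illustrative thresholds: warn after 12 hours without new data, error after 24
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
        identifier: Orders_
        columns:
          # Assumed primary key column (hypothetical name)
          - name: id
            tests:
              - unique
              - not_null
```

With a definition like this, `dbt test` would run the column tests against the source table, and in dbt 0.13 `dbt source snapshot-freshness` would check the freshness thresholds.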
 
 

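A simple example

My sources are configured directly in the models folder; see https://github.com/rongfengliang/dbt-source-demo.
The table structures can also be found in that project.

  • Environment setup (managed with python venv)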
python3 -m venv venv 
source venv/bin/activate
pip install dbt
  • Test database setup (using docker-compose)
version: '3.6'
services:
  postgres:
    image: postgres:9.6.11
    ports: 
    - "5432:5432"
    environment:
    - "POSTGRES_PASSWORD=dalong"
  graphql-engine:
    image: hasura/graphql-engine:v1.0.0-beta.2
    ports:
    - "8080:8080"
    depends_on:
    - "postgres"
    environment:
    - "HASURA_GRAPHQL_DATABASE_URL=postgres://postgres:dalong@postgres:5432/postgres"
    - "HASURA_GRAPHQL_ENABLE_CONSOLE=true"
    - "HASURA_GRAPHQL_ENABLE_ALLOWLIST=true"
  • Model and source layout
models
├── apps
│   ├── app_summary.sql
│   └── sources.yml
└── users
    ├── sources.yml
    ├── user_summary.sql
    └── user_summary2.sql
  • Source contents

    The content is simple: it just declares the table

version: 2
sources:
  - name: apps
    schema: public
    tables:
      - name: apps
  • Running the project
dbt run

Output

Running with dbt=0.13.1
Found 3 models, 0 tests, 0 archives, 0 analyses, 94 macros, 0 operations, 0 seed files, 2 sources
17:43:42 | Concurrency: 3 threads (target='dev')
17:43:42 | 
17:43:42 | 1 of 3 START view model public.app_summary........................... [RUN]
17:43:42 | 2 of 3 START view model public.user_summary.......................... [RUN]
17:43:42 | 3 of 3 START table model public.user_summary2........................ [RUN]
17:43:44 | 2 of 3 OK created view model public.user_summary..................... [CREATE VIEW in 0.26s]
17:43:45 | 1 of 3 OK created view model public.app_summary...................... [CREATE VIEW in 0.27s]
17:43:46 | 3 of 3 OK created table model public.user_summary2................... [SELECT 2 in 0.27s]
17:43:46 | 
17:43:46 | Finished running 2 view models, 1 table models in 4.46s.
Completed successfully
Done. PASS=3 ERROR=0 SKIP=0 TOTAL=3

References

https://github.com/rongfengliang/dbt-source-demo
