博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
拉链表-增量更新方法一
阅读量:6814 次
发布时间:2019-06-26

本文共 3890 字,大约阅读时间需要 12 分钟。

参考文档:http://lxw1234.com/archives/2015/08/473.htm

一、元表结构

1、定义业务库原始订单表:

drop table chavin.orders;

CREATE TABLE orders (

orderid INT,

createtime STRING,

modifiedtime STRING,

status STRING

)row format delimited fields terminated by '\t'

stored AS textfile;

--加载测试数据

1 2015-08-18 2015-08-18 创建

2 2015-08-18 2015-08-18 创建

3 2015-08-19 2015-08-21 支付

4 2015-08-19 2015-08-21 完成

5 2015-08-19 2015-08-20 支付

6 2015-08-20 2015-08-20 创建

7 2015-08-20 2015-08-21 支付

8 2015-08-21 2015-08-21 创建

2、定义ODS订单表结构,采用日分区存储:

drop table t_ods_orders_inc;

CREATE TABLE t_ods_orders_inc (

orderid INT,

createtime STRING,

modifiedtime STRING,

status STRING

) PARTITIONED BY (day STRING)

row format delimited fields terminated by '\t'

stored AS textfile;

3、创建dw层历史订单表:

drop table t_dw_orders_his;

CREATE TABLE t_dw_orders_his (

orderid INT,

createtime STRING,

modifiedtime STRING,

status STRING,

dw_start_date STRING,

dw_end_date STRING

)row format delimited fields terminated by '\t'

stored AS textfile;

二、初始化dw层历史订单表:

1、将源库订单表历史数据插入到ods订单表中:

INSERT overwrite TABLE t_ods_orders_inc PARTITION (day = '2015-08-20')

SELECT orderid,createtime,modifiedtime,status

FROM chavin.orders

WHERE cast(createtime as date) <= '2015-08-20';

2、通过ods层订单表数据初始化dw层历史订单表:

INSERT overwrite TABLE t_dw_orders_his

SELECT orderid,createtime,modifiedtime,status,

createtime AS dw_start_date,

'9999-12-31' AS dw_end_date

FROM t_ods_orders_inc

WHERE day = '2015-08-20';

三、增量添加数据

1、将原始订单表增量数据插入到ods层订单表前一天分区中:

INSERT overwrite TABLE t_ods_orders_inc PARTITION (day = '2015-08-21')

SELECT orderid,createtime,modifiedtime,status

FROM orders

WHERE createtime = '2015-08-21' OR modifiedtime = '2015-08-21';

2、通过dw历史数据和ods增量数据刷新dw历史数据,此处采用临时表方法:

DROP TABLE IF EXISTS t_dw_orders_his_tmp;

CREATE TABLE t_dw_orders_his_tmp AS

SELECT orderid,

createtime,

modifiedtime,

status,

dw_start_date,

dw_end_date

FROM (

    SELECT a.orderid,

    a.createtime,

    a.modifiedtime,

    a.status,

    a.dw_start_date,

    CASE WHEN b.orderid IS NOT NULL AND a.dw_end_date > '2015-08-21' THEN '2015-08-20' ELSE a.dw_end_date END AS dw_end_date

    FROM t_dw_orders_his a

    left outer join (SELECT * FROM t_ods_orders_inc WHERE day = '2015-08-21') b

    ON (a.orderid = b.orderid)

    UNION ALL

    SELECT orderid,

    createtime,

    modifiedtime,

    status,

    modifiedtime AS dw_start_date,

    '9999-12-31' AS dw_end_date

    FROM t_ods_orders_inc

    WHERE day = '2015-08-21'

) x

ORDER BY orderid,dw_start_date;

3、根据历史表更新dw层历史订单表:

INSERT overwrite TABLE t_dw_orders_his

SELECT * FROM t_dw_orders_his_tmp;

4、根据上面步骤增加22号数据:

--加载增量数据到ods层订单表分区'2015-08-22'中:

1 2015-08-18 2015-08-22 支付

2 2015-08-18 2015-08-22 完成

6 2015-08-20 2015-08-22 支付

9 2015-08-22 2015-08-22 创建

8 2015-08-22 2015-08-22 支付

10 2015-08-22 2015-08-22 支付

alter table t_ods_orders_inc add partition(day='2015-08-22');

load data local inpath '/opt/datas/orders22.txt' into table chavin.t_ods_orders_inc partition(day='2015-08-22');

--根据历史订单数据和增量数据更新历史订单表数据,此处采用临时表:

DROP TABLE IF EXISTS t_dw_orders_his_tmp;

CREATE TABLE t_dw_orders_his_tmp AS

SELECT orderid,

createtime,

modifiedtime,

status,

dw_start_date,

dw_end_date

FROM (

    SELECT a.orderid,

    a.createtime,

    a.modifiedtime,

    a.status,

    a.dw_start_date,

    CASE WHEN b.orderid IS NOT NULL AND a.dw_end_date > '2015-08-22' THEN '2015-08-21' ELSE a.dw_end_date END AS dw_end_date

    FROM t_dw_orders_his a

    left outer join (SELECT * FROM t_ods_orders_inc WHERE day = '2015-08-22') b

    ON (a.orderid = b.orderid)

    UNION ALL

    SELECT orderid,

    createtime,

    modifiedtime,

    status,

    modifiedtime AS dw_start_date,

    '9999-12-31' AS dw_end_date

    FROM t_ods_orders_inc

    WHERE day = '2015-08-22'

) x

ORDER BY orderid,dw_start_date;

--根据临时表更新历史订单表:

INSERT overwrite TABLE t_dw_orders_his

SELECT * FROM t_dw_orders_his_tmp;

5、查看2015-08-21、2015-08-21历史快照:

select * from t_dw_orders_his where dw_start_date <= '2015-08-21' and dw_end_date >= '2015-08-21';

select * from t_dw_orders_his where dw_start_date <= '2015-08-22' and dw_end_date >= '2015-08-22';

转载地址:http://sydzl.baihongyu.com/

你可能感兴趣的文章
我的NodeJS一年之旅总结
查看>>
MyBatis-3.4.2-源码分析6:解析XML之objectWrapperFactoryElement & reflectorFactoryElement
查看>>
javascript与获取鼠标位置有关的属性
查看>>
Oracle database 11.2.0.3.0 升级至 11.2.0.3.14
查看>>
heartbeat理论介绍
查看>>
简单实现MVC模式
查看>>
mysql连接小错误一例
查看>>
奇怪的“考生”:中美高考,我都考一考!
查看>>
winform datagridview 使用论坛。
查看>>
Cocos Studio study ---------- 使用CocosStudio1.6制作 界面,并结合代码制作游戏
查看>>
关于LittleSis网站数据API的简单整理
查看>>
虚函数的实现
查看>>
【原】Oracle 数据库实例启动过程
查看>>
上传文件和导出的测试用例设计
查看>>
程序员为何如此累
查看>>
ajax(异步页面动态刷新)
查看>>
关于JQuery的选择器
查看>>
Java和C++的区别
查看>>
git工作区和暂存区
查看>>
C函数调用与栈
查看>>