Remay
Big Data
HDP 2.2 – How to create a surrogate key on hive
2 fevereiro, 2016
0

We can create a surrogate key in hive using the same BI workload with window aggregate functions like the example below:

--Sets for update the engine and vectorized processing
set hive.execution.engine=tez;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;

–Target table to create the surrogate key on field 1
drop table tbl1;
create table tbl1
(
f1 int,
f2 string
)
clustered by (f2) into 1 buckets
stored as orc tblproperties (“transactional”=”true”);

insert into table tbl1 values (1, ‘line1’);
insert into table tbl1 values (2, ‘line2’);

–Source table with the new data to load into tbl1
drop table tbl2;
create table tbl2
(
f1 int,
f2 string
)
clustered by (f2) into 1 buckets
stored as orc tblproperties (“transactional”=”true”);

insert into table tbl2 values (300, ‘line3’);
insert into table tbl2 values (400, ‘line4’);
insert into table tbl2 values (500, ‘line5’);

–Insert with the window aggregate function to create the surrogate key
INSERT INTO table tbl1
SELECT
ROW_NUMBER() OVER(ORDER BY tbl2.f1) + t2.f1_max
,tbl2.f2
from
tbl2
CROSS JOIN
(SELECT COALESCE(MAX(f1),0) f1_max FROM tbl1) T2;

 

How to work with ACID tables in hive

<div class="addthis_toolbox addth...

Read more

HDP 2.2 Sandbox – How To Fix the error JA002: Unauthorized connection for super-user: oozie from IP 127.0.0.1

<div class="addthis_toolbox addth...

Read more

Existem 0 comentários

twitter

REDES SOCIAIS