Tuesday, January 9, 2024

Hive Partitioning and Bucketing:

 

Hive Partitioning and Bucketing:

hive > create table non_partition_table (bankid string, bankrepname string, banktover string, country string, year int) row format delimited fields terminated by '|' lines terminated by '\n' stored as textfile; load data local inpath 'SBA*' into table non_partition_table; hive > select * from non_partition_table where bankid like '%TZ%'; //It will go to each of 7 files in HDFS and searh for TZ. //Its a Sequential reading process . STATIC Partitioning: create table static_partition_table (bankid string, bankrepname string, banktover string) PARTITIONED BY (country string , year int) row format delimited fields terminated by '|' lines terminated by '\n' stored as textfile; hive> load data local inpath 'SBA_NM_Data.log' into table static_partition_table ; /*failed as its a partitioned table*/ FAILED: SemanticException [Error 10001]: Line 1:52 Table not found 'parttab' hive> use batch122; hive> set hive.cli.print.current.db=true; hive > set hive.exec.dynamic.partition.mode=nonstrict ; hive> load data local inpath 'SBA_NM_Data.log' into table static_partition_table PARTITION(country='NIMEBIA',year=2011); hive > load data local inpath 'SBA_UG_Data.log' into table static_partition_table PARTITION(country='UGANDA',year=2011); hive >load data local inpath 'SBA_NG_Data.log' into table static_partition_table PARTITION(country='NIGERIA',year=2013); STATIC partition drawbacks: 1.Less data we can load , if thousands file will be there , it will be diffiult. 2. For NIGERIA One of the data is 2014 , we have by mistakely load to 2013. 3. For TANZANIA 2017 and 2012 , we have to load separately. Dynamic Partitioning: ******************** In DYNAMIC PARTITIONING we will create a non_partition_table/parent table , then one child table. From parent table , we will insert records to child table. INSERT OVERWRITE TABLE <<childtable>> PARTITION(country,year) SELECT * FROM <<parenttable>>; hive > create table non_partition_table (bankid string, bankrepname string, banktover string, country string, year int) row format delimited fields terminated by '|' lines terminated by '\n' stored as textfile; hive > load data local inpath 'SBA*' into table non_partition_table; hive > set hive.exec.dynamic.partition.mode=nonstrict ; hive > create table dynamic_partition_table (bankid string, bankrepname string, banktover string) PARTITIONED BY (country string, year int) row format delimited fields terminated by '|' lines terminated by '\n' stored as textfile; hive > INSERT OVERWRITE TABLE dynamic_partition_table PARTITION (country,year) select * from non_partition_table;

No comments:

Post a Comment