Create the Hive table in which you want to write data. In this scenario, this
table is named agg_result, and you can
create it using the following statement in tHiveRow:
create table agg_result (id int, name string, address string, sum1 string,
postal string, state string, capital string, mostpopulouscity string)
partitioned by (type string)
row format delimited fields terminated by ';'
location '/user/ychen/hive/table/agg_result'
In this statement,
'/user/ychen/hive/table/agg_result' is the HDFS directory used in
this scenario to store the created table. You need to replace it
with the directory you want to use in your environment.
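Once the statement has been executed, you can verify that the table was created as expected. The following check is optional and not part of the original scenario; it can be run in tHiveRow in the same way as the statement above:
-- Optional sanity check: list the tables, then inspect the schema,
-- the partition column (type), and the HDFS location of agg_result
SHOW TABLES
DESCRIBE FORMATTED agg_result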
For further information about tHiveRow,
see tHiveRow.
Create two input Hive tables containing the columns to be joined and
aggregated into the output Hive table, agg_result. The statements to be used are:
create table customer (id int, name string, address string, idState int, id2 int,
regTime string, registerTime string, sum1 string, sum2 string)
row format delimited fields terminated by ';'
location '/user/ychen/hive/table/customer'
and
create table state_city (id int, postal string, state string, capital int,
mostpopulouscity string)
row format delimited fields terminated by ';'
location '/user/ychen/hive/table/state_city'
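Once data has been loaded into these two tables (see the next step), a query along the following lines joins them and writes the combined columns into a partition of agg_result. This is a minimal, hypothetical sketch for illustration only: the join key (customer.idState against state_city.id), the partition value 'type = 1', and the cast of capital to string are assumptions, and the actual query used in this scenario may differ:
-- Hypothetical join: match each customer row to its state and
-- write the result into the 'type = 1' partition of agg_result
INSERT OVERWRITE TABLE agg_result PARTITION (type = '1')
SELECT c.id, c.name, c.address, c.sum1,
       s.postal, s.state, cast(s.capital as string), s.mostpopulouscity
FROM customer c
JOIN state_city s ON c.idState = s.id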
Use tHiveRow to load data into the two
input tables, customer and state_city. The statements to be used are:
"LOAD DATA LOCAL INPATH 'C:/tmp/customer.csv' OVERWRITE INTO TABLE customer"
and
"LOAD DATA LOCAL INPATH 'C:/tmp/State_City.csv' OVERWRITE INTO TABLE state_city"
The two files, customer.csv and
State_City.csv, are two local files
we created for this scenario. You need to create your own files to provide
data to the input Hive tables. The data schema of each file must be
identical to that of its corresponding table.
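For example, since the customer table declares fields terminated by ';', each line of customer.csv must list the nine columns (id, name, address, idState, id2, regTime, registerTime, sum1, sum2) in that order, separated by semicolons. The two rows below are purely hypothetical sample values, shown only to illustrate the expected layout:
1;Ashley;3480 Cedar Street;5;100;2011-05-01;2011-05-02;300;500
2;Karl;784 Griffin Street;8;200;2011-06-11;2011-06-12;400;600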
You can use tRowGenerator and tFileOutputDelimited to create these two files
easily. For further information about these two components, see tRowGenerator and tFileOutputDelimited.