
Preparing the Hive tables

Procedure

  1. Create the Hive table to which you want to write data. In this scenario, this table is named agg_result, and you can create it by running the following statement in tHiveRow:
    create table agg_result (id int, name string, address string, sum1 string, postal string, state string, capital string, mostpopulouscity string) partitioned by (type string) row format delimited fields terminated by ';' location '/user/ychen/hive/table/agg_result'
    In this statement, '/user/ychen/hive/table/agg_result' is the HDFS directory used in this scenario to store the table. Replace it with the directory you want to use in your environment.
    For further information about tHiveRow, see tHiveRow.
  2. Create the two input Hive tables, customer and state_city, which contain the columns you want to join and aggregate into the output Hive table, agg_result. The statements to use are:
    create table customer (id int, name string, address string, idState int, id2 int, regTime string, registerTime string, sum1 string, sum2 string) row format delimited fields terminated by ';' location '/user/ychen/hive/table/customer'
    and
    create table state_city (id int, postal string, state string, capital int, mostpopulouscity string) row format delimited fields terminated by ';' location '/user/ychen/hive/table/state_city'
  3. Use tHiveRow to load data into the two input tables, customer and state_city. The statements to use are:
    "LOAD DATA LOCAL INPATH 'C:/tmp/customer.csv' OVERWRITE INTO TABLE customer"
    and
    "LOAD DATA LOCAL INPATH 'C:/tmp/State_City.csv' OVERWRITE INTO TABLE state_city"
    The two files, customer.csv and State_City.csv, are local files created for this scenario. You need to create your own files to provide data to the input Hive tables. Each file must match the schema of its corresponding table: the fields appear in the same order as the table columns and are separated by semicolons, as declared in the create table statements.
    You can use tRowGenerator and tFileOutputDelimited to create these two files easily. For further information about these two components, see tRowGenerator and tFileOutputDelimited.

    For further information about the statements used in this procedure, see the Hive query language manual.
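
If you prefer not to build a Job with tRowGenerator and tFileOutputDelimited, the two sample files loaded in step 3 could also be produced with a short script. The following Python sketch is only an illustration under stated assumptions: the function names, the output directory sample_data and every generated value are invented for this example, and idState is assumed to reference state_city.id. Only the column order and the ';' separator come from the create table statements above.

```python
import csv
import random
from pathlib import Path

# Column order copied from the create table statements in step 2.
CUSTOMER_COLUMNS = ["id", "name", "address", "idState", "id2",
                    "regTime", "registerTime", "sum1", "sum2"]
STATE_CITY_COLUMNS = ["id", "postal", "state", "capital", "mostpopulouscity"]


def make_states(count):
    """Build placeholder state_city rows; all values are invented."""
    return [[i, f"P{i}", f"state{i}", i, f"city{i}"]
            for i in range(1, count + 1)]


def make_customers(count, state_count, seed=0):
    """Build placeholder customer rows.

    Assumption: idState references an id in state_city.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(1, count + 1):
        rows.append([
            i, f"customer{i}", f"{i} Main Street",
            rng.randint(1, state_count),           # idState
            i,                                     # id2
            "2014-01-01", "2014-01-01 00:00:00",   # regTime, registerTime
            str(rng.randint(0, 1000)),             # sum1
            str(rng.randint(0, 1000)),             # sum2
        ])
    return rows


def write_rows(path, rows):
    """Write rows as ';'-separated lines without a header line.

    QUOTE_NONE makes the writer raise an error instead of quoting a value
    that contains ';', because a table declared with "row format delimited"
    does not interpret CSV-style quotes.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=";", quoting=csv.QUOTE_NONE,
                            lineterminator="\n")
        writer.writerows(rows)


if __name__ == "__main__":
    out_dir = Path("sample_data")  # hypothetical output directory
    out_dir.mkdir(exist_ok=True)
    write_rows(out_dir / "State_City.csv", make_states(50))
    write_rows(out_dir / "customer.csv", make_customers(1000, state_count=50))
```

The files are written without a header line because LOAD DATA copies every line into the table as a data row. Point the LOCAL INPATH of each LOAD DATA statement at the location where you saved the generated files.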
