Preparing the Hive tables

Procedure

Create the Hive table that the data will be written to. In this scenario, the table is named agg_result, and you can create it by running the following statement in tHiveRow:
create table agg_result (id int, name string, address string, sum1 string, postal string, state string, capital string, mostpopulouscity string) partitioned by (type string) row format delimited fields terminated by ';' location '/user/ychen/hive/table/agg_result'
The statement in this scenario uses the directory '/user/ychen/hive/table/agg_result' to store the created table in HDFS. Replace it with a directory from your own environment before you run the statement.
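Because the HDFS directory has to be changed for each environment, it can help to generate the statement from a single base directory instead of editing the string by hand. The following is a minimal sketch, not part of the scenario itself: the helper name build_create_table and the example directory /user/your_name/hive/table are assumptions made for illustration, and the resulting string still has to be pasted into tHiveRow.

```python
import posixpath

# Column definitions copied from the scenario's agg_result statement.
AGG_RESULT_COLUMNS = [
    ("id", "int"), ("name", "string"), ("address", "string"),
    ("sum1", "string"), ("postal", "string"), ("state", "string"),
    ("capital", "string"), ("mostpopulouscity", "string"),
]


def build_create_table(table, columns, base_dir, partition=None, delimiter=";"):
    """Hypothetical helper: return a CREATE TABLE statement stored at base_dir/table."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    parts = [f"create table {table} ({cols})"]
    if partition:
        part_cols = ", ".join(f"{name} {hive_type}" for name, hive_type in partition)
        parts.append(f"partitioned by ({part_cols})")
    parts.append(f"row format delimited fields terminated by '{delimiter}'")
    # HDFS paths always use forward slashes, so posixpath is used on every OS.
    parts.append(f"location '{posixpath.join(base_dir, table)}'")
    return " ".join(parts)


if __name__ == "__main__":
    # "/user/your_name/hive/table" is a placeholder; use your own HDFS directory.
    print(build_create_table(
        "agg_result",
        AGG_RESULT_COLUMNS,
        "/user/your_name/hive/table",
        partition=[("type", "string")],
    ))
```

Called with the base directory /user/ychen/hive/table, the helper reproduces the statement used in this scenario, so only the base directory argument needs to change between environments.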

For details about tHiveRow, see tHiveRow.
Create the two input Hive tables. They contain the columns to be joined and aggregated into the output Hive table, agg_result. The statements to use are:
create table customer (id int, name string, address string, idState int, id2 int, regTime string, registerTime string, sum1 string, sum2 string) row format delimited fields terminated by ';' location '/user/ychen/hive/table/customer'
create table state_city (id int, postal string, state string, capital int, mostpopulouscity string) row format delimited fields terminated by ';' location '/user/ychen/hive/table/state_city'
Use tHiveRow to load data into the two input tables, customer and state_city. The statements to use are:
"LOAD DATA LOCAL INPATH 'C:/tmp/customer.csv' OVERWRITE INTO TABLE customer"
"LOAD DATA LOCAL INPATH 'C:/tmp/State_City.csv' OVERWRITE INTO TABLE state_city"
The two files, customer.csv and State_City.csv, are local files created for this scenario. You need to create your own files to provide data for the input Hive tables. The data schema of each file must be identical to that of its corresponding table.

You can easily create these two files with tRowGenerator and tFileOutputDelimited. For details about these two components, see tRowGenerator and tFileOutputDelimited.
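If you want to produce test files outside of the Studio, the following Python sketch shows one possible way to write semicolon-delimited files whose columns follow the customer and state_city definitions above. It is only an illustration under stated assumptions: the function names, the placeholder values, and the row counts are all hypothetical, and the generated data is random rather than meaningful. No header row is written, because a header line in the file would be read by Hive as an ordinary data row.

```python
import csv
import random
from pathlib import Path

# Column names and Hive types taken from the scenario's CREATE TABLE statements.
CUSTOMER = [
    ("id", "int"), ("name", "string"), ("address", "string"),
    ("idState", "int"), ("id2", "int"), ("regTime", "string"),
    ("registerTime", "string"), ("sum1", "string"), ("sum2", "string"),
]
STATE_CITY = [
    ("id", "int"), ("postal", "string"), ("state", "string"),
    ("capital", "int"), ("mostpopulouscity", "string"),
]


def fake_value(name, hive_type, row_index, rng):
    """Return a placeholder value: the row index for id, random data otherwise."""
    if name == "id":
        return row_index
    if hive_type == "int":
        return rng.randint(1, 50)
    return f"{name}_{rng.randint(1000, 9999)}"


def write_sample(path, schema, rows, seed=0):
    """Write `rows` lines with one ';'-separated field per schema column."""
    rng = random.Random(seed)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=";", lineterminator="\n")
        for row_index in range(1, rows + 1):
            writer.writerow(
                [fake_value(name, hive_type, row_index, rng) for name, hive_type in schema]
            )
    return Path(path)


if __name__ == "__main__":
    # File names match the LOAD DATA statements; the row counts are arbitrary.
    write_sample("customer.csv", CUSTOMER, rows=100)
    write_sample("State_City.csv", STATE_CITY, rows=50)
```

The field delimiter is ';' to match the "fields terminated by ';'" clause of the table definitions. Move the generated files to the paths referenced in your LOAD DATA statements before running them.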

For details about the Hive query language, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual.
