tHCatalogOperation Standard properties
These properties are used to configure tHCatalogOperation running in the Standard Job framework.
The Standard tHCatalogOperation component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data Fabric.
Basic settings
Property type |
Either Built-in or Repository Built-in: No property data stored centrally. Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved. |
Distribution |
Select the cluster you are using from the drop-down list. The options in the
list vary depending on the component you are using. Among these options, the following
ones require specific configuration:
|
HCatalog version |
Select the version of the Hadoop distribution you are using. The available options vary depending on the component you are using. |
Templeton hostname |
Fill this field with the URL of the Templeton web service. Note:
Templeton is a webservice API for HCatalog. It has been renamed to WebHCat by the Apache community. This service facilitates the access to HCatalog and the related Hadoop elements such as Pig. For further information about Templeton (WebHCat), see https://cwiki.apache.org/confluence/display/Hive/WebHCat+UsingWebHCat. |
Templeton port |
Fill this field with the port of the Templeton web service. By default, the value for this field is 50111. |
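The Templeton hostname and port combine into the base URL of the WebHCat REST service. The following is a minimal sketch, assuming the `/templeton/v1` path prefix used in the Apache WebHCat documentation; the host name `webhcat.example.com` and the helper names are hypothetical, and this page does not state how the component builds its requests internally.

```python
from urllib.parse import urlunsplit


def webhcat_base_url(hostname: str, port: int = 50111, scheme: str = "http") -> str:
    """Build the WebHCat (Templeton) v1 base URL from a host name and a port."""
    # 50111 is the default port mentioned for the Templeton port field.
    return urlunsplit((scheme, f"{hostname}:{port}", "/templeton/v1", "", ""))


def webhcat_status_url(hostname: str, port: int = 50111) -> str:
    """URL of the WebHCat status resource, handy for a quick connectivity check."""
    return webhcat_base_url(hostname, port) + "/status"


# Hypothetical host name, for illustration only.
print(webhcat_status_url("webhcat.example.com"))
```

Requesting the status URL from a browser or an HTTP client is a simple way to confirm that the host name and port entered in the component are reachable before running the Job.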
Use kerberos authentication |
If you are accessing a Hadoop cluster secured with Kerberos, select this check box, then enter the Kerberos
principal name for the NameNode in the field displayed. This enables you to use
your user name to authenticate against the credentials stored in Kerberos.
This check box is available depending on the Hadoop distribution you are connecting to. |
Use a keytab to authenticate |
Select the Use a keytab to authenticate check box to log in to a Kerberos-enabled system using a given keytab file. A keytab file contains pairs of Kerberos principals and encrypted keys. Enter the principal to be used in the Principal field and the access path to the keytab file in the Keytab field. This keytab file must be stored on the machine on which your Job actually runs, for example, on a Talend Jobserver. Note that the user who executes a keytab-enabled Job is not necessarily the one the principal designates, but must have the right to read the keytab file being used. For example, if the user name you are using to execute a Job is user1 and the principal to be used is guest, ensure that user1 has the right to read the keytab file to be used. |
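The read-permission requirement on the keytab file can be checked on the machine that runs the Job before the Job is launched. The following is a minimal sketch, assuming it is executed as the same operating system user that will run the Job; the function name and the temporary file standing in for a keytab are hypothetical.

```python
import os
import tempfile


def keytab_is_readable(keytab_path: str) -> bool:
    """Return True if the path is a regular file the current user can read."""
    return os.path.isfile(keytab_path) and os.access(keytab_path, os.R_OK)


# A temporary file stands in for a real keytab in this illustration.
with tempfile.NamedTemporaryFile(suffix=".keytab") as handle:
    print(keytab_is_readable(handle.name))
```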
Operation on |
Select an object from the list for the DB operation as follows: Database: The HCatalog managed database in HDFS. Table: The HCatalog managed table in HDFS. Partition: The partition specified by the user. |
Operation |
Select an action from the list for the DB operation. For further information about the DB operation in HDFS, see https://cwiki.apache.org/Hive/. |
Create the table only it doesn't exist already |
Select this check box to avoid creating a duplicate table when you
create a table.
Note:
This check box is enabled only when you have selected Table from the Operation on list. |
Database |
Fill this field with the name of the database in which the HCatalog managed tables are placed. |
Table |
Fill this field to operate on one or multiple tables in a database
or on a specified HDFS location.
Note:
This field is enabled only when you have selected Table from the Operation on list. For further information about the operation on Table, see https://cwiki.apache.org/Hive/. |
Partition |
Fill this field to specify one or more partitions for the partition operation on a specified table. When you specify multiple partitions, separate them with commas and enclose the partition string in double quotation marks. If you are reading a non-partitioned table, leave this field empty. Note:
This field is enabled only when you select Partition from the Operation on list. For further information about the operation on Partition, see https://cwiki.apache.org/Hive/. |
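As an illustration of the comma-separated, double-quoted partition string, the sketch below builds one from column/value pairs. It assumes the `col_name='value'` naming that WebHCat uses for partitions; the column names, values, and helper name are hypothetical, and the exact syntax the component accepts should be checked against your table definition.

```python
def partition_string(spec: dict) -> str:
    """Join column/value pairs as col='value' entries separated by commas."""
    return ",".join(f"{column}='{value}'" for column, value in spec.items())


# The value typed in the Partition field is wrapped in double quotation marks.
field_value = '"' + partition_string({"country": "US", "state": "CA"}) + '"'
print(field_value)
```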
Username |
Fill this field with the username for the DB authentication. |
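The Database, Table, Partition, and Username values correspond closely to the DDL resources exposed by WebHCat. The sketch below shows how such a resource URL could be assembled, assuming the resource layout from the Apache WebHCat documentation (`ddl/database/:db/table/:table/partition/:partition` with a `user.name` query parameter). The host, database, table, and user names are hypothetical, and this page does not confirm that the component sends exactly these requests.

```python
from urllib.parse import quote, urlencode


def ddl_resource_url(base_url, username, database, table=None, partition=None):
    """Build a WebHCat DDL resource URL for a database, table, or partition."""
    if partition is not None and table is None:
        raise ValueError("a partition can only be addressed within a table")
    path = "/ddl/database/" + quote(database, safe="")
    if table is not None:
        path += "/table/" + quote(table, safe="")
    if partition is not None:
        # Quotes and equals signs in the partition name must be percent-encoded.
        path += "/partition/" + quote(partition, safe="")
    return base_url.rstrip("/") + path + "?" + urlencode({"user.name": username})


base = "http://webhcat.example.com:50111/templeton/v1"  # hypothetical host
print(ddl_resource_url(base, "user1", "talend"))
print(ddl_resource_url(base, "user1", "talend", "sales"))
print(ddl_resource_url(base, "user1", "talend", "sales", "country='US'"))
```

The three calls mirror the three choices of the Operation on list: Database, Table, and Partition.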
Database location |
Fill this field with the location of the database file in HDFS.
Note:
This field is enabled only when you select Database from the Operation on list. |
Database description |
The description for the database to be created.
Note:
This field is enabled only when you select Database from the Operation on list. |
Create an external table |
Select this check box to create an external table in an alternative
path defined in the Set HDFS
location field in the Advanced
settings view. For further information about creating
an external table, see https://cwiki.apache.org/Hive/.
Note:
This check box is enabled only when you select Table from the Operation on list and Create/Drop and create/Drop if exist and create from the Operation list. |
Format |
Select a file format from the list to specify the format of the external table you want to create: TEXTFILE: Plain text files.
RCFILE: Record Columnar files.
For further information about RCFILE, see https://cwiki.apache.org/confluence/display/Hive/RCFile.
Note:
RCFILE is only available starting with Hive 0.6.0. This list is enabled only when you select Table from the Operation on list and Create/Drop and create/Drop if exist and create from the Operation list. |
Set partitions |
Select this check box to set the partition schema by clicking Edit schema to the right of the Set partitions check box. The partition schema is either built-in or stored remotely in the Repository. Note:
This check box is enabled only when you select Table from the Operation on list and Create/Drop and create/Drop if exist and create from the Operation list. You must follow the rules for using partition schemas in HCatalog managed tables. For more information about these rules, see https://cwiki.apache.org/confluence/display/Hive/HCatalog. |
|
Built-in: The schema will be created and stored locally for this component only. Related topic: see Talend Studio User Guide. |
|
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: see Talend Studio User Guide. |
Set the user group to use |
Select this check box to specify the user group.
Note:
This check box is enabled only when you select Drop/Drop if exist/Drop and create/Drop if exist and create from the Operation list. By default, the value for this field is root. For more information about the user group in the server, contact your system administrator. |
Option |
Select a clause when you drop a database.
Note:
This list is enabled only when you select Database from the Operation on list and Drop/Drop if exist/Drop and create/Drop if exist and create from the Operation list. For more information about Drop operation on database, see https://cwiki.apache.org/Hive/. |
Set the permissions to use |
Select this check box to specify the permissions needed by the
operation you select from the Operation list.
Note:
This check box is enabled only when you select Drop/Drop if exist/Drop and create/Drop if exist and create from the Operation list. By default, the value for this field is rwxrw-r-x. For more information on user permissions, contact your system administrator. |
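The default value rwxrw-r-x uses the symbolic POSIX notation, with one three-character group each for the owner, the group, and others. When discussing permissions with an administrator, the octal form is often more convenient. The following sketch converts between the two; the function name is hypothetical and the conversion is standard POSIX arithmetic, not something specific to this component.

```python
def symbolic_to_octal(permissions: str) -> str:
    """Convert a nine-character permission string such as 'rwxrw-r-x' to octal."""
    if len(permissions) != 9:
        raise ValueError("expected nine characters, for example 'rwxrw-r-x'")
    digits = []
    for start in range(0, 9, 3):
        value = 0
        # Each group is read/write/execute, weighted 4/2/1.
        for flag, weight, expected in zip(permissions[start:start + 3], (4, 2, 1), "rwx"):
            if flag == expected:
                value += weight
            elif flag != "-":
                raise ValueError(f"unexpected character {flag!r}")
        digits.append(str(value))
    return "".join(digits)


print(symbolic_to_octal("rwxrw-r-x"))  # prints 765
```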
Set File location |
Enter the directory in which partitioned data is stored.
Note:
This check box is enabled only when you select Partition from the Operation on list and Create/Drop and create/Drop if exist and create from the Operation list. For further information about storing partitioned data in HDFS, see https://cwiki.apache.org/Hive/. |
Die on error |
This check box is cleared by default, which means that rows in error are skipped and the process is completed for error-free rows. |
Advanced settings
Comment |
Fill this field with the comment for the table you want to create.
Note:
This field is enabled only when you select Table from the Operation on list and Create/Drop and create/Drop if exist and create from the Operation list in the Basic settings view. |
Set HDFS location |
Select this check box to specify the HDFS location in which the
table you want to create is saved. Clear it to save the table in the
warehouse directory defined by the key
hive.metastore.warehouse.dir
in the Hive configuration file hive-site.xml.
Note:
This check box is enabled only when you select Table from the Operation on list and Create/Drop and create/Drop if exist and create from the Operation list in the Basic settings view. For further information about saving data in HDFS, see https://cwiki.apache.org/Hive/. |
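To find out where tables are saved when no HDFS location is set, you can look up hive.metastore.warehouse.dir in hive-site.xml. The sketch below parses a sample configuration embedded as a string; the sample content and the function name are assumptions for illustration, and the value in your cluster's file may differ.

```python
import xml.etree.ElementTree as ET

# Sample content only; read your cluster's actual hive-site.xml instead.
SAMPLE_HIVE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
"""


def warehouse_dir(hive_site_xml: str):
    """Return the value of hive.metastore.warehouse.dir, or None if it is absent."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "hive.metastore.warehouse.dir":
            return prop.findtext("value")
    return None


print(warehouse_dir(SAMPLE_HIVE_SITE))
```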
Set row format(terminated by) |
Select this check box to define the row format delimiters to use when you create a table:
Field: Select this check box to use Field as the row format. The default value for this field is "\u0001". You can also specify a custom character in this field.
Collection Item: Select this check box to use Collection Item as the row format. The default value for this field is "\u0002". You can also specify a custom character in this field.
Map Key: Select this check box to use Map Key as the row format. The default value for this field is "\u0003". You can also specify a custom character in this field.
Line: Select this check box to use Line as the row format. The default value for this field is "\n". You can also specify a custom character in this field.
Note:
This check box is enabled only when you select Table from the Operation on list and Create/Drop and create/Drop if exist and create from the Operation list in the Basic settings view. For further information about row formats in the HCatalog managed table, see https://cwiki.apache.org/Hive/. |
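To make the role of the four delimiters concrete, the sketch below splits text data that uses the default values listed above. It assumes a hypothetical three-column table (a string, an array of strings, and a map of strings); the sample rows and the function name are made up for illustration.

```python
FIELD = "\u0001"            # separates columns
COLLECTION_ITEM = "\u0002"  # separates array elements and map entries
MAP_KEY = "\u0003"          # separates a map key from its value
LINE = "\n"                 # separates rows


def parse_row(line: str):
    """Split one row of a hypothetical (name, tags array, attrs map) table."""
    name, tags, attrs = line.split(FIELD)
    tag_list = tags.split(COLLECTION_ITEM)
    attr_map = dict(entry.split(MAP_KEY, 1) for entry in attrs.split(COLLECTION_ITEM))
    return name, tag_list, attr_map


# Two sample rows encoded with the default delimiters.
data = (
    "alice" + FIELD + "a" + COLLECTION_ITEM + "b" + FIELD
    + "k1" + MAP_KEY + "v1" + COLLECTION_ITEM + "k2" + MAP_KEY + "v2" + LINE
    + "bob" + FIELD + "c" + FIELD + "k3" + MAP_KEY + "v3" + LINE
)
rows = [parse_row(line) for line in data.split(LINE) if line]
for row in rows:
    print(row)
```

Choosing custom delimiters only changes the four constants at the top; the nesting order (line, field, collection item, map key) stays the same.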
Properties |
Click [+] to add one or more
lines to define table properties. The table properties allow you to
tag the table definition with your own metadata key/value pairs.
Make sure that the values in both the Key
row and the Value row are enclosed in
double quotation marks.
Note:
This table is enabled only when you select Database/Table from the Operation on list and Create/Drop and create/Drop if exist and create from the Operation list in the Basic settings view. For further information about table properties, see https://cwiki.apache.org/Hive/. |
Retrieve the HCatalog logs |
Select this check box to retrieve log files generated during HCatalog operations. |
Standard Output Folder |
Browse to, or enter, the directory where the log files are stored. Note:
This field is enabled only when you have selected the Retrieve the HCatalog logs check box. |
Error Output Folder |
Browse to, or enter, the directory where the error log files are stored.
Note:
This field is enabled only when you have selected the Retrieve the HCatalog logs check box. |
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the Job level as well as at each component level. |
Global Variables
Global Variables |
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box. A Flow variable functions during the execution of a component while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see Talend Studio User Guide. |
Usage
Usage rule |
This component is commonly used in a single-component subJob. HCatalog is built on top of the Hive metastore to provide a read and write interface for Pig and MapReduce, so that these systems can use the Hive metadata to easily read and write data in HDFS. For further information, see the Apache documentation about HCatalog: https://cwiki.apache.org/confluence/display/Hive/HCatalog. |
Prerequisites |
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The following list presents MapR-related information as an example.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop distribution you are using. |
Limitation |
When Use kerberos authentication is selected, the component cannot work with IBM JVM. |