Setting advanced connection properties
The settings in the Advanced tab are described below.
**File Format**

Expand this section to specify or view the file format settings.
Select one of the following target storage formats: Text (the default), Avro, ORC, Parquet, or Sequence.

Note: For both regular tables and Replicate Control Tables, creating and storing the tables in Text format (the default) allows data to be appended to them. This in turn reduces the number of files created on Hadoop, improves query performance, and reduces the number of Hive jobs running.

Note: If Avro, ORC, or Parquet is selected, or if the target tables have skews/buckets, Replicate first converts the source data to a temporary sequence file and then runs a Hive process to convert the sequence file to the desired target format. Because the additional Hive processes increase latency, it is recommended not to use these formats unless absolutely necessary.

Note: When using the default text SerDe (see below), new lines within data values are not supported (due to an Apache Hadoop limitation). Although other SerDes may support new lines, best practice is to use Sequence as the target storage format.

Note: Unlike the other binary formats, which must be converted to the desired target format (see above), when Sequence format is selected the data is loaded directly to the target and stored in an external table (in sequence format). Note that Snappy compression is not available for Sequence format. See also: Prerequisites for using the Cloudera Distribution as a Hadoop target.
**Control Tables storage format**
**Use Default SerDe**

Choose the SerDe interface to use when accessing the Hive database tables. The default is LazySimpleSerDe.
**Other SerDe**

LazySimpleSerDe creates the target files in delimited text file format. To create the target files in a different format, select Other SerDe and then specify the name of the SerDe that you want to use.
**Field delimiter**

The delimiter that will be used to separate fields in the target file.

Note: When using Other SerDe, the default name for the field delimiter property is `field.delim`.
**Null value**

The value that will be used to indicate a null value in the target file.

Example (where @ is the null value):

mike,male,295678
sara,female,@

Note: When using Other SerDe, the default name for the null value property is `serialization.null.format`.
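As an illustration of the Null value setting, the following sketch (plain Python, not Replicate code) writes delimited rows in which the @ marker from the example above stands in for null values:

```python
# Hypothetical rows from a source table; None represents a null value.
rows = [("mike", "male", "295678"), ("sara", "female", None)]

NULL_MARKER = "@"  # corresponds to the "Null value" setting

lines = [
    ",".join(NULL_MARKER if field is None else field for field in row)
    for row in rows
]
print("\n".join(lines))
# mike,male,295678
# sara,female,@
```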
**Escape character**

When using LazySimpleSerDe: The escape character is used to escape the field delimiter character. When a field delimiter is escaped, it is interpreted as actual data rather than as a field delimiter.

Example (where \ is the escape character and a comma is the field delimiter):
sunroof\,power-steering

When using Other SerDe: The escape character is used to escape the quote character.

Example (where \ is the escape character and double quotes is the quote character):
"\"sunroof, power-steering\""

Note: When using Other SerDe, the default name for the escape character property is `escape.delim`.
**Record delimiter**

The delimiter that will be used to separate records (rows) in the target file.

Note: When using Other SerDe, the default name for the record delimiter property is `line.delim`.
**Quote character**

The quote character is used to quote values that contain the field delimiter character. When a field delimiter appears within quoted data, it is interpreted as actual data rather than as a field delimiter. Note that the quote character is not available when using the default SerDe (LazySimpleSerDe).

Example (where double quotes is the quote character):
"mike,male"

Note: When using Other SerDe, the default name for the quote character property is `quote.delim`.
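To make the escape and quote behavior concrete, here is a small sketch using Python's csv module (an analogy only; Replicate's SerDes are not implemented this way). With no quoting and an escape character, the field delimiter inside a value is escaped; with a quote character, the whole value is wrapped in quotes instead:

```python
import csv
import io

def render(row, **fmt):
    # Serialize one row with the given csv formatting options.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="", **fmt).writerow(row)
    return buf.getvalue()

# Escape-character style: the delimiter inside the data is escaped.
escaped = render(["sunroof,power-steering"], quoting=csv.QUOTE_NONE, escapechar="\\")
print(escaped)  # sunroof\,power-steering

# Quote-character style: the value is wrapped in the quote character.
quoted = render(["mike,male"], quoting=csv.QUOTE_MINIMAL, quotechar='"')
print(quoted)  # "mike,male"
```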
**SerDe properties**

Enter the SerDe properties if Other SerDe is selected and the SerDe properties differ from the Hadoop defaults. The properties should be written in the following format:

"KEY1=VALUE1,KEY2=VALUE2,KEY3=VALUE3"

The list of properties should begin and end with a quotation mark.

Note: When " is specified as a value, it needs to be enclosed with quotation marks and escaped with a quotation mark, as follows: """"
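The KEY=VALUE list format can be parsed along these lines. This is an illustrative sketch only: the property names shown are ordinary Hive SerDe property names used as examples, not values taken from this page, and the parser does not handle commas or = characters embedded in values:

```python
def parse_serde_properties(spec: str) -> dict:
    """Parse a "KEY1=VALUE1,KEY2=VALUE2" string into a dict.

    Naive sketch: assumes no commas or '=' characters inside values.
    """
    spec = spec.strip('"')  # the list begins and ends with a quotation mark
    props = {}
    for pair in spec.split(","):
        key, _, value = pair.partition("=")
        props[key.strip()] = value
    return props

# Example property names (standard Hive SerDe properties, shown for illustration).
props = parse_serde_properties('"line.delim=\\n,serialization.null.format=@"')
print(props)  # {'line.delim': '\\n', 'serialization.null.format': '@'}
```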
**Add metadata header**

You can optionally add a header row to the data files. The header row can contain the source column names and/or the intermediate (i.e. Replicate) data types.

Example of a target file with a header row when both With column names and With data types are selected:

Position:DECIMAL(38,0),Color:VARCHAR(10)
1,"BLUE"
2,"BROWN"
3,"RED"
...

Note: This option is only available when "No Access" is selected as the Hive access method (in the General tab).
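The header row in the example above can be reproduced with a short sketch (plain Python; the column names and types are taken from the example):

```python
# Column names with their intermediate (Replicate) data types,
# matching the example above.
columns = [("Position", "DECIMAL(38,0)"), ("Color", "VARCHAR(10)")]

# With both "With column names" and "With data types" selected,
# the header row pairs each name with its type.
header = ",".join(f"{name}:{dtype}" for name, dtype in columns)
print(header)  # Position:DECIMAL(38,0),Color:VARCHAR(10)
```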
**File Attributes**

Expand this section to specify or view the file attributes.

**Use Hadoop defaults**

Select to work with the default block size of your Hadoop target.

**Use this block size (MB)**

Select to work with a different block size. The default value is 64.

**Maximum file size**

Specify the maximum file size of each target file. When the data reaches the maximum size, the file will be closed and written to the specified target folder.
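The maximum-file-size behavior amounts to a simple rollover check, sketched here (the threshold and function names are illustrative, not Replicate internals):

```python
MAX_FILE_SIZE = 100 * 1024 * 1024  # illustrative "Maximum file size" in bytes

def should_roll(current_file_size: int, next_record: bytes) -> bool:
    # Close the current file and start a new one once appending the
    # next record would exceed the configured maximum.
    return current_file_size + len(next_record) > MAX_FILE_SIZE

print(should_roll(MAX_FILE_SIZE - 1, b"xx"))  # True
print(should_roll(0, b"xx"))                  # False
```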
**Compression**

Select the compression method to use on HDFS.

Note: Cloudera ODBC drivers 2.5.20 or later do not support the Snappy compression method.

Note: To use Snappy compression when the target storage format is set to Avro, Parquet, or Text, additional values must be added to the Hive configuration. In some Hadoop distributions, compression will only work if the value is specified without quotation marks. See also: Prerequisites for using the Cloudera Distribution as a Hadoop target.
**Change Processing**

Expand this section to specify or view change processing settings.
**Consider state idle when no changes have been processed for**

Specify how long to wait before considering the state to be idle. In idle state, you can create files from data that has already been processed if the specified size and time conditions are met (see below).

**File size reaches**

Specify the minimum size of the data required to create a file in idle state.

**Elapsed time reaches**

Specify the maximum time to wait before applying the changes in idle state.
To facilitate rapid delivery of DDL messages, files are uploaded immediately, regardless of the specified File size reaches or Elapsed time reaches values.
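Taken together, the three settings above describe a flush decision along these lines (a sketch with illustrative thresholds and names, not Replicate's actual logic):

```python
IDLE_AFTER_SECONDS = 600   # "Consider state idle when no changes have been processed for"
MIN_FILE_SIZE = 10 * 1024  # "File size reaches" (bytes)
MAX_ELAPSED_SECONDS = 300  # "Elapsed time reaches"

def should_create_file(buffered_bytes, secs_since_last_change, secs_since_first_buffered):
    # Files are only created from already-processed data once the state is idle.
    if secs_since_last_change < IDLE_AFTER_SECONDS:
        return False
    # In idle state, create the file when either condition is met.
    return (buffered_bytes >= MIN_FILE_SIZE
            or secs_since_first_buffered >= MAX_ELAPSED_SECONDS)

print(should_create_file(20 * 1024, 700, 60))  # True  (idle, size condition met)
print(should_create_file(1024, 700, 60))       # False (idle, neither condition met)
```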
Preventing ODBC connection timeouts
The default query timeout value is 600 seconds, which should be sufficient for most situations. However, when loading very large tables, you may need to increase the value to prevent timeouts. This can be done using the following internal parameter:
executeTimeout
See below for instructions on setting internal parameters.
Internal parameters
Internal parameters are parameters that are not exposed in the UI. You should only use them if instructed by Qlik Support.
To add internal Qlik Replicate parameters:
1. Click the Internal Parameters link.

   The Internal Parameters dialog box opens.

2. In the edit box, type the name of the parameter you need to add and then click it.

   The parameter is added to the table below the search box with its default value.

3. Change the default value as required.

4. To reset the parameter value to its default, click the "Restore default value" icon at the end of the row.
More options
These options are not exposed in the UI as they are only relevant to specific versions or environments. Consequently, do not set these options unless explicitly instructed to do so by Qlik Support or product documentation.
To set an option, simply copy the option into the Add feature name field and click Add. Then set the value or enable the option according to the instructions you received.
Settings summary
You can view a summary of your settings by clicking the Setting Summary link. This is useful if you need to send a summary of your settings to Qlik Support.