
Setting advanced connection properties

The table below describes the settings in the Advanced tab.

Advanced tab settings
Setting Description

File Format

Expand this section to specify or view the file format settings.

Target storage format

Select one of the following target storage formats: Text (the default), Avro, ORC, Parquet, Sequence.

Note:

For both regular tables and Replicate Control Tables, creating and storing the tables in text format (the default) allows data to be appended to them. This in turn reduces the number of files created on Hadoop, improves query performance, and reduces the number of Hive jobs running.

Note:

If Avro, ORC, or Parquet is selected, or if the target tables have skews/buckets, Replicate first converts the source data to a temporary sequence file and then runs a Hive process to convert the sequence file to the desired target format. As this increases latency due to the additional Hive processes, it is recommended not to use these formats unless absolutely necessary.

Note:

When using the default text Serde (see below), new lines as part of the data values are not supported (due to an Apache Hadoop limitation). Although other Serdes may support new lines, best practice is to use Sequence as the target storage format.

Note:

Unlike other binary formats that need to be converted to the desired target format (see above), when Sequence format is selected, the data is loaded directly to the target and stored in an external table (in sequence format).

Note that Snappy compression is not available for sequence format.

See also: Prerequisites for using the Cloudera Distribution as a Hadoop target.

Control Tables storage format

  • Text: This is the default method.
  • Same as the target storage format: This method is only recommended if, for whatever reason, you cannot use Text format. This may be the case, for example, if you have an existing process that can only read/interpret the Control Table data in a different format.

Use Default SerDe

Choose the SerDe interface to use when accessing the Hive database tables. The default is LazySimpleSerde.

Other SerDe

LazySimpleSerde creates the target files in delimited text file format. To create the target files in a different format, select the Other SerDe field and then specify the name of the SerDe that you want to use.

Field delimiter

The delimiter that will be used to separate fields in the target file. The default is \001. Note that field delimiters should be ASCII characters unless they are non-printable control characters, in which case they should be in Octal format (e.g. \247).

Note:

When using other SerDe:

The default name for the field delimiter property is field.delim. If you selected Other SerDe and the specified SerDe uses a different property name (e.g. separatorChar), in addition to specifying the property value here, you also need to specify both the property name and its value in the SerDe properties field (e.g. separatorChar=\t).
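As a minimal illustration of the default delimiter (the sample line and values here are hypothetical, not produced by Replicate), a record written with the \001 field delimiter can be split as follows:

```python
# Sketch: splitting a line produced with the default \001 field delimiter.
# Python accepts the same octal escape ("\001") used in the setting.
line = "mike\001male\001295678"
fields = line.split("\001")
print(fields)  # ['mike', 'male', '295678']
```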

Null value

The value that will be used to indicate a null value in the target file.

Example (where @ is the null value):

mike,male,295678

sara,female,@

Note:

When using other SerDe:

The default name for the null value property is serialization.null.format. If you selected Other SerDe and the specified SerDe uses a different property name (e.g. nullChar), in addition to specifying the property value here, you also need to specify both the property name and its value in the SerDe properties field (e.g. nullChar=@).
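Based on the example above, a consumer of the target file might map the configured null value back to a real null when reading. This is a hedged sketch (parse_row and NULL_MARKER are illustrative names, not part of Replicate):

```python
# Sketch: interpreting the configured null value marker ("@" here) when
# reading a comma-delimited target file, per the example above.
NULL_MARKER = "@"

def parse_row(line, delim=","):
    return [None if f == NULL_MARKER else f for f in line.split(delim)]

print(parse_row("sara,female,@"))  # ['sara', 'female', None]
```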

Escape character

When using LazySimpleSerde: The escape character is used to escape the field delimiter character. When a field delimiter is escaped, it is interpreted as actual data, and not as a field delimiter.

Example (where \ is the escape character and a comma is the field delimiter):

sunroof\,power-steering

When using Other SerDe: The escape character is used to escape the quote character.

Example (where \ is the escape character and double quotes is the quote character):

"\"sunroof, power-steering\""

Note:

When using other SerDe:

The default name for the escape character property is escape.delim. If you selected Other SerDe and the specified SerDe uses a different property name (e.g. escapeChar), in addition to specifying the property value here, you also need to specify both the property name and its value in the SerDe properties field (e.g. escapeChar={).
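The LazySimpleSerde escaping rule above can be sketched as follows (escape_field is a hypothetical helper for illustration; the escape character itself is doubled first so it remains unambiguous):

```python
# Sketch: escaping field delimiters inside a value (per the LazySimpleSerde
# example above), so "sunroof,power-steering" survives as a single field.
def escape_field(value, delim=",", esc="\\"):
    # Escape the escape character first, then the field delimiter.
    return value.replace(esc, esc + esc).replace(delim, esc + delim)

print(escape_field("sunroof,power-steering"))  # sunroof\,power-steering
```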

Record delimiter

The \n delimiter is used to separate records (rows) in the target files. When using the default SerDe (LazySimpleSerde), the record delimiter cannot be changed.

Note:

When using other SerDe:

The default name for the record delimiter property is line.delim. If you selected Other SerDe and the specified SerDe uses a different property name (e.g. recordChar), in addition to specifying the property value here, you also need to specify both the property name and its value in the SerDe properties field (e.g. recordChar=\r).

Quote character

The quote character is used to enclose field values. When a field value is enclosed in quote characters, any field delimiter characters within the value are interpreted as actual data, and not as field delimiters. Note that the quote character is not available when using the default SerDe (LazySimpleSerde).

Example (where double-quotes is the quote character):

"mike,male"

Note:

When using other SerDe:

The default name for the quote character property is quote.delim. If you selected Other SerDe and the specified SerDe uses a different property name (e.g. quoteChar), in addition to specifying the property value here, you also need to specify both the property name and its value in the SerDe properties field (e.g. quoteChar=').

SerDe properties

Enter the SerDe properties if Other SerDe is selected and the SerDe properties are not the same as the Hadoop defaults (field.delim, serialization.null.format, escape.delim, line.delim, quote.delim).

The properties should be written using the following format:

"KEY1=VALUE1,KEY2=VALUE2,KEY3=VALUE3"

The list of properties should begin and end with a quotation mark.

Example:

"separatorChar=\t,escapeChar={,quoteChar='"

Note:

When " is specified as a value, it needs to be enclosed with quotation marks and escaped with a quotation mark, as follows: """"
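The quoting rules above can be sketched as a small helper (serde_properties is a hypothetical function for illustration, not part of Replicate; it applies the """" rule when a literal quotation mark is used as a value):

```python
# Sketch: assembling the SerDe properties string in the documented
# "KEY1=VALUE1,KEY2=VALUE2" format, with the whole list enclosed in
# quotation marks and a literal " value written as """".
def serde_properties(props):
    parts = []
    for key, value in props.items():
        if value == '"':
            value = '""""'  # a " value must be quoted and escaped
        parts.append(f"{key}={value}")
    return '"' + ",".join(parts) + '"'

print(serde_properties({"separatorChar": "\\t", "escapeChar": "{", "quoteChar": "'"}))
# prints: "separatorChar=\t,escapeChar={,quoteChar='"
```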

Add metadata header

When the target storage format is set to Text, you can optionally add a header row to the data files. The header row can contain the source column names and/or the intermediate (i.e. Replicate) data types.

Example of a target file with a header row when both With column names and With data types are selected:

Position:DECIMAL(38,0),Color:VARCHAR(10)

1,"BLUE"

2,"BROWN"

3,"RED"

...

Note:

This option is only available when "No Access" is selected as the Hive access method (in the General tab).
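A consumer of such files might split the header row above back into column names and data types. This is a sketch only (the regex-based split is an assumption about how to handle commas inside type parentheses, such as DECIMAL(38,0)):

```python
import re

# Sketch: parsing the metadata header row shown above into
# (column name, Replicate data type) pairs.
header = "Position:DECIMAL(38,0),Color:VARCHAR(10)"

# Split on commas that are not inside a type's parentheses.
columns = re.split(r",(?![^(]*\))", header)
pairs = [col.split(":", 1) for col in columns]
print(pairs)  # [['Position', 'DECIMAL(38,0)'], ['Color', 'VARCHAR(10)']]
```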

File Attributes

Expand this section to specify or view the file attributes.

Use Hadoop defaults

Select to work with the default block size of your Hadoop target.

Use this block size (MB)

Select to work with a different block size. The default value is 64.

Maximum file size

Specify the maximum file size of each target file. When the data reaches the maximum size, the file will be closed and written to the specified target folder.

Compress files using

Select the compression method to use on HDFS.

Note:

Cloudera ODBC drivers 2.5.20 and above do not support the Snappy compression method.

Note:

To use Snappy compression when the Target storage format is set to Avro, Parquet, or Text, you must add the following values to the hive.security.authorization.sqlstd.confwhitelist.append property in the Hive configuration file:

  • For Avro: |hive.exec.compress.output|avro.output.codec

  • For Parquet: |hive.exec.compress.output|parquet.compression

  • For Text: |hive.exec.compress.output

Note that in some Hadoop Distributions, compression will only work if you specify the value without the "hive" prefix. For example |exec.compress.output instead of |hive.exec.compress.output.

If the value(s) already exist in the hive.security.authorization.sqlstd.confwhitelist property, you do not need to add them to the hive.security.authorization.sqlstd.confwhitelist.append property.
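For reference, the property described above might be added to the Hive configuration file roughly as follows (a sketch for the Parquet case only; verify the file location and existing whitelist contents against your Hadoop distribution):

```xml
<!-- Sketch of a hive-site.xml entry allowing Snappy compression for Parquet.
     The property name and values are those listed above; some distributions
     require the values without the "hive" prefix. -->
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>|hive.exec.compress.output|parquet.compression</value>
</property>
```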

See also: Prerequisites for using the Cloudera Distribution as a Hadoop target.

Change Processing

Expand this section to specify or view change processing settings.

Consider state idle when no changes have been processed for

Specify how long to wait before considering the state to be idle. In idle state, you can create files from data that has already been processed if the specified size and time conditions are met (see below).

File size reaches

Specify the minimum size of the data required to create a file in idle state.

Elapsed time reaches

Specify the maximum time to wait before applying the changes in idle state.

Note:

To facilitate rapid delivery of DDL messages, files are uploaded immediately, regardless of the specified File size reaches or Elapsed time reaches values.
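The interplay of these settings can be sketched as follows. This is a minimal illustration, not Replicate's actual implementation; should_close_file and the default thresholds are hypothetical, and it assumes that in idle state either threshold being reached triggers the file write:

```python
# Sketch: idle-state file-close decision. Once the task is idle, a file is
# written when the accumulated data size or the elapsed time reaches its
# configured threshold ("File size reaches" / "Elapsed time reaches").
def should_close_file(is_idle, size_mb, elapsed_sec,
                      min_size_mb=10, max_elapsed_sec=60):
    # The default thresholds here are illustrative only.
    if not is_idle:
        return False
    return size_mb >= min_size_mb or elapsed_sec >= max_elapsed_sec

print(should_close_file(True, 12, 5))   # True  (size threshold reached)
print(should_close_file(True, 2, 5))    # False (neither threshold reached)
```

Note that, as described above, DDL messages bypass this logic entirely and are uploaded immediately.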

Preventing ODBC connection timeouts

The default query timeout value is 600 seconds, which should be sufficient for most situations. However, when loading very large tables, you may need to increase the value to prevent timeouts. This can be done using the following internal parameter:

executeTimeout

See below for instructions on setting internal parameters.

Internal parameters

Internal parameters are parameters that are not exposed in the UI. You should only use them if instructed by Qlik Support.

To add internal Qlik Replicate parameters:

  1. Click the Internal Parameters link.

    The Internal Parameters dialog box opens.

  2. In the edit box, type the name of the parameter you need to add and then click it.

    The parameter is added to the table below the search box with its default value.

  3. Change the default value as required.
  4. To reset the parameter value to its default, click the "Restore default value" icon at the end of the row.

Settings summary

You can view a summary of your settings by clicking the Setting Summary link. This is useful if you need to send a summary of your settings to Qlik Support.