tAthenaInput properties for Apache Spark Batch
These properties are used to configure tAthenaInput running in the Spark Batch Job framework.
The Spark Batch tAthenaInput component belongs to the Databases family.
The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.
The AWS account used in tAthenaConfiguration must have permissions to run Athena queries and to access the queried tables.
Basic settings
| Properties | Description |
|---|---|
| Schema and Edit schema | A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields. Built-In: you create and store the schema locally for this component only. Repository: you have already created the schema and stored it in the Repository, so you can reuse it in various projects and Job designs. Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available. The schema must match, or be a subset of, the columns returned by the SQL query. |
| Connection | Select the tAthenaConfiguration component to use from the drop-down list of all available configurations in the Job. |
| Query | Enter the SQL SELECT statement to run against Athena, such as select id, name from employee. |
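The rule that the component schema must match, or be a subset of, the columns returned by the query can be illustrated with a small check. This is a hypothetical sketch for illustration only, not part of Talend or the component; the function name and column names are invented examples.

```python
def schema_is_compatible(schema_fields, query_columns):
    """Return True when every field in the component's schema appears
    among the columns returned by the Athena query, i.e. the schema
    is a subset of (or equal to) the query's column list."""
    return set(schema_fields).issubset(query_columns)

# Hypothetical example: a query such as
#   select id, name, salary from employee
# returns three columns; a schema with only id and name is still valid,
# while a schema referencing a column the query does not return is not.
query_columns = ["id", "name", "salary"]
print(schema_is_compatible(["id", "name"], query_columns))      # True
print(schema_is_compatible(["id", "address"], query_columns))   # False
```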
Usage
| Usage guidance | Description |
|---|---|
| Usage rule | tAthenaInput is a source component: it starts a subJob and produces one output flow. It supports an output connection and an OnSubjobOk trigger from a preceding subJob. A tAthenaConfiguration component must exist in the same Job and be selected in the Connection property. |