tAthenaInput properties for Apache Spark Batch

These properties are used to configure tAthenaInput running in the Spark Batch Job framework.

The Spark Batch tAthenaInput component belongs to the Databases family.

The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.

The AWS account used in tAthenaConfiguration must have permissions to run Athena queries and to access the queried tables.
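As an illustration of what such permissions might look like, the sketch below builds an IAM policy document in Python and prints it as JSON. It is an assumption, not a policy prescribed for this component: the bucket names are hypothetical placeholders, the wildcard resources would normally be narrowed, and the exact set of actions your account needs depends on how your tables and query result location are set up.

```python
import json

# Hypothetical bucket names (assumptions); replace with the bucket that holds
# the queried data and the bucket Athena writes query results to.
DATA_BUCKET = "example-data-bucket"
RESULTS_BUCKET = "example-athena-results"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Start queries and read their status and results.
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
            ],
            "Resource": "*",
        },
        {
            # Read table metadata from the Glue Data Catalog.
            "Effect": "Allow",
            "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
            "Resource": "*",
        },
        {
            # Read the underlying table data stored in S3.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{DATA_BUCKET}",
                f"arn:aws:s3:::{DATA_BUCKET}/*",
            ],
        },
        {
            # Write and read query results in the results location.
            "Effect": "Allow",
            "Action": ["s3:GetBucketLocation", "s3:GetObject", "s3:PutObject"],
            "Resource": [
                f"arn:aws:s3:::{RESULTS_BUCKET}",
                f"arn:aws:s3:::{RESULTS_BUCKET}/*",
            ],
        },
    ],
}

print(json.dumps(policy, indent=2))
```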

Basic settings

Properties Description
Schema and Edit schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion.

    If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the Repository Content window.

The schema must match or be a subset of the columns returned by the SQL query.
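The match-or-subset rule can be checked by hand before running the Job. The following is a minimal sketch in Python, not part of the component: the helper missing_columns is hypothetical, and the case-insensitive comparison of column names is an assumption made for the illustration.

```python
def missing_columns(schema_columns, query_columns):
    """Return the schema columns that the query does not return.

    An empty list means the schema matches, or is a subset of,
    the columns returned by the query. Names are compared
    case-insensitively (an assumption of this sketch).
    """
    returned = {name.lower() for name in query_columns}
    return [name for name in schema_columns if name.lower() not in returned]


# Columns returned by the query: select id, name from employee
query_columns = ["id", "name"]

print(missing_columns(["id", "name"], query_columns))    # schema matches the query
print(missing_columns(["id"], query_columns))            # schema is a subset
print(missing_columns(["id", "salary"], query_columns))  # salary is not returned
```

A non-empty result lists the schema columns to remove from the schema or add to the SELECT statement.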

Connection

Select the tAthenaConfiguration component to use from the drop-down list of all available configurations in the Job.

Query

Enter the SQL SELECT statement to run against Athena, such as select id, name from employee.

Usage

Usage guidance Description

Usage rule

tAthenaInput is a source component. It starts a subJob and produces one output flow.

It supports a Row > Main output connection and can be started by an OnSubjobOk trigger from a preceding subJob.

A tAthenaConfiguration component must exist in the same Job and be selected in the Connection property.