tEmbeddingAI Standard properties

These properties are used to configure tEmbeddingAI running in the Standard Job framework.

The Standard tEmbeddingAI component belongs to the AI family.

Basic settings

Schema and Edit schema	A schema is a row description, and it defines the fields to be processed and passed on to the next component. Built-In: You create and store the schema locally for this component only. Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Click Edit schema to make changes to the schema. If you make changes, the schema automatically becomes built-in. View schema: choose this option to view the schema only. Change to built-in property: choose this option to change the schema to Built-in for local changes. Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the Repository Content window.
Guess schema	Click this button to retrieve the schema according to your settings.
Platform	Select or enter the platform used to embed your input data. Amazon Bedrock: only the Amazon Titan Text Embedding v1 and v2 models are supported. Authentication uses the default credentials provider chain, which means that the credentials used to access the AWS service are looked up from a chain of sources. Google Vertex AI: authentication uses Application Default Credentials, so no additional parameters need to be defined; the credentials are found automatically based on the application environment. Hugging Face: the server exception "Model thenlper/gte-large is currently loading" may occasionally occur because the model is cold, that is, not yet loaded on the server. The error disappears on the next retry. ONNX: out-of-memory errors may occur because ONNX models run locally and are therefore memory-intensive. It is recommended to allocate at least 2 GB of memory through the JVM parameters in the Job run profile.
Ali Bailian (DashScope) parameters	Model name: Click the [...] button next to the field to select the embedding model of your choice. Token/API Key: Click the [...] button next to the field to enter the access token to your DashScope platform.
Amazon Bedrock parameters	Model name: Click the [...] button next to the field to select the embedding model of your choice. Region: Click the [...] button next to the field to select one of the Amazon multi-regional locations you want to use.
Azure OpenAI parameters	Model name: Click the [...] button next to the field to select the embedding model of your choice. Token/API Key: Click the [...] button next to the field to enter the access token to your Azure OpenAI platform. Azure endpoint: Enter the endpoint to access your Azure OpenAI service API.
Cohere parameters	Model name: Click the [...] button next to the field to select the embedding model of your choice. Token/API Key: Click the [...] button next to the field to enter the access token to your Cohere platform.
Google Vertex AI parameters	Model name: Click the [...] button next to the field to select the embedding model of your choice. Location: Enter one of the Google Cloud multi-regional locations you want to use. See Vertex AI locations for more information. Google endpoint: Enter the appropriate regional endpoint corresponding to your location. See Vertex AI locations for more information. Google project ID: Enter your Google Cloud project ID. See Find your Google Cloud project ID for more information. Example: `my-sample-project-370819`
Hugging Face parameters	Model name: Click the [...] button next to the field to select the embedding model of your choice. Token/API Key: Click the [...] button next to the field to enter the access token to your Hugging Face platform.
ONNX parameters	Select the Use local ONNX file check box to use a local tokenizer file, then specify the model and tokenizer paths as well as the pooling mode to apply. If the check box is cleared, the default embedded tokenizer file is used. See Introduction to ONNX for more information.
Column for embedding	Select or enter the schema column to which you want to apply the embedding model.
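
The Platform description above notes that the Hugging Face "is currently loading" exception disappears on the next retry. The component settings listed on this page do not include a retry option, so the following is only a minimal sketch of the retry pattern for custom code that calls an embedding service directly. The class `ColdModelRetrySketch`, the method `callWithRetry`, and the simulated call are hypothetical names introduced for illustration; they are not part of tEmbeddingAI.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of tEmbeddingAI: retries a call that may fail
// while a cold model is still loading on the server.
class ColdModelRetrySketch {

    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long waitMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                String message = e.getMessage();
                // Assumption: the cold-model error is recognized by its message text.
                boolean loading = message != null && message.contains("is currently loading");
                if (!loading) {
                    throw e; // unrelated error: do not retry
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(waitMillis); // give the model time to load
                }
            }
        }
        throw last; // every attempt hit the cold-model error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated embedding call: fails once with the cold-model message, then succeeds.
        String result = callWithRetry(() -> {
            if (calls[0]++ == 0) {
                throw new IllegalStateException("Model thenlper/gte-large is currently loading");
            }
            return "embedding-ok";
        }, 3, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Errors other than the cold-model message are rethrown immediately, so a wrong token or endpoint is not hidden behind repeated attempts.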

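The ONNX parameters above include a pooling mode, which determines how the per-token vectors produced by the model are combined into a single embedding. This page does not list the available modes, so the sketch below shows two strategies that are common for sentence-embedding models, mean pooling and CLS pooling, purely as an illustration. The class `PoolingSketch` and its methods are hypothetical and do not reflect the component's internal implementation.

```java
// Hypothetical illustration of two common pooling strategies.
class PoolingSketch {

    // Mean pooling: average the token vectors, skipping padded positions (mask == 0).
    static float[] meanPool(float[][] tokenEmbeddings, int[] attentionMask) {
        int dim = tokenEmbeddings[0].length;
        float[] pooled = new float[dim];
        int counted = 0;
        for (int t = 0; t < tokenEmbeddings.length; t++) {
            if (attentionMask[t] == 0) {
                continue; // padding token: ignore
            }
            for (int d = 0; d < dim; d++) {
                pooled[d] += tokenEmbeddings[t][d];
            }
            counted++;
        }
        if (counted == 0) {
            return pooled; // only padding: return the zero vector
        }
        for (int d = 0; d < dim; d++) {
            pooled[d] /= counted;
        }
        return pooled;
    }

    // CLS pooling: use the vector of the first token as the embedding.
    static float[] clsPool(float[][] tokenEmbeddings) {
        return tokenEmbeddings[0].clone();
    }

    public static void main(String[] args) {
        float[][] tokens = {{1f, 2f}, {3f, 4f}, {9f, 9f}};
        int[] mask = {1, 1, 0}; // third token is padding
        System.out.println(java.util.Arrays.toString(meanPool(tokens, mask)));
        System.out.println(java.util.Arrays.toString(clsPool(tokens)));
    }
}
```

The pooling mode should match the one the model was trained with; check the documentation of the model you load before choosing.
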
Advanced settings

tStatCatcher Statistics	Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE	The error message generated by the component when an error occurs. This is an After variable and it returns a string.
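
In a Job, component variables are usually read from `globalMap` with a key made of the component label followed by the variable name. The following minimal sketch shows that lookup for ERROR_MESSAGE. It assumes the component is labeled `tEmbeddingAI_1`; the `HashMap` is only a stand-in for the `globalMap` that the generated Job code provides, and the class `ErrorMessageSketch` is a hypothetical name used for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of reading an After variable from globalMap.
class ErrorMessageSketch {

    // Reads <componentLabel>_ERROR_MESSAGE; returns null when no error was recorded.
    static String errorMessageOf(Map<String, Object> globalMap, String componentLabel) {
        return (String) globalMap.get(componentLabel + "_ERROR_MESSAGE");
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap of a real Job (assumption for this sketch).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tEmbeddingAI_1_ERROR_MESSAGE", "simulated failure");

        String message = errorMessageOf(globalMap, "tEmbeddingAI_1");
        System.out.println(message != null ? "Embedding failed: " + message : "No error recorded");
    }
}
```

Because ERROR_MESSAGE is an After variable, read it only in a component that runs after tEmbeddingAI has finished.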

Usage

Usage rule	This component can be used as a standalone component or as a start component of a Job or subJob.
