Using the Java or the SQL engine
About this task
After setting the analysis parameters in the analysis editor, you can use either the Java or the SQL engine to execute your analysis.
The choice of the engine can sometimes slightly change analysis results, for example when you select the summary statistics indicators to profile a DB2 database. This is because indicators are computed differently depending on the database type, and also because Talend uses special functions when working with Java.
SQL engine:
If you use the SQL engine to execute a column analysis:
- An SQL query is generated for each indicator used in the column analysis. The analysis runs multiple indicators in parallel, and results are refreshed in the charts while the analysis is still in progress.
- Data monitoring and processing are carried out on the DBMS.
- Only statistical results are retrieved locally.
Using this engine gives you better system performance. You can also access valid and invalid data in the data explorer.
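As a rough illustration of the "one query per indicator" idea, the sketch below builds a separate SQL statement for each of three simple indicators. The class name, the `buildIndicatorQueries` helper, the table and column names, and the SQL text are all assumptions made for this example; they are not the query templates Talend actually generates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SqlEngineSketch {

    // Hypothetical helper: returns one SQL statement per indicator.
    // The statements are illustrative only, not Talend's real query templates.
    static Map<String, String> buildIndicatorQueries(String table, String column) {
        Map<String, String> queries = new LinkedHashMap<>();
        queries.put("Row Count", "SELECT COUNT(*) FROM " + table);
        queries.put("Null Count",
                "SELECT COUNT(*) FROM " + table + " WHERE " + column + " IS NULL");
        queries.put("Distinct Count",
                "SELECT COUNT(DISTINCT " + column + ") FROM " + table);
        return queries;
    }

    public static void main(String[] args) {
        // Each query would be sent to the DBMS, which does the computation
        // and returns only the aggregated result.
        buildIndicatorQueries("customer", "email")
                .forEach((indicator, sql) -> System.out.println(indicator + ": " + sql));
    }
}
```

Because every statement is an aggregate, only a single number per indicator travels back to the client, which matches the statement that only statistical results are retrieved locally.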
Java engine:
If you use the Java engine to execute a column analysis:
- Only one query is generated for all indicators used in the column analysis.
- All monitored data are retrieved locally to be analyzed.
- You can set the parameters to decide whether to access the analyzed data and how many data rows to show per indicator. This helps to avoid memory limitation issues, since it is impossible to store all analyzed data.
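The list above can be pictured with a minimal sketch: the rows are fetched once, every indicator is computed locally in a single pass, and only a capped number of rows is kept per indicator for later inspection. The class, the `profile` method, and the `maxRowsPerIndicator` parameter are hypothetical names chosen for this example and do not reflect Talend's internal implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JavaEngineSketch {

    // Hypothetical result holder for three simple indicators.
    static final class Profile {
        long rowCount;
        long nullCount;
        final Set<String> distinctValues = new HashSet<>();
        // Indexes of rows kept for drill-down on the null count indicator.
        final List<Integer> nullRowSample = new ArrayList<>();
    }

    // Computes all indicators in one pass over locally retrieved values.
    // maxRowsPerIndicator caps how many rows are stored per indicator.
    static Profile profile(List<String> values, int maxRowsPerIndicator) {
        Profile p = new Profile();
        for (int i = 0; i < values.size(); i++) {
            String value = values.get(i);
            p.rowCount++;
            if (value == null) {
                p.nullCount++;
                if (p.nullRowSample.size() < maxRowsPerIndicator) {
                    p.nullRowSample.add(i);
                }
            } else {
                p.distinctValues.add(value);
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Stand-in for the rows returned by the single query.
        List<String> emails = Arrays.asList("a@x.org", null, "b@x.org", "a@x.org", null, null);
        Profile p = profile(emails, 2);
        System.out.println("rows=" + p.rowCount + ", nulls=" + p.nullCount
                + ", distinct=" + p.distinctValues.size()
                + ", stored null rows=" + p.nullRowSample);
    }
}
```

The cap keeps the stored sample bounded no matter how many rows match an indicator, which is the purpose of limiting the number of data rows shown per indicator.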
When you execute the column analysis with the Java engine, you do not need query templates specific to each database. However, system performance is significantly lower than with the SQL engine. Executing the analysis with the Java engine also uses disk space, because all data is retrieved and stored locally. To free up space, you can delete the data stored in the main Talend Studio directory, at Talend-Studio>workspace>project_name>Work_MapDB.
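If you prefer to script the cleanup rather than delete the files by hand, a minimal sketch follows. It empties a directory passed on the command line while keeping the directory itself. The class and method names are hypothetical, the path must be supplied by you, and the sketch assumes Talend Studio is closed while it runs, which the source does not state.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WorkMapDbCleanup {

    // Deletes everything inside dir but keeps dir itself.
    // Returns the number of files and subdirectories removed.
    static int clearDirectory(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return 0; // nothing to clean
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            entries = walk.filter(p -> !p.equals(dir))
                    .sorted(Comparator.reverseOrder()) // children before their parents
                    .collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
        return entries.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WorkMapDbCleanup <path-to-Work_MapDB>");
            return;
        }
        // Assumption: the argument points at the Work_MapDB folder of your project.
        int removed = clearDirectory(Paths.get(args[0]));
        System.out.println("Removed " + removed + " entries");
    }
}
```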
To set the parameters to access analyzed data when using the Java engine, do the following:
Procedure
Results
You can now run your analysis and then have access to the analyzed data according to the set parameters.