Harvest several Models from a directory of external metadata files.
It is common for an organization to have a large number of external metadata files without using an external metadata repository. Often, such an organization would like to import the files into Talend Data Catalog in batch and in an automated fashion. Talend Data Catalog supports this scenario with the help of a harvesting script.
In this case, the files are stored under a file directory which is accessible to the Talend Data Catalog application server. The script scans the directory and its subdirectories for files of the particular external metadata type and looks for matching Models under a particular folder in Talend Data Catalog. The Talend Data Catalog repository folder and model structure will match the structure of files and their directories on the file system. When the necessary Model does not exist, the script creates one and imports the file. When the Model already exists, the script re-imports the file only if that version of the file has not been harvested yet.
You can schedule Talend Data Catalog to run the script periodically. This allows customers to place files under the directory and be assured that Talend Data Catalog will import them automatically. The approach works with any file-based bridge that imports a single model.
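The decision the script makes for each file can be sketched as follows. This is an illustration only, not the harvesting script shipped with Talend Data Catalog: the function name `plan_harvest`, the `harvested` dictionary, the action labels, and the use of the file modification time as the "file version" are all assumptions made for this sketch, and no Talend Data Catalog API is called.

```python
import os
from pathlib import Path


def plan_harvest(root, suffix, harvested):
    """Return sorted (model_path, action) pairs for files under root.

    harvested maps a model path (relative to root, '/'-separated) to the
    file version that was last imported. The modification time stands in
    for the file version here, which is an assumption of this sketch.
    """
    plan = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffix.lower()):
                continue  # not the external metadata type being harvested
            file_path = Path(dirpath) / name
            # The model path mirrors the file's location under the root.
            model_path = file_path.relative_to(root).as_posix()
            version = file_path.stat().st_mtime_ns
            if model_path not in harvested:
                action = "create-and-import"  # no matching Model yet
            elif harvested[model_path] != version:
                action = "re-import"  # Model exists, file version is new
            else:
                action = "skip"  # this file version was already harvested
            plan.append((model_path, action))
    return sorted(plan)
```

Calling `plan_harvest("/data/metadata", ".xml", {})` (a hypothetical directory and file type) would mark every matching file as `create-and-import`, since nothing has been harvested yet.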
A special model named Settings must also be defined in order to control how the files will be imported (which source tool and which bridge parameters to use).
Ensure proper permissions
- Sign in as a user with at least the Metadata Management capability object role assignment on the Configuration you are in.
Create the configuration (as needed)
One may create, harvest and analyze metadata (models) from the MANAGE > Repository tree. However, generally it is best to create a configuration to work in (and to define the scope of analysis and search) and then go to MANAGE > Configuration and create and harvest there.
Create the Talend Data Catalog folder
- Go to MANAGE > Repository.
- Right click on a folder in the Repository Tree where you want to place the folder containing the results of the import and select New > Folder.
- Name the folder accordingly.
Create the Settings model to control the import of Models:
- Right click on that new folder in the Repository Tree and select New > Imported Model.
- Select the Overview tab in the Create Model dialog.
- Enter “Settings” in the NAME for the model.
- Select the correct source format in the IMPORT BRIDGE pull-down.
- Click OK.
- Select the Import Setup tab.
- Complete each of the Parameters according to the tool-tips displayed in the right-hand panel of the dialog. In particular, for the File parameter:
- Click on the Browse icon and browse for a file inside the directory structure on the file system.
If you cannot find the location using the Browse function you must configure (as part of the installation) the available paths to present to users. More details may be found in the deployment guide.
- Update the File parameter so that the path refers only to the top level of the directory structure on the file system (i.e., remove the file name and any sub-directory names, as well as the trailing slash or backslash).
- Click SAVE.
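The edit to the File parameter described above amounts to cutting a browsed file path back to the top-level directory, with no trailing separator. A minimal sketch of that trimming, assuming you know the name of the top-level directory; the function `top_level_directory` and the example paths are hypothetical and not part of Talend Data Catalog:

```python
from pathlib import PurePosixPath, PureWindowsPath


def top_level_directory(browsed_file, top_dir_name, windows=False):
    """Cut a browsed file path back to the directory named top_dir_name.

    The result has no file name, no sub-directory names below the top
    level, and no trailing slash or backslash.
    """
    path_cls = PureWindowsPath if windows else PurePosixPath
    # parents runs from the innermost directory outwards.
    for parent in path_cls(browsed_file).parents:
        if parent.name == top_dir_name:
            return str(parent)
    raise ValueError(f"{top_dir_name!r} is not a directory in {browsed_file!r}")


# Hypothetical paths, for illustration only.
print(top_level_directory("/data/metadata/sales/2023/model.xml", "metadata"))
print(top_level_directory(r"C:\data\metadata\sales\model.xml", "metadata", windows=True))
```

The pure path classes are used so that the sketch only manipulates the text of the path and never touches the file system.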
Harvest the Models on demand:
- Right click the new folder in the Repository Panel and select Operations > Import new model(s) from folder.
- Click Run Operation.
- The Log Messages dialog then appears and log messages are presented as the import process proceeds.
- If you receive the Import Successful result, click Yes to open the Model. If instead you see the Import Failed result, inspect the log messages and correct the source Model file accordingly.
- Be sure to include those new models in the configuration.
You may now browse the Models.