
Validating data using advanced JSON Schema constraints

This scenario presents a Job that validates customer records using JSON Schema constraints (pattern, enumeration, range) and outputs structured violations for records that fail validation.

The incoming flow comes from a CSV file of customer data that contains several constraint violations: a malformed email address, an unsupported country code, and ages outside the accepted range. The component validates each row against a schema that defines multiple per-column constraints and sends invalid rows to the Rejects flow together with detailed violation metadata.

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend Data Services Platform, and Talend Data Fabric.

Setting up the Standard Job

Procedure

  1. Drop the following components from the Palette onto the design workspace: one tFileInputDelimited, one tSchemaComplianceCheck, and two tLogRow components.
  2. Connect the tFileInputDelimited component to the tSchemaComplianceCheck component using a Row > Main connection.
  3. Connect the tSchemaComplianceCheck component to one tLogRow component using a Row > Main connection. This output flow will gather the valid data that passes all schema validations.
  4. Connect the tSchemaComplianceCheck component to the other tLogRow component using a Row > Rejects connection. This output flow will gather rows with schema violations, passing all input columns plus three additional columns: errorCode, errorMessage, and violationDetails. These columns provide structured information about constraint violations to enable detailed error handling and troubleshooting.
    A Job using the tFileInputDelimited, tSchemaComplianceCheck, and two tLogRow components.
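
The Main and Rejects connections described in this procedure can be pictured as a simple routing function. The following is a minimal Python sketch, not the component's implementation: route_rows, the check callback, and the AGE_MIN code are hypothetical names chosen for illustration. Only the three reject column names (errorCode, errorMessage, violationDetails) come from this scenario, and how the component fills them when several constraints fail is an assumption here.

```python
import json

def route_rows(rows, check):
    """Split rows into (main, rejects), mirroring the two output flows.

    `check` is a hypothetical callback that returns a list of violation
    dicts for a row; an empty list means the row is valid.
    """
    main, rejects = [], []
    for row in rows:
        violations = check(row)
        if not violations:
            main.append(row)
            continue
        # A rejected row keeps every input column and gains three more.
        rejected = dict(row)
        rejected["errorCode"] = violations[0]["code"]  # assumption: first code wins
        rejected["errorMessage"] = "; ".join(v["message"] for v in violations)
        rejected["violationDetails"] = json.dumps(violations)
        rejects.append(rejected)
    return main, rejects

def age_check(row):
    # Illustrative rule only: flag ages under 18.
    if row["Age"] < 18:
        return [{"code": "AGE_MIN", "message": "Age below 18"}]
    return []

main, rejects = route_rows(
    [{"CustomerID": 1001, "Age": 35}, {"CustomerID": 1002, "Age": 15}],
    age_check,
)
```

Keeping the validation rule behind a callback mirrors the Job design, where the routing stays the same whatever constraints the schema defines.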

Configuring the components

Procedure

  1. Double-click the tFileInputDelimited component to open its properties dialog.
  2. In the Basic settings tab, configure the file path to point to your CSV file containing customer records.

    Sample CSV structure:

    CustomerID,Email,Country,Age,Phone
    1001,john.doe@example.com,US,35,555-123-4567
    1002,invalid-email,XX,15,555-123-4567
    1003,jane.smith@company.com,CA,28,555-987-6543
    1004,bob@email.com,US,150,555-111-2222
  3. Click Edit schema to define the schema for the CSV file. Make sure the schema includes the following columns: CustomerID (Integer), Email (String), Country (String), Age (Integer), and Phone (String).
  4. Click OK to confirm the tFileInputDelimited configuration.
  5. Double-click the tSchemaComplianceCheck component to open its properties dialog.
  6. In the Basic settings tab, select Check columns from a JSON schema URI.
  7. Enter the URI of the JSON schema.
    The JSON Schema should define the constraints for each column, such as pattern for Email, enum for Country, and minimum/maximum for Age. For example:
    
    {
      "type": "object",
      "properties": {
        "CustomerID": {
          "type": "integer",
          "minimum": 1,
          "maximum": 9999
        },
        "Email": {
          "type": "string",
          "pattern": "^[^@]+@[^@]+\\.[^@]+$"
        },
        "Country": {
          "type": "string",
          "enum": ["US", "CA", "MX"]
        },
        "Age": {
          "type": "integer",
          "minimum": 18,
          "maximum": 120
        },
        "Phone": {
          "type": "string",
          "pattern": "^\\d{3}-\\d{3}-\\d{4}$"
        }
      },
      "required": ["CustomerID", "Email", "Country", "Age"]
    }
                      
  8. Double-click the first tLogRow component to configure it for displaying valid records.
  9. Select Table (print values in cells of a table). This will log all valid records to the console.
  10. Do the same for the second tLogRow component, which displays the rejected records.
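
To see what the constraints in the sample schema from step 7 mean for the sample CSV rows, the checks can be reproduced by hand. The following Python sketch uses only the standard library and hard-codes the same patterns, enumeration, and ranges. It is an illustration of the rules, not a JSON Schema validator and not what tSchemaComplianceCheck runs internally; the violations function and the "Column/keyword" labels are hypothetical.

```python
import csv
import io
import re

SAMPLE = """CustomerID,Email,Country,Age,Phone
1001,john.doe@example.com,US,35,555-123-4567
1002,invalid-email,XX,15,555-123-4567
1003,jane.smith@company.com,CA,28,555-987-6543
1004,bob@email.com,US,150,555-111-2222
"""

# Same regular expressions as in the sample schema (JSON escaping removed).
EMAIL = re.compile(r"^[^@]+@[^@]+\.[^@]+$")
PHONE = re.compile(r"^\d{3}-\d{3}-\d{4}$")
COUNTRIES = {"US", "CA", "MX"}

def violations(row):
    """Return 'Column/keyword' labels for every constraint the row fails."""
    found = []
    customer_id = int(row["CustomerID"])
    if customer_id < 1:
        found.append("CustomerID/minimum")
    elif customer_id > 9999:
        found.append("CustomerID/maximum")
    if not EMAIL.search(row["Email"]):
        found.append("Email/pattern")
    if row["Country"] not in COUNTRIES:
        found.append("Country/enum")
    age = int(row["Age"])
    if age < 18:
        found.append("Age/minimum")
    elif age > 120:
        found.append("Age/maximum")
    # Phone is not in "required"; this sketch assumes an empty cell means absent.
    if row["Phone"] and not PHONE.search(row["Phone"]):
        found.append("Phone/pattern")
    return found

results = {
    row["CustomerID"]: violations(row)
    for row in csv.DictReader(io.StringIO(SAMPLE))
}
```

With the sample data, rows 1001 and 1003 produce empty lists, row 1002 fails the Email, Country, and Age checks, and row 1004 fails only the Age maximum.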

Running the Job and examining the output

Procedure

Press F6 to save and execute the Job.

Results

The component separates the valid records from the invalid ones:
  • Valid records (Main output): Rows that pass all schema validations.

    
    CustomerID=1001, Email=john.doe@example.com, Country=US, Age=35, Phone=555-123-4567
    CustomerID=1003, Email=jane.smith@company.com, Country=CA, Age=28, Phone=555-987-6543
                            
  • Invalid records (Rejects output): Rows with constraint violations, including violation details.

    
    CustomerID=1002, Email=invalid-email, Country=XX, Age=15, Phone=555-123-4567
    "Violations":[{"SchemaLocation":"#/properties/Email/pattern","ErrorType":"pattern","ErrorCode":"1023","ErrorMessage":"$.Email: does not match the regex pattern ^[^@]+@[^@]+\\.[^@]+$","ElementValue":"invalid-email","ElementLocation":"$.Email"},
    {"SchemaLocation":"#/properties/Country/enum","ErrorType":"enum","ErrorCode":"1008","ErrorMessage":"$.Country: does not have a value in the enumeration [\"US\", \"CA\", \"MX\"]","ElementValue":"XX","ElementLocation":"$.Country"},
    {"SchemaLocation":"#/properties/Age/minimum","ErrorType":"minimum","ErrorCode":"1015","ErrorMessage":"$.Age: must have a minimum value of 18","ElementValue":"15","ElementLocation":"$.Age"}]}]}}
    
    CustomerID=1004, Email=bob@email.com, Country=US, Age=150, Phone=555-111-2222
    "Violations":[{"SchemaLocation":"#/properties/Age/maximum","ErrorType":"maximum",
    "ErrorCode":"1011","ErrorMessage":"$.Age: must have a maximum value of 120","ElementValue":"150","ElementLocation":"$.Age"}]}]}}
                            
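
Downstream of the Rejects flow, the violation objects can be grouped by column for reporting. The following Python sketch assumes the violation details are available as a well-formed JSON array of objects carrying the field names seen in the output above (ErrorType, ElementLocation, and so on). The output excerpt appears to be a fragment of a larger structure, so the sketch builds its own array from the values shown for row 1002; the summarize function is hypothetical.

```python
import json

# Hand-built array using the field names and values shown for row 1002.
details = json.dumps([
    {"SchemaLocation": "#/properties/Email/pattern", "ErrorType": "pattern",
     "ErrorCode": "1023", "ElementValue": "invalid-email",
     "ElementLocation": "$.Email"},
    {"SchemaLocation": "#/properties/Country/enum", "ErrorType": "enum",
     "ErrorCode": "1008", "ElementValue": "XX",
     "ElementLocation": "$.Country"},
    {"SchemaLocation": "#/properties/Age/minimum", "ErrorType": "minimum",
     "ErrorCode": "1015", "ElementValue": "15",
     "ElementLocation": "$.Age"},
])

def summarize(details_json):
    """Map each offending column to the constraint types it violated."""
    summary = {}
    for violation in json.loads(details_json):
        # "$.Email" -> "Email": drop the leading JSONPath root marker.
        column = violation["ElementLocation"].split(".", 1)[-1]
        summary.setdefault(column, []).append(violation["ErrorType"])
    return summary

summary = summarize(details)
```

A per-column summary like this makes it easier to decide which fields need cleansing rules upstream.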
