Skip to main content Skip to complementary content

Analyzing data with tDSQL4JSON

This scenario describes a Standard Job that transforms and analyzes JSON data using the tDSQL4JSON component.

In this scenario, you have a JSON dataset that lists countries, their cities, and UTC offsets. You want to produce a report of countries spanning multiple time zones, organized by UTC offset and cities.

Here is a sample of the JSON dataset used for the purpose of this scenario:
Sample data of world time zones in JSON format.

Before you begin

You have a sample JSON data file for input.

Procedure

  1. Create a Standard Job and add the following components:
    • tFileInputRaw
    • tDSQL4JSON
    • tFileOutputRaw
    Standard Job with tDSQL4JSON component.
  2. Configure the tFileInputRaw component:
    1. In the Filename property, enter the input file path to your JSON sample data. In this example it is world_time_zones_and_utc_offsets.json.
    2. Make sure the schema has a single column containing the JSON structure.
  3. Configure the tDSQL4JSON component:
    1. Paste the following script in the Script property to extract countries with multiple time zones:
    
    SELECT {
       countries = (
          // Iterate on the main input data collection.
          FROM data
    	  // Expand each country locations into individual records.
          UNNEST locations
    	  // Group location records by country name.
          GROUP BY country_name
    	  // Name this group for reference in subsequent clauses.
          GROUP AS g_country_name
    	  // Count the number of time zones per country with the use of a variable.
          LET $nbr_of_time_zones = count(country_name)
    	  // Filter to keep only countries with more than one time zone.
          HAVING $nbr_of_time_zones > 1
    	  // Sort results by number of time zones in descending order.
          ORDER BY $nbr_of_time_zones DESC
          SELECT {
             country_name,
             nbr_of_time_zones = $nbr_of_time_zones,
    		 // Build a nested list of time zones for each country.
             time_zones = (
    		    // Iterate over location records within each country group.
                FROM g_country_name
                SELECT {
                   utc_offset,
    			   // Build a nested list of cities for each time zone.
                   main_cities = (
    			      // Expand the city_list array, an implicit alias city_list is automatically defined and can be used to select items.
                      FROM city_list
                      SELECT city_list
                   )
                }
             )
          }
       )
    }
                        
    1. In Sample data file, select your input JSON file to preview and validate output.
    2. Select the Enable Test Run checkbox to verify the results before running the full Job.
  4. Configure the tFileOutputRaw component:
    1. In the Filename property, enter the output file path to the location where you want to store processed results.
    2. Use a single-column schema if your output is structured as a JSON field.
  5. Run the Job.

Results

An output file is created documenting all countries with more than one time zone, including their UTC offsets and associated cities:
{
   "countries" : [
      {
         "country_name" : "Russia",
         "nbr_of_time_zones" : 11,
         "time_zones" : [
            {
               "utc_offset" : "UTC+2",
               "main_cities" : [ "Kaliningrad" ]
            },  
            {
               "utc_offset" : "UTC+3",
               "main_cities" : [ "Kazan", "Moscow", "Nizhny Novgorod", "Rostov-on-Don", "Saint Petersburg" ]
            },
    // List of other countries with more than one time zone.
  ]
}
            

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – please let us know!