Skip to main content Skip to complementary content

Grouping elements

Availability-noteBeta
Create a hierarchy from a flat file using the Data Shaping Query Language GROUP BY and GROUP AS clauses.

Before you begin

  • You have created an input and an output structure. You can use the JSON samples below to create your structures.

About this task

In this example, you will create a DSQL map to extract data about a store from a flat JSON file and write it to a hierarchical structure.

In the sample JSON below, each object in the items array contains data about an item, including its store and department. In the output, instead of a record per item, you want to first have a record per store, then a record per department in each store, and finally a record per object in each department.

The input data looks like this:
{
    "items": [
        {
            "store_name": "plusstore",
            "store_location": "saint louis",
            "department_name": "sports",
            "salesperson_id": "25",
            "item_name": "MBIKE"
        },
        {
            "store_name": "plusstore",
            "store_location": "saint louis",
            "department_name": "sports",
            "salesperson_id": "25",
            "item_name": "ESCOOT"
        },
        {
            "store_name": "plusstore",
            "store_location": "saint louis",
            "department_name": "electronics",
            "salesperson_id": "32",
            "item_name": "Portable SSD T5"
        },
        {
            "store_name": "plusstore",
            "store_location": "saint louis",
            "department_name": "electronics",
            "salesperson_id": "32",
            "item_name": "Extreme Portable SSD"
        },
        {
            "store_name": "goodstore",
            "store_location": "nashville",
            "department_name": "electronics",
            "salesperson_id": "47",
            "item_name": "Toshiba P300"
        },
        {
            "store_name": "goodstore",
            "store_location": "nashville",
            "department_name": "fresh",
            "salesperson_id": "47",
            "item_name": "Italian tomatoes"
        }
    ]
}
The output structure looks like this:
{
    "stores": [
        {
            "name": "",
            "location": "",
            "departments": [
                {
                    "name": "",
                    "salesperson_id": "",
                    "items": [
                        {
                            "name": ""
                        }
                    ]
                }
            ]
        }
    ]
}

Procedure

  1. Create a new DSQL Map and add the input and output structures.
    A first mapping between the root elements is automatically created. You can see the corresponding Data Shaping Query Language expression in the root element of the output structure.
  2. Click the output stores element and enter the following expression:
    FROM items
    GROUP BY store_name
    GROUP AS gstore
    The GROUP BY clause ensures that the output will only produce one occurrence of stores for each unique store_name value. The GROUP AS clause is used to assign a name to the current group, which can then be used in other expressions. For more information, see the Talend Data Shaping Language Reference Guide.
  3. Drag and drop store_name on the output name element under stores.
  4. Click the output location element and enter the following expression:
    gstore[0].store_location

    You cannot simply use store_location as an expression in this case, because there could be multiple different store_location values in the same group. Since the grouping is done on store_name, this is the only element that is guaranteed to have a unique value in each group.

    The expression above indicates that output value should be the value of the store_location element in the first item of the current group.

  5. Click the departments element and enter the following expression:
    FROM gstore
    GROUP BY department_name
    GROUP AS gdepartment

    This uses the same process as the stores element, this time to produce an occurrence of the departments array for each unique department_name value.

  6. Drag and drop department_name on the output name element under departments, then click the output saledperson_id element and enter the following expression:
    gdepartment[0].salesperson_id
  7. Click the output items element and enter FROM gdepartment, then drag and drop item_name on the name element under items.

Results

Your map is configured, you can use the Test Run feature to check the result. In this example, the following result is returned:
{
   "stores":[
      {
         "name":"plusstore",
         "location":"saint louis",
         "departments":[
            {
               "name":"electronics",
               "salesperson_id":"32",
               "items":[
                  {
                     "name":"Portable SSD T5"
                  },
                  {
                     "name":"Extreme Portable SSD"
                  }
               ]
            },
            {
               "name":"sports",
               "salesperson_id":"25",
               "items":[
                  {
                     "name":"MBIKE"
                  },
                  {
                     "name":"ESCOOT"
                  }
               ]
            }
         ]
      },
      {
         "name":"goodstore",
         "location":"nashville",
         "departments":[
            {
               "name":"electronics",
               "salesperson_id":"47",
               "items":[
                  {
                     "name":"Toshiba P300"
                  }
               ]
            },
            {
               "name":"fresh",
               "salesperson_id":"47",
               "items":[
                  {
                     "name":"Italian tomatoes"
                  }
               ]
            }
         ]
      }
   ]
}

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – please let us know!