SELECT clauses
Select all instances of an entity
You can query all instances of an entity. For example, to return all instances of "Type1" records, use the query:
{
"select": {
"from": ["Type1"]
}
}
Note that the "from" element is an array type, allowing you to select multiple types. This does not mean the query can return all Type1 and Type2 instances. The array is used for joins.
Paging: start and limit
Paging may be needed in some use cases such as building a table. This can be done with "start" and "limit" elements.
For example, to return results which start with the first instance of "Type1" (index is 0 based), together with the next 10 instances, use the query:
{
"select": {
"from": ["Type1"],
"start": 0,
"limit": 10
}
}
-
Standard execution of such queries implies a creation of a list of size limit on the server side. If you specify a large value, it may lead to performance issues or an OutOfMemoryError. Therefore, make sure to use paging with relatively small values of limit.
- A limit of Integer.MAX_VALUE (2^31 -1) is considered by MDM code as "no limit". Make sure to use a streaming strategy instead of a list of results. However, this means a limit of (2^31 - 2) will try to create a list of that size (usually causing OutOfMemoryError).
Select field values
Instead of querying the whole record, you can query only one field in the record.
For example, to return values of "id" field of "Type1" type, use the query:
{
"select": {
"from": ["Type1"],
"fields": [
{"field": "Type1/id"}
]
}
}
To select more than one field, repeat the "field" element. For example, to return values of "id", "value1" and "value2" for each record of "Type1", use this query:
{
"select": {
"from": ["Type1"],
"fields": [
{"field": "Type1/id"},
{"field": "Type1/value1"},
{"field": "Type1/value2"}
]
}
}
Aliasing
For naming or client reasons, you may want to alias the returned element. Aliasing can help differentiate values in case of name collision. This can be achieved with the "alias" element.
For example, to return all "id" values in all Type1 instances, but instead of wrapping values inside an element "id", the wrapping element will be named "a", use the query:
{
"select": {
"from": ["Type1"],
"fields": [
{"field": "Type1/id"},
{
"alias": [
{"name": "a"},
{"field": "Type1/id"}
]
}
]
}
}
For example, use the query to return results with "Type1/containedField1/value1" wrapped in "v1" and "Type1/containedField2/value1" in "v2", thus avoiding the possible naming collision that both "Type1/containedField1/value1" and "Type1/containedField2/value1" would be returned wrapped in a "value1" element:
{
"select": {
"from": ["Type1"],
"fields": [
{
"alias": [
{"name": "v1"},
{"field": "Type1/containedField1/value1"}
]
},
{
"alias": [
{"name": "v2"},
{"field": "Type1/containedField2/value1"}
]
}
]
}
}
Distinct field values
You can use the "distinct" keyword to get all distinct values for an entity field.
For example:
{
"select": {
"from": ["Type1"],
"fields": [
{"distinct": {"field": "Type1/value1"}}
]
}
}
This query will return all distinct values for "value1" in type "Type1".
You can also alias a result using the "alias" element. For example:
{
"select": {
"from": ["Type1"],
"fields": [
{
"alias": [
{"name": "a0"},
{"distinct": {"field": "Type1/value1"}}
]
}
]
}
}
This returns all distinct values for "value1" but inside an element named "a0".
Count
You can count how many results are returned by a query.
For example, to count how many Type1 instances are available, use the query:
{
"select": {
"from": ["Type1"],
"fields": [
{"count": {}}
]
}
}
You can also combine the count operation with conditions to count how many instances match the specified criteria.
For example, to return how many instances of Type1 have a value1 greater than 1, use the query:
{
"select": {
"from": ["Type1"],
"fields": [
{"count": {}}
],
"where": {
"gt": [
{"field": "Type1/value1"},
{"value": "1"}
]
}
}
}
Count may also be used in order_by clauses.
Maximum and minimum
You can select the maximum and/or minimum value for a field in an entity.
For example, to retrieve the smallest value of "id" in the entity Type1, use the query:
{
"select": {
"from": ["Type1"],
"fields": [
{"min": {"field": "Type1/id"}}
]
}
}
For example, to retrieve the largest value of "id" in Type1, use the query:
{
"select": {
"from": ["Type1"],
"fields": [
{"max": {"field": "Type1/id"}}
]
}
}
For example, to retrieve both the largest and the smallest values of "id" in Type1, use the query:
{
"select": {
"from": ["Type1"],
"fields": [
{"max": {"field": "Type1/id"}},
{"min": {"field": "Type1/id"}}
]
}
}
Order by field's value
You can use as many "order by" clauses as needed in a query either in descending or ascending order to set the order in the returned result.
For example, to select all instances of Type1 and order results by id (from the largest to the smallest) and value1 (from the smallest to the largest), use the query:
{
"select": {
"from": ["Type1"],
"order_bys": [
{
"order_by": [
{"field": "Type1/id"},
{"direction": "DESC"}
]
},
{
"order_by": [
{"field": "Type1/value1"},
{"direction": "ASC"}
]
}
]
}
}
Order by field value occurrence
You can also order by field value occurrence so that the most common value is returned first down to the least common value (in case of DESC direction).
Here is one query example to return the most common value for "value1" down to the least common:
{
"select": {
"from": ["Type1"],
"fields": [
{"field": "Type1/value1"},
],
"order_bys": [
{
"order_by": [
{"count": {"field": "Type1/value1"}},
{"direction": "DESC"}
]
}
]
}
}
You can also include "limit" in the query to obtain the top N common values.
Here is one query example to obtain the top 10 values:
{
"select": {
"from": ["Type1"],
"fields": [
{"field": "Type1/value1"},
],
"order_bys": [
{
"order_by": [
{"count": {"field": "Type1/value1"}},
{"direction": "DESC"}
]
}
],
"limit": 10
}
}
Conditions
The following operands are available for use:
Query language operand |
Meaning |
---|---|
eq |
equals |
gt / gte |
greater than/greater than or equals |
lt / lte |
less than / less than or equals |
startsWith |
starts with |
contains |
contains |
isEmpty |
is empty ('') |
isNull |
is null |
in | specifies multiple values in a where clause |
full_text | performs a full text search |
Bear in mind the following:
- The structure is always "operand": { field, value }.
- The contains and full_text operands cannot be used together with other search operators. Since a full text search internally covers conditions from contains or full_text, these operands cannot be combined together.
- If the field is of numeric type (int, short, integer, long, float, double, decimal) and you provide a numeric value for it, the value can be interpreted with or without being enclosed in double quotes. Otherwise, the value must be enclosed in double quotes. Make sure that the value matches the element type. For example, "value": "abc" is incorrect if the field "Type1/id" is of numeric type.
- If you have no access to any of the fields used in the condition, the query will return nothing.
- When searching for a boolean type field that is equal to the true value, data records that contain only the true value are returned.
- When searching for a boolean type field that is equal to the false value, empty value, or any other value, data records that contain the false and null values are returned.
- When searching for a boolean type field that is not equal to the true value, data records that contain the false and null values are returned.
- When searching for a boolean type field that is not equal to the false value, empty value, or any other value, data records that contain only the true value are returned.
Example 1: The query below using the eq operand returns all instances of Type1 where the id element equals 1:
{
"select": {
"from": ["Type1"],
"where": {
"eq": [
{"field": "Type1/id"},
{"value": "1"}
]
}
}
}
Example 2: The query below using the isEmpty operand returns all instances of Store where the value of the Address element equals ''.
{
"select": {
"from": ["Store"],
"where": {
"isEmpty": {
"field": "Store/Address"
}
}
}
}
Example 3: The query below using the not operator and the isNull operand returns all instances of Store where the value of the Address element is not null.
{
"select": {
"from": ["Store"],
"where": {
"not": {
"isNull": {
"field": "Store/Address"
}
}
}
}
}
Index based condition
You may specify an index for your condition in case the field is a repeatable element. The index is 0-based.
For example, to return all instances of Type1 where the second value of "list" equals "1", use the query:
{
"select": {
"from": ["Type1"],
"where": {
"eq": [
{"index": [
{"field": "Type1/list"},
1
]},
{"value": "1"}
]
}
}
}
Conditions between fields
You may use conditions between different fields. Only "eq" (EQUALS) is currently supported for such comparisons.
For example, to return the instances of Type1 where "id" equals "value1", use the query:
{
"select": {
"from": ["Type1"],
"where": {
"eq": [
{"field": "Type1/id"},
{"value": {"field": "Type1/value1"}}
]
}
}
}
in operator
You can use the operator in within a query.
For example, to return all instances of Type1 where "id" equals 1 or 2 or 5, use the query:
{
"select": {
"from": ["Type1"],
"where": {
"in": [
{"field": "Type1/id"},
{"value": ["1","2","5"]}
]
}
}
}
Bear in mind the following:
- If the field is of numeric type (int, short, integer, long, float, double, decimal) and you provide a numeric value for it, the value can be interpreted with or without being enclosed in double quotes. Otherwise, the value must be enclosed in double quotes. Make sure that the value matches the element type. For example, "value": ["a", "b", "c"] is incorrect if the field "Type1/id" is of numeric type.
- The "value" is formatted as a JSON array, and the values listed must be static. For example, ["value1", "value2", "value3"...].
- Only simple type fields at the root level are supported. However, the mandatory multi-occurrence foreign key fields are not supported.
- The operator in cannot be used together with the full text search.
- The operator in can work with the operator not. For more information, see the section not operator.
Logic operators
To allow more complex queries, boolean operators (and/or/not) may be included. The structure is similar to what is previously presented "operator": { left, right } where left and/or right can be logic operators too.
For example, to return all Type1 instances where "id = 1 OR value1 = 0", use the query:
{
"select": {
"from": ["Type1"],
"where": {
"or": [
{
"eq": [
{"field": "Type1/id"},
{"value": "1"}
]
},
{
"eq": [
{"field": "Type1/value1"},
{"value": "0"}
]
}
]
}
}
}
A binary logic operator (and/or) must have 2 and only 2 children. For example, you can write a condition with "a OR b OR c":
{
"select": {
"from": ["Type1"],
"where": {
"or": [
{
"eq": [
{"field": "Type1/id"},
{"value": "1"}
]
},
{
"or": [
{
"eq": [
{"field": "Type1/id"},
{"value": "2"}
]
},
{
"eq": [
{"field": "Type1/value1"},
{"value": "4"}
]
}
]
}
]
}
}
}
You can also use mixed logic operators. There is no ambiguity in the evaluation order of conditions.
For example, to return all Type1 instances where "(id = 1 OR (id = 2 and value1 = 4))", use the query which mix both and and or:
{
"select": {
"from": ["Type1"],
"where": {
"or": [
{
"eq": [
{"field": "Type1/id"},
{"value": "1"}
]
},
{
"and": [
{
"eq": [
{"field": "Type1/id"},
{"value": "2"}
]
},
{
"eq": [
{"field": "Type1/value1"},
{"value": "4"}
]
}
]
}
]
}
}
}
not operator
The operator not is supported and follows the structure "not": { condition }.
For example, to return all Type1 instances where "id" is not equal to 1, use the query:
{
"select": {
"from": ["Type1"],
"where": {
"not": {
"eq": [
{"field": "Type1/id"},
{"value": "1"}
]
}
}
}
}
You can also use the operator not in a more complex condition. For example:
{
"select": {
"from": ["Type1"],
"where": {
"not": {
"or": [
{
"eq": [
{"field": "Type1/id"},
{"value": "2"}
]
},
{
"eq": [
{"field": "Type1/id"},
{"value": "4"}
]
}
]
}
}
}
}
You can use the operator not together with the operator in. The queried result does not include instances where the specified field has a null value.
For example:
{
"select": {
"from": ["Type1"],
"where": {
"not": {
"in": [
{"field": "Type1/id"},
{"value": ["1","3","4","5"]}
]
}
}
}
}
This will return all instances of Type1 where "id" does not equal any of the values listed, that is, 1, 3, 4 and 5.
Full text (field scope)
The query language includes support for full text search.
For example, to perform a full text search on the field "Type1/id" with the value "1", use the query:
{
"select": {
"from": ["Type1"],
"where": {
"full_text": [
{"field": "Type1/id"},
{"value": "1"}
]
}
}
}
The full text query also works on subelements. For example, if you have the following type structure:
Type1
* id
* element
* value1
* value2
Then the query:
{
"select": {
"from": ["Type1"],
"where": {
"full_text": [
{"field": "Type1/element"},
{"value": "1"}
]
}
}
}
Is equivalent to:
{
"select": {
"from": ["Type1"],
"where": {
"or": [
{
"full_text": [
{"field": "Type1/element/value1"},
{"value": "1"}
]
},
{
"full_text": [
{"field": "Type1/element/value2"},
{"value": "1"}
]
}
]
}
}
}
Full text (entity scope)
If you do not want to specify any field but want to look for a value in the entity, the following query gives an example of how to achieve this:
{
"select": {
"from": ["Type1"],
"where": {
"full_text": [
{"value": "1"}
]
}
}
}
If you omit the field, MDM performs full text search on all Type1 fields.
Metadata fields
MDM adds "metadata" fields to the records it manages. These fields are not present in the user data model but can be used in the query.
Supported metadata fields are:
- taskId (also known as groupId)
- timestamp (last modification time)
- groupSize
For the staging storage, there are additional metadata fields:
- Staging error
- Staging source
- Staging status
- Staging block key
-
Staging has task
Note that the field "Staging has task" indicates whether a Talend Data Stewardship task is linked with the record.
If the query contains staging-only fields, you may expect an error that indicates the incompatible expressions. The incompatible expressions are checked in selected fields, conditions, order by and join expressions.
Get metadata field values
To get a metadata field, use "metadata" instead of "field" in the selected fields as shown in the following example:
{
"select": {
"from": ["Type1"],
"fields": [
{"metadata": "timestamp"},
{"metadata": "task_id"},
{"metadata": "group_size"},
{"metadata": "staging_error"},
{"metadata": "staging_source"},
{"metadata": "staging_status"},
{"metadata": "staging_blockkey"},
{"metadata": "staging_hastask"}
]
}
}
You can also mix fields from the entity with fields from the record:
{
"select": {
"from": ["Type1"],
"fields": [
{"metadata": "timestamp"},
{"metadata": "task_id"},
{"field": "Type1/id"}
]
}
}
You can also use the "distinct" keyword for metadata fields. For example, to return all distinct values of "taskId" wrapped in an element named "distinctTaskId", use the query:
{
"select": {
"from": ["Type1"],
"fields": [
{
"alias": [
{"name": "distinctTaskId"},
{"distinct": {"metadata": "task_id"}}
]
}
]
}
}
Conditions involving metadata fields
You can involve metadata fields in conditions. As for the selected fields, just use "metadata" instead of "field". For example:
{
"select": {
"from": ["Type1"],
"where": {
"eq": [
{"metadata": "task_id"},
{"value": "1"}
]
}
}
}
Cache
MDM can cache query results. In this case, it keeps the query in cache and serves the cached result every time this query is executed.
For example, to cache a query result, use the query:
{
"select": {
"from": ["Type1"],
"where": {
"eq": [
{"field": "Type1/id"},
{"value": "1"}
]
},
"cache": true
}
}
By default, "cache" is false, but this query is also valid:
{
"select": {
"from": ["Type1"],
"where": {
"eq": [
{"field": "Type1/id"},
{"value": "1"}
]
},
"cache": false
}
}
A cached result can have at most 50 records. When the query yields too many results, the following log will appear in the MDM log at DEBUG level: Query is yielding more results than cache is allowed to keep, bypassing cache for query.
Join to other types
Joins are a way to retrieve data from another type following the foreign keys.
For example, you can retrieve the Address's street element by following the FK between Person and Address types.
A simple example of join would be:
{
"select": {
"from": [
"Type1",
"Type2",
],
"fields": [
{"field": "Type1/id"},
{"field": "Type2/value1"}
],
"joins": [
{
"from": "Type1/fk2",
"on": "Type2/id"
}
]
}
}
This query returns all Type1's "id" element as well as the "value1" element located in Type2. The join is performed using the "fk2" in Type1.
You may chain joins in the query too. For example:
{
"select": {
"from": [
"Type1",
"Type2",
"Type3"
],
"fields": [
{"field": "Type1/id"},
{"field": "Type2/value1"},
{"field": "Type3/value2"}
],
"joins": [
{
"from": "Type1/fk2",
"on": "Type2/id"
},
{
"from": "Type2/fk3",
"on": "Type3/id"
}
]
}
}
This query displays "id" from Type1, "value1" from Type2 and "value1" from Type3. The value for Type3's "value1" is obtained via a join between Type1, Type2 and Type3.
History related operators - “as of" operator
The "as_of" operator allows you to retrieve a record as it was at any date.
For example:
{
"select": {
"from": ["Type1"],
"as_of": {
"date": "1000"
}
}
}
The "as of" operator will not fail in case no history is available. Rather, it will return the record at its current state.
History browsing relies on the journal. If you inserted data straight into the SQL database or disabled event creation for tMDMOutput/tMDMDelete components, you should expect only partial history.
MDM can only build a history of the records based on information it stored about updates: if you disabled journal for some updates, it cannot be a complete and accurate history.
For example, to return all Type1 instances with the values they had one second after January 1st 1970 (1000 ms after EPOCH), use the query:
{
"select": {
"from": ["Type1"],
"as_of": {
"date": "1970-01-01T00:00:01"
}
}
}
History navigation shortcuts
The element "date" for the "as_of" element supports some handy shortcuts:
- yesterday: date 24 hours ago.
- creation: date of record creation. When you use "creation", you are not allowed to use the swing parameter. Otherwise, an expected error will be thrown.
- now: current date.
For example, to return all instances of Type1 as they were yesterday (24 hours ago), use the query:
{
"select": {
"from": ["Type1"],
"as_of": {
"date": "yesterday"
}
}
}
Filter history (get by id)
You may want to filter the history records instead of getting all instances.
For example, to return the instance of Type1 with "id = 1" as it was on January 1st 1970, use the query:
{
"select": {
"from": ["Type1"],
"where": {
"eq": [
{"field": "Type1/id"},
{"value": "1"}
]
},
"as_of": {
"date": "1000"
}
}
}
An improvement on this query would be to use all conditions with "as_of", which allows you to query on previous values. Note that conditions do not need to be only on id elements.
For example, to return the Type1 instance that contained "text" on January 1st, 1970, use the query:
{
"select": {
"from": ["Type1"],
"where": {
"contains": [
{"field": "Type1/value"},
{"value": "text"}
]
},
"as_of": {
"date": "1000"
}
}
}
However, this has scalability issues since it might force MDM to build a complete set of Type1 instances, thus leading to memory issues since states are computed in memory.
Using Joins together with as_of
You can use "joins" in a query together with the "as_of" operator.
For example, using the query below returns three fields ("id", "value1", "value2") coming from two types ("Type1", "Type2"):
{
"select": {
"from": [
"Type1",
"Type2"
],
"fields": [
{"field": "Type1/id"},
{"field": "Type1/value1"},
{"field": "Type2/value2"}
],
"joins": [
{
"from": "Type1/fk2",
"on": "Type2/id"
}
],
"as_of": {
"date": "1000"
}
}
}
The returned values have the features below:
- The "as_of" returns the values of "Type1" as they were on 01/01/1970.
- The FK value used for Type2 is the FK value as of 01/01/1970.
- The values for Type2 are those on 01/01/1970.
Security and user roles
Bear in mind that the query passed to the service is not always the query MDM will execute. Before executing the query, MDM will prune all elements the logged user is not allowed to access. This includes selected fields, conditions, and order by. All elements to hide from users are removed from the query without reporting any errors.
For example,
{
"select": {
"from": ["Type1"],
"fields": [
{"field": "Type1/id"},
{"field": "Type1/secret_field"}
]
}
}
In this example, MDM will automatically remove "secret_field" from the query if the current user (that is, the user used to log on the REST API) is not allowed to access "secret_field". Therefore, the actual query executed will be:
{
"select": {
"from": ["Type1"],
"fields": [
{"field": "Type1/id"}
]
}
}
Another example of such queries is:
{
"select": {
"from": ["Type1"],
"where": {
"gt": [
{"field": "Type1/secret_field"},
{"value": "1"}
]
}
}
}
This query will be read as "return all instances of Type1 where Type1's secret_field is greater than 1", and the returned result is expected to be Type1 instances where secret_field > 1.
However, if the current user's roles do not have access right to "secret_field", the actual query becomes:
{
"select": {
"from": ["Type1"]
}
}
This query will be read as "return all instances of Type1", and the returned result now is expected to be all Type1 instances since the current user is not allowed to access "secret_field".
This prevents unprivileged users to indirectly guess what are the values in "secret_field" using value conditions on this field.