MDM data model to RDBMS mapping
This article describes how the MDM data model is mapped to a RDBMS database schema. It starts from a simple model that gets enriched to illustrate mapping features.
This article applies to MDM Server from version 5.2 onwards.
Basics Column namesThis first section introduces basic concepts with the data model to database schema mapping.
Consider the following XML:
<Person> <id>1</id> <lastname>Smith</lastname> <firstname>John</firstname> </Person>
The following database schema is created:
Person |
x_id |
x_lastname |
x_firstname |
This mapping is straightforward: all elements contained in the Person entity type are mapped to a column in a table named "Person". You may notice the "x_" prefix before each column name: adding this prefix ensures no column name conflicts with SQL reserved keywords ("count", "select"...). If you take a closer look at the generated table, you may notice additional columns:
Person |
x_id |
x_lastname |
x_firstname |
x_talend_taskid |
x_talend_timestamp |
Each column prefixed by "x_talend_" is a technical column not exposed to the end user: they don't exist in the user's data model. They are internally used to store task id (for the Data Stewardship Console) and the last modification timestamp. Although these values are not exposed to user, they are still used (for queries on the last modification time, for instance).
Column names might be shortened depending on the database. MDM has the following maximum length:
Database |
Maximum length for SQL names (column and table names) |
Oracle | 30 |
MySQL | 64 |
SQL Server | 128 |
Postgres | 63 |
H2 | No limit |
DB2 | 30 |
Each column of the table has a type that depends on the element type in the data model.
<Person> <id>1</id> <!-- xsd:integer --> <lastname>Smith</lastname> <!-- xsd:string --> <firstname>John</firstname> <!-- xsd:string --> </Person>
The column types are:
Person | Type |
x_id | int |
x_lastname | varchar(255) |
x_firstname | varchar(255) |
x_talend_taskid | varchar(255) |
x_talend_timestamp | bigint(20) |
Note: Technical fields follow the same typing rules as user-defined elements.
Here's the complete mapping for element type to SQL type:
Xml schema type | SQL column type (MySQL) | SQL column type (Oracle) | SQL column type (DB2) | SQL column type (SQL Server) | SQL column type (PostgreSQL) | SQL column type (H2) |
---|---|---|---|---|---|---|
Built In Type | ||||||
anyURI | VARCHAR(255) | VARCHAR2(255 Char) | VARCHAR(255) | nvarchar(255) | varchar(255) | VARCHAR(255) |
base64Binary | LONGTEXT | VARCHAR2(4000 Char) | LONG VARCHAR | nvarchar(max) | text | VARCHAR(2147483647) |
boolean | BIT(1) | NUMBER(1) | SMALLINT | bit | boolean | BOOLEAN(1) |
byte | TINYINT(4) | NUMBER(3) | SMALLINT | smallint | smallint | TINYINT(3) |
date | DATETIME | TIMESTAMP(6) | TIMESTAMP | datetime2 | timestamp | TIMESTAMP(26, 6) |
dateTime | DATETIME | TIMESTAMP(6) | TIMESTAMP | datetime2 | timestamp | TIMESTAMP(26, 6) |
decimal | DECIMAL(19,2) | NUMBER(19,2) | DECIMAL(19,2) | numeric(19,2) | numeric(19,2) | DECIMAL(19, 2) |
double | DOUBLE | FLOAT(126) | DOUBLE | float | float8 | DOUBLE(17) |
duration | VARCHAR(255) | VARCHAR2(255 Char) | VARCHAR(255) | nvarchar(255) | varchar(255) | VARCHAR(255) |
float | FLOAT | FLOAT(126) | DOUBLE | float | float4 | DOUBLE(17) |
hexBinary | VARCHAR(255) | VARCHAR2(255 Char) | VARCHAR(255) | nvarchar(255) | varchar(255) | VARCHAR(255) |
int | INT(11) | NUMBER(10) | INTEGER | int | integer | INTEGER(10) |
integer | INT(11) | NUMBER(10) | INTEGER | int | integer | INTEGER(10) |
long | BIGINT(20) | NUMBER(19) | BIGINT | bigint | bigint | BIGINT(19) |
negativeInteger | INT(11) | NUMBER(10) | INTEGER | int | integer | INTEGER(10) |
nonNegativeInteger | INT(11) | NUMBER(10) | INTEGER | int | integer | INTEGER(10) |
nonPositiveInteger | INT(11) | NUMBER(10) | INTEGER | int | integer | INTEGER(10) |
positiveInteger | INT(11) | NUMBER(10) | INTEGER | int | integer | INTEGER(10) |
short | SMALLINT(6) | NUMBER(5) | SMALLINT | smallint | smallint | SMALLINT(5) |
string | VARCHAR(255) | VARCHAR2(255 Char) | VARCHAR(255) | nvarchar(255) | varchar(255) | VARCHAR(255) |
time | DATETIME | TIMESTAMP(6) | TIMESTAMP | datetime2 | timestamp | TIMESTAMP(26, 6) |
unsignedByte | TINYINT(4) | NUMBER(3) | SMALLINT | smallint | smallint | TINYINT(3) |
unsignedInt | INT(11) | NUMBER(10) | INTEGER | int | integer | INTEGER(10) |
unsignedLong | BIGINT(20) | NUMBER(19) | BIGINT | bigint | bigint | BIGINT(19) |
unsignedShort | SMALLINT(6) | NUMBER(5) | SMALLINT | smallint | smallint | SMALLINT(5) |
Custom Type | ||||||
UUID | VARCHAR(255) | VARCHAR2(255 Char) | VARCHAR(255) | nvarchar(255) | varchar(255) | VARCHAR(255) |
PICTURE | VARCHAR(255) | VARCHAR2(255 Char) | VARCHAR(255) | nvarchar(255) | varchar(255) | VARCHAR(255) |
URL | VARCHAR(255) | VARCHAR2(255 Char) | VARCHAR(255) | nvarchar(255) | varchar(255) | VARCHAR(255) |
AUTO_INCREMENT | VARCHAR(255) | VARCHAR2(255 Char) | VARCHAR(255) | nvarchar(255) | varchar(255) | VARCHAR(255) |
MULTI_LINGUAL | LONGTEXT | VARCHAR2(4000 Char) | LONG VARCHAR | nvarchar(max) | text | VARCHAR(2147483647) |
- Custom type of MULTI_LINGUAL
- Built In type of base64Binary
For more information, see the order-by-clause query on http://itsuite.it.brighton.ac.uk/suite/docs/db2sqlref/frame3.htm.
Depending on data model definition, database mapping may enforce NOT NULL values. For instance:
<Person> <id>1</id> <!-- minOccurs=1, maxOccurs=1 --> <lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> <firstname>John</firstname> <!-- minOccurs=0, maxOccurs=1 --> </Person>
We can see only the "id" element is mandatory (i.e. minOccurs=1).
Starting from version 5.2, all "id" elements (all elements included in entity key) must be minOccurs="1".
No Person record can be created without a value in <id>: this is expressed with a NOT NULL constraint on the x_id column.
Person | Type | Constraint |
x_id | int | NOT NULL |
x_lastname | varchar(255) | |
x_firstname | varchar(255) | |
x_talend_taskid | varchar(255) | |
x_talend_timestamp | bigint(20) | NOT NULL |
Note: Technical fields follow the same rules for NOT NULL constraints as user-defined elements.
Column lengthIt is possible to configure the length of elements based on string type. You can use restrictions on xsd:string in your data model:
<xsd:simpleType name="limitedString_15"> <xsd:restriction base="xsd:string"> <xsd:maxLength value="15"/> </xsd:restriction> </xsd:simpleType>
(in this example, "limitedString_15" is a simple type that inherits from xsd:string but has a maximum length of 15 characters).
You may then use this simple type in your entity:
<xsd:element name="Product"> <xsd:complexType> <xsd:sequence maxOccurs="1" minOccurs="1"> <xsd:element maxOccurs="1" minOccurs="1" name="Name" type="limitedString_15"> </xsd:sequence> </xsd:complexType> </xsd:element>
(in this example, the length value of "Name" must be less than or equal to 15 characters).
MDM will generate this database schema:
Product | Type | Constraint |
x_id | int | NOT NULL |
x_name | varchar(15) |
If maximum length exceeds 255 characters, MDM will automatically switch to a CLOB type for all databases but Oracle. When using Oracle, MDM will create VARCHAR columns with a maximum size of 4000.
Repeatable elements mappingLet's now consider that the entity Person may have multiple firstnames (an infinity of firstnames). An XML record would then be:
<Person> <id>1</id> <!-- minOccurs=1, maxOccurs=1 --> <lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> <firstname>John</firstname> <!-- minOccurs=0, maxOccurs=unbounded --> <firstname>William</firstname> <!-- minOccurs=0, maxOccurs=unbounded --> </Person>
Note #1 : In most cases, firstname element would be enclosed in a "firstnames" element. This scenario and impacts on database mapping is exposed in next section.
Note #2 : Foreign keys are also handled differently; this is described later in this article.
In this situation, two tables will be created: one to store the repeated elements, and one for the other elements.
Person |
x_id |
x_lastname |
Person_x_firstname |
x_id |
value |
pos |
The table " Person_x_firstname" represents all the repeated element values. "x_id" in this table points to "x_id" of Person and "pos" indicates the position of the value in the value list (this is a reverse mapping). Since XML elements are ordered (and order matters to the user: he/she doesn't want to store "John" then "William" and retrieve the values in another order).
Consider the values actually stored in the database:
Person
x_id | x_lastname |
1 | Smith |
2 | Doe |
Person_x_firstname
x_id | value | pos |
1 | John | 0 |
1 | William | 1 |
2 | John | 0 |
In the example above, Person id #1 has 2 firstnames ("John", "Willliam") and Person id #2 only has 1 firstname ("John").
Note: It becomes very easy to add an element at the end or rearrange elements within the list: you only need to modify the "pos" column.
Hierarchy mapping with 0..1Starting from previous step where "firstname" became a repeatable element, let's now assume all "firstname" are wrapped in a unique XML element named "firstnames".
<Person> <id>1</id> <!-- minOccurs=1, maxOccurs=1 --> <lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> <firstnames> <!-- minOccurs=0, maxOccurs=1 --> <firstname>John</firstname> <!-- minOccurs=0, maxOccurs=unbounded --> <firstname>William</firstname> <!-- minOccurs=0, maxOccurs=unbounded --> </firstnames> </Person>
In this case (where "firstnames" is maxOccurs=1), there's no big change in the database mapping:
Person |
x_id |
x_lastname |
Person_x_firstnames_firstname |
x_id |
value |
pos |
The table name changed from " Person_x_firstname" to "Person_x_firstnames_firstname". Since firstnames is maxOccurs=1, there's no need for an intermediate table.
Hierarchy mapping with 0..nThis section now describes what happens if a wrapper element is itself repeatable. Let's take this example:
<Person> <id>1</id> <!-- minOccurs=1, maxOccurs=1 --> <lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> <firstnames> <!-- minOccurs=0, maxOccurs=1 --> <firstname>John</firstname> <!-- minOccurs=0, maxOccurs=unbounded --> <firstname>William</firstname> <!-- minOccurs=0, maxOccurs=unbounded --> </firstnames> <details> <!-- minOccurs=0, maxOccurs=unbounded --> <name>Allergies</name> <value>peanuts</name> </details> <details> <name>Beverage</name> <value>coffee</value> </details> </Person>
When MDM detects a repeatable element that isn't a simple element (an element that contains "string", "int"), it automatically switches to a different mapping. This mapping is internally called "scattered" (the one applied before is called "flat"). When switching to scattered, all sub elements will have their own table.
Person |
x_id |
x_lastname |
x_ details |
X_Anonymous0 |
x_talend_id |
x_name |
x_value |
Let's now consider Person has an FK to a Group entity. A Person may or may not linked to a Group. From an XML perspective, this would give:
<Person> <id>1</id> <!-- minOccurs=1, maxOccurs=1 --> <lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> <group>[1]</group> <!-- minOccurs=0, maxOccurs=1 --> </Person>
In this case, this Person instance references a Group instance with id "1". This creates a simple FK column in the table.
Person |
x_id |
x_lastname |
x_ group |
Group |
x_id |
x_name |
The column "x_group" of the table "Person" is declared as a FK to the column "x_id" of the "Group" table. Please note that NOT NULL constraints also applies to the FK columns: if minOccurs >= 1 for "group" element, a NOT NULL constraint will also be added.
Note that when using version 5.2 and SQL, the next XML is exactly the same:
<Person> <id>1</id> <!-- minOccurs=1, maxOccurs=1 --> <lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> <group>1</group> <!-- minOccurs=0, maxOccurs=1 --> </Person>
The square brackets can be omitted on insertion.
Note:
- Once you read back the instance Person, MDM will put the square brackets back.
- If 'group' is a composite key (i.e. key of entity "Group" is composed of > 1 element), this would raise an exception.
It is possible to use any type for the key elements on an entity but the FK element must be a xsd:string .
Consider this example:
<Person> <id>1</id> <lastname>Smith</lastname> <group>[1]</group> </Person>
If you have this record, it becomes clear that the <group> FK element must be a string (it contains square brackets). But the Group entity could use an xsd:int for id.
MDM throws an exception if the content does not match the PK type, so this record will cause an exception when you try to save it:
<Person> <id>1</id> <lastname>Smith</lastname> <group>[abc]</group> </Person>Foreign key mapping 0..n and n..m
If Person has multiple FK to Address entity, the XML of the record would be:
<Person> <id>1</id> <!-- minOccurs=1, maxOccurs=1 --> <lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> <address>[1]</address> <!-- minOccurs=0, maxOccurs=unbounded --> <address>[2]</address> <!-- minOccurs=0, maxOccurs=unbounded --> </Person>
In this example, this Person has 2 FK to Address (one to Address with id 1, another to id 2). The generated database tables are:
Person |
x_id |
x_lastname |
Person_2_Address |
x_id |
x_address_id |
Address |
x_address_id |
x_street |
x_city |
The table "Person_2_Address" will store all relationships from Person to Address. In this case, we didn't use a reverse FK because:
- Address might be referenced by other entities from the data model.
- We keep the reverse FK for containment relationships. Since Address is an entity, it exists without a Person instance.
At the time of writing, no test ensures correct behavior when Address and Person share the same PK column name (Address PK column would be x_id).
Inheritance mapping Entity inheritanceOne Entity can inherit from another in data models created from MDM Studio. This simply handled through a "per concrete type" strategy (one table per concrete type of the inheritance tree). This means there's no effect on the mapping. If you have type A and type B and B inherits from A, you will have two tables "A" and "B". The only difference you may notice is the "browse_items_A" view would return both instances of A and B (this is not what XML database is doing).
From version 5.2 onwards, a sub entity can not redefine the key from its super type (B cannot add a new ID field, since ID is declared in the top level entity and - once declared - cannot be changed by sub entities).
About record editing and views
Inheritance behaves differently when you use SQL storage. In a view for an entity that has sub types, instances of sub type will appear. If you have an Employee type that inherits from Person, both Person and Employee will appear in the Person view.
You can also edit values of a record from its super type view: if Person declares a field "name", the value of "name" for an Employee of this field can be edited from either the Person view or the Employee view.
About Foreign Key Picker
The Foreign Key Picker now shows all possible candidates for relation: if the FK points to type "Person", the Foreign Key Pcker will show instances from "Person" and "Employee" (a "Employee" is a "Person" due to the inheritance relationship).
Reusable type inheritanceReusable types may inherit from each other (actually, this is the only type of inheritance that MDM Studio lets you define). In this case, a "per hierarchy" strategy is used: this means there will be a single table that will contain the union of all elements defined in the inheritance tree.
Let's take this sample data model (this is not a table model):
A |
x_value_a |
B |
x_value_b |
C |
x_value_c |
In this example, both B and C inherit from A. In the database, this would create a table "A":
X_A |
x_talend_id |
x_talend_class |
x_value_a |
x_value_b |
x_value_c |
There are some details worth noticing:
- A technical ID has been created (this is the "x_talend_id" column that contains an auto-generated UUID not intended to be exposed to the end user).
- A "discriminator" column has also been created (this is the "x_talend_class" column), that will help MDM to know what is the actual type of the reusable type instance.
- A "X_" prefix was added: this is done to differentiate reusable types from entity types (a data model may define a entity "A" and a reusable type "A": this is legal and supported).