Note: This article is not applicable to all destinations and data types. It applies to:
- Destinations, which do not natively support nested data structures:
- Amazon Redshift
- Amazon S3 (CSV)
- PostgreSQL
- Data types
- PostgreSQLÂ JSONÂ &Â JSONBÂ datatypes*
*Not applicable for PostgreSQL (v1) – data types they will be stored as strings. In the PostgreSQL (HP) (v2) integration, these data types are stored as JSON objects.
For a better understanding of how SageData loads the received data, you need to know more about JSON and its structure.
NoSQL databases and many SaaS integrations are using nested structures, in which, as the name states, each attribute (or column) could have its own set of attributes. For better querying through the data, SageData will split these structures into separate tables.
JSON data structures
When pulling data from an integration, SageData is extracting a series of JSON records. JSON defines only two data structures: objects and arrays.
Objects
An object is a set of name-value pairs, where each set is called a property. Objects are enclosed in braces ({}), name-value pairs are separated by a comma (,), and the name and value in a pair are separated by a colon (:). Names in an object are strings, whereas values may be of any value type, including another object or an array.
{
"customer_id": "123",
"firstName": "John",
"lastName": "Doe",
"age": 22,
"address": {Â Â Â Â Â Â Â Â // object begins
"street":"Fake Street",
"streetNumber":"5",
"zipCode":"12345",
"city": "JSON City"
} // object ends
}
If SageData would receive an object, the name-value pairs inside the object are split into a table, where columns are created. The created columns have the following naming convention: [object_name]__[property_name]
Following the example above, the columns would be created like that:
customer_id | firstName | lastName | age | address__street | address_streetNumber | address_zipCode | address_city |
123 | John | Doe | 22 | Fake Street5 | 12345 | JSON city |
Arrays
An array is a list of values. Arrays are enclosed in brackets ([]), and their values are separated by a comma (,). Each value in an array may be of a different type, including another array or an object.
Here’s an example:
{
"order_id":"1234567",
"customer_id":"1000",
"order_items":[ // array begins
{
"product_id":"243745",
"price":"7.95",
"currency":"EUR",
}
] // array ends
}
SageData will split a nested array or an array that is inside a JSON record (like the example above) and will create a subtable.
Deconstruction of nested arrays
For a better understanding of how SageData is deconstructing nested arrays, we will give you a walk-through example using an order record. The order record will consist of 3 parts:
- Main order data
- Order items
- Payment type and fees
Here’s what the JSON for the order looks like:
{
"order_id":"1234567",
"created_at":"2022-05-02 12:45:58",
"customer_id":"1000",
"order_items":[ // order record begins
{
"product_id":"243745",
"price":"7.95",
"currency":"EUR",
"quantity":"2"
"payment_type":[ // payment type record begins
{
"type":"card",
"provider":"MasterCard",
"fee":"0.58"
}
] // payment type record ends
}
]
In this record, you can find three levels of data because of nested arrays. This record contains three levels of data due to the nested arrays. SageData will start deconstructing the JSON from the top-level (main order information) and will create subtables.
From this one order record, three tables will be created:
Table name | Description |
orders | Contains the core order data: order_id, created_at timestamp, and customer ID. |
Order_line _items | Contains order info: product_id, price, currency, and quantity. |
orders__line_items__payment_type_lines | This table contains the payment type info: type, provider, fee |
Subtable connection to top-level data
The subtables created, need to contain information that connects them to top-level records. Therefore, SageData will append a few columns, which will be used as composite keys. Those keys will help you track back the sub records to their parent. The tables, based on the example above, will show you how this exactly works.
In this section:
- Top-level: Core order data
- Second level: Order line items
- Third level: Payment type lines
First level: Core order data
This table contains the order record’s Primary Key, order_id.
order_id [pk] | created_at | customer |
1234567 | 2022-05-02 12:45:58 | 1000 |
Second level: Order line items
SageData will add further columns to the second level tables, besides the key-value pairs in the nested record:
Column name | Description |
_sdc_source_key_[pk_column_name] | Contains the top-level record’s Primary Key. [pk_column_name] will take the name of the Primary Key column for the top-level record. In this example: sdc_source_key_order_id. |
_sdc_level_0_id | Part of the composite primary key, used for connecting nested rows down the line. It uses auto-increment for each unique record in the table, beginning with 0. In a combination with _sdc_source_key, it creates a unique identifier for the row. For our example, the value for the first line item record would be 0, the second 1, the third 2, and so on. |
Here’s what the order__line_items table would look like if another line item were added to the order record:
_sdc_source_key_order_id | _sdc_level_0_id | product_id | price | quantity | currency |
1234567 | 0 | 243745 | 7.95 | 2 | EUR |
1234567 | 1 | 872382 | 15.40 | 1 | EUR |
If you wanted to return all line items for order number 1234, you’d run the following query:
SELECT *
FROM orders__line_items oli
WHERE _sdc_source_key_order_id = 1234567
Third level: Payment type lines
SageData also adds further columns to third level records:
Column name | Description |
_sdc_source_key_[pk_column_name] | Contains the top-level record’s Primary Key. [pk_column_name] will take the name of the Primary Key column for the top-level record. In this example: sdc_source_key_order_id. |
_sdc_level_0_id | The foreign key for the second level (order__line_items) table. When combined with the source key (_sdc_source_key_order_id), it helps you find the parent |
_sdc_level_1_id | Part of the composite primary key, used for connecting nested rows down the line. It uses auto-increment for each unique record in the table. For our example, the value for the first line item record would be 0, the second 1, the third 2, and so on. |
Here’s what the order__line_items__payment_type_lines table would look like if another payment type line were added to the order record:
_sdc_source_key_order_id | _sdc_level_0_id | _sdc_level_1_id | price | type | fee |
1234567 | 0 | 0 | 7.95 | Mastercard | 0.58 |
1234567 | 1 | 0 | 15.40 | PayPal | 0.11 |
If you wanted to find the order line items and payment type lines for order number 1234567, you will use the following query:
SELECT *
FROM order__line_items oli
INNER JOIN order__line_items__payment_type_lines ptl       ON ptl.__sdc_level_0_id = oli.__sdc_level_0_id      AND ptl.__sdc_source_key_order_id = oli.__sdc_source_key_order_id
WHERE _sdc_source_key_order_id = 1234567
Total row count in your destination
As SageData is using a special method of deconstructing nested arrays, it is very likely that you will see more rows in SageData and your destination, than the initial data in the source. This means that the row count in the initial data source won’t be equal to the row count in your destination and the SageData app.
Looking at the example above for order 1234567: it is not a single row in the destination
Top-level record – row in the orders table:
order_id [pk] | created_at | customer |
1234567 | 2022-05-02 12:45:58 | 1000 |
Second level record – rows in the order__line_items table:
_sdc_source_key_order_id | _sdc_level_0_id | product_id | price | quantity | currency |
1234 | 0 | 243745 | 7.95 | 2 | EUR |
1234 | 1 | 872382 | 15.40 | 1 | EUR |
Third level record – rows in the orders__line_items__tax_lines table:
_sdc_source_key_order_id | _sdc_level_0_id | _sdc_level_1_id | price | type | fee |
1234567 | 0 | 0 | 7.95 | Mastercard | 0.58 |
1234567 | 1 | 0 | 15.40 | PayPal | 0.11 |
The total number of rows created by SageData in your destination in this case will be a count of 5. As it is visible from the example – there is a single order record, that can create 5 different rows on replication in your destination.