Loading and Row Usage for Nested Data Structures

Note: This article is not applicable to all destinations and data types. It applies to:
Destinations, which do not natively support nested data structures:
Amazon Redshift
Amazon S3 (CSV)
PostgreSQL
Data types
PostgreSQL JSON & JSONB datatypes*
*Not applicable for PostgreSQL (v1) – data types they will be stored as strings. In the PostgreSQL (HP) (v2) integration, these data types are stored as JSON objects.

For a better understanding of how SageData loads the received data, you need to know more about JSON and its structure.

NoSQL databases and many SaaS integrations are using nested structures, in which, as the name states, each attribute (or column) could have its own set of attributes. For better querying through the data, SageData will split these structures into separate tables.

JSON data structures

When pulling data from an integration, SageData is extracting a series of JSON records. JSON defines only two data structures: objects and arrays.

Objects

An object is a set of name-value pairs, where each set is called a property. Objects are enclosed in braces ({}), name-value pairs are separated by a comma (,), and the name and value in a pair are separated by a colon (:). Names in an object are strings, whereas values may be of any value type, including another object or an array.
{ "customer_id": "123", "firstName": "John", "lastName": "Doe", "age": 22, "address": { // object begins "street":"Fake Street", "streetNumber":"5", "zipCode":"12345", "city": "JSON City" } // object ends }

If SageData would receive an object, the name-value pairs inside the object are split into a table, where columns are created. The created columns have the following naming convention: [object_name]__[property_name]

Following the example above, the columns would be created like that:

customer_id	firstName	lastName	age	address__street	address_streetNumber	address_zipCode	address_city
123	John	Doe	22	Fake Street5		12345	JSON city

Arrays

An array is a list of values. Arrays are enclosed in brackets ([]), and their values are separated by a comma (,). Each value in an array may be of a different type, including another array or an object.

Here’s an example:

{ "order_id":"1234567", "customer_id":"1000", "order_items":[ // array begins { "product_id":"243745", "price":"7.95", "currency":"EUR", } ] // array ends }

SageData will split a nested array or an array that is inside a JSON record (like the example above) and will create a subtable.

Deconstruction of nested arrays

For a better understanding of how SageData is deconstructing nested arrays, we will give you a walk-through example using an order record. The order record will consist of 3 parts:

Main order data
Order items
Payment type and fees

Here’s what the JSON for the order looks like:

{ "order_id":"1234567", "created_at":"2022-05-02 12:45:58", "customer_id":"1000", "order_items":[ // order record begins { "product_id":"243745", "price":"7.95", "currency":"EUR", "quantity":"2" "payment_type":[ // payment type record begins { "type":"card", "provider":"MasterCard", "fee":"0.58" } ] // payment type record ends } ]

In this record, you can find three levels of data because of nested arrays. This record contains three levels of data due to the nested arrays. SageData will start deconstructing the JSON from the top-level (main order information) and will create subtables.

From this one order record, three tables will be created:

Table name	Description
orders	Contains the core order data: order_id, created_at timestamp, and customer ID.
Order_line _items	Contains order info: product_id, price, currency, and quantity.
orders__line_items__payment_type_lines	This table contains the payment type info: type, provider, fee

Subtable connection to top-level data

The subtables created, need to contain information that connects them to top-level records. Therefore, SageData will append a few columns, which will be used as composite keys. Those keys will help you track back the sub records to their parent. The tables, based on the example above, will show you how this exactly works.

In this section:

Top-level: Core order data
Second level: Order line items
Third level: Payment type lines

First level: Core order data

This table contains the order record’s Primary Key, order_id.

order_id [pk]	created_at	customer
1234567	2022-05-02 12:45:58	1000

Second level: Order line items

SageData will add further columns to the second level tables, besides the key-value pairs in the nested record:

Column name

Description

_sdc_source_key_[pk_column_name]

Contains the top-level record’s Primary Key.

[pk_column_name] will take the name of the Primary Key column for the top-level record. In this example: sdc_source_key_order_id.

_sdc_level_0_id

Part of the composite primary key, used for connecting nested rows down the line. It uses auto-increment for each unique record in the table, beginning with 0. In a combination with _sdc_source_key, it creates a unique identifier for the row.

For our example, the value for the first line item record would be 0, the second 1, the third 2, and so on.

Here’s what the order__line_items table would look like if another line item were added to the order record:

_sdc_source_key_order_id	_sdc_level_0_id	product_id	price	quantity	currency
1234567	0	243745	7.95	2	EUR
1234567	1	872382	15.40	1	EUR

If you wanted to return all line items for order number 1234, you’d run the following query:

SELECT *FROM orders__line_items oli WHERE _sdc_source_key_order_id = 1234567

Third level: Payment type lines

SageData also adds further columns to third level records:

Column name	Description
_sdc_source_key_[pk_column_name]	Contains the top-level record’s Primary Key. [pk_column_name] will take the name of the Primary Key column for the top-level record. In this example: sdc_source_key_order_id.
_sdc_level_0_id	The foreign key for the second level (order__line_items) table. When combined with the source key (_sdc_source_key_order_id), it helps you find the parent
_sdc_level_1_id	Part of the composite primary key, used for connecting nested rows down the line. It uses auto-increment for each unique record in the table. For our example, the value for the first line item record would be 0, the second 1, the third 2, and so on.

Here’s what the order__line_items__payment_type_lines table would look like if another payment type line were added to the order record:

_sdc_source_key_order_id	_sdc_level_0_id	_sdc_level_1_id	price	type	fee
1234567	0	0	7.95	Mastercard	0.58
1234567	1	0	15.40	PayPal	0.11

If you wanted to find the order line items and payment type lines for order number 1234567, you will use the following query:

SELECT * FROM order__line_items oli INNER JOIN order__line_items__payment_type_lines ptl ON ptl.__sdc_level_0_id = oli.__sdc_level_0_id AND ptl.__sdc_source_key_order_id = oli.__sdc_source_key_order_id WHERE _sdc_source_key_order_id = 1234567

Total row count in your destination

As SageData is using a special method of deconstructing nested arrays, it is very likely that you will see more rows in SageData and your destination, than the initial data in the source. This means that the row count in the initial data source won’t be equal to the row count in your destination and the SageData app.

Looking at the example above for order 1234567: it is not a single row in the destination

Top-level record – row in the orders table:

order_id [pk]	created_at	customer
1234567	2022-05-02 12:45:58	1000

Second level record – rows in the order__line_items table:

_sdc_source_key_order_id	_sdc_level_0_id	product_id	price	quantity	currency
1234	0	243745	7.95	2	EUR
1234	1	872382	15.40	1	EUR

Third level record – rows in the orders__line_items__tax_lines table:

_sdc_source_key_order_id	_sdc_level_0_id	_sdc_level_1_id	price	type	fee
1234567	0	0	7.95	Mastercard	0.58
1234567	1	0	15.40	PayPal	0.11

The total number of rows created by SageData in your destination in this case will be a count of 5. As it is visible from the example – there is a single order record, that can create 5 different rows on replication in your destination.