Data Science – Data Validation

Data validation is an essential tool for keeping your data timely and consistent. The SageData Data Validation module examines each of your tables to ensure that:

  • There are no duplicates
  • There is fresh data (data for today or for yesterday)

Don’t have the time to read the whole article?

Check out the Data Validation module in action in this 2-minute video.

Step 1 – Add Email

The Data Validation engine sends notifications about data inconsistencies and inaccuracies to a defined email address. Add the email address where you or your team want to be notified when a particular table is missing data or its data is not consistent.

Step 2 – Select the data

To add tables for the Data Validation module to monitor, select the connection to the data repository and choose the schema that contains the tables you want to monitor.

After you click Submit, you will see a list of tables in that schema. Put a checkmark for the tables you want to monitor in the column “Activate Monitoring”.

The Data Validation module can monitor views as well as tables.

Step 3 – Define Monitoring Rules

The Data Validation module monitors your data against a defined set of rules:

Monitoring for Data Duplication

If you select an ID column for a table, you tell the SageData Data Validation module that this column is the unique identifier for the table, and the system will monitor it to ensure that there are no duplicates on that ID.
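To make the duplicate rule concrete, here is a minimal Python sketch of the kind of check described above. It is an illustration only, not SageData's implementation: the function name `find_duplicate_ids`, the `order_id` column, and the sample rows are all hypothetical.

```python
from collections import Counter


def find_duplicate_ids(rows, id_column):
    """Return a dict of ID values that occur more than once, mapped to their counts.

    Hypothetical helper: `rows` is a list of dicts standing in for table rows,
    and `id_column` is the column selected as the table's unique identifier.
    """
    counts = Counter(row[id_column] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up sample table in which order_id 2 was loaded twice.
rows = [
    {"order_id": 1, "amount": 10},
    {"order_id": 2, "amount": 15},
    {"order_id": 2, "amount": 15},
]

duplicates = find_duplicate_ids(rows, "order_id")
print(duplicates)  # {2: 2}
```

An empty result means every value in the ID column is unique. Any entry in the result is the sort of condition that would warrant a notification.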

Making sure there is Daily Data

If there is a date or datetime column in the table or view, you will be able to select it under the Select Date Column dropdown. This will tell the SageData Data Validation module that there is an expectation of new data coming in every day based on this date column.

Specify whether you expect daily data or data only for the previous day. Daily data assumes that data arrives every day up to and including today.

Additionally, specify whether you expect data to arrive only on workdays (Mon – Fri). This is useful for businesses that do no processing on weekends. The Data Frequency and Workdays Only options work only if a Date Column is selected.
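The interaction between the frequency and workdays options can be sketched in Python as follows. This is one possible reading of the rules above, not the module's actual logic. The function names, the `"daily"` and `"previous_day"` labels, and the choice to roll the expected date back to the most recent weekday are all assumptions made for illustration.

```python
from datetime import date, timedelta


def expected_latest_date(today, frequency="daily", workdays_only=False):
    """Return the most recent date for which data is expected to exist.

    Assumed semantics: "daily" expects data up to and including today,
    "previous_day" expects data through yesterday.
    """
    if frequency == "daily":
        expected = today
    elif frequency == "previous_day":
        expected = today - timedelta(days=1)
    else:
        raise ValueError(f"unknown frequency: {frequency!r}")

    if workdays_only:
        # weekday() returns 5 for Saturday and 6 for Sunday.
        # Assumption: on those days, fall back to the preceding Friday.
        while expected.weekday() >= 5:
            expected -= timedelta(days=1)
    return expected


def is_fresh(latest_date_in_table, today, frequency="daily", workdays_only=False):
    """Return True if the newest value in the date column meets the expectation."""
    if latest_date_in_table is None:  # empty table or no dates at all
        return False
    return latest_date_in_table >= expected_latest_date(today, frequency, workdays_only)


# 2024-06-10 is a Monday; the newest row in the table is from Friday 2024-06-07.
monday = date(2024, 6, 10)
friday = date(2024, 6, 7)
print(is_fresh(friday, monday, "previous_day", workdays_only=True))   # True
print(is_fresh(friday, monday, "previous_day", workdays_only=False))  # False
```

In this sketch, a table checked on a Monday with the previous-day and workdays-only settings counts as fresh if it holds Friday's data. Without the workdays option, the same table would be flagged because Sunday's data is missing.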