Data Modeling – Git and Code Repository

SageData provides a GitLab repo together with your ELT/ETL server so that you can safely store the SQL and other code that is executed on your data. Your Git repo tracks every change made to your code, and if you ever need to roll back to a previous state, you can do so with a few clicks.
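
For teams that prefer the command line to the web interface, a rollback can also be expressed as a Git revert. The following is only a minimal sketch: the helper name `build_revert_command` and the example commit hash are hypothetical, and it only builds the standard `git revert` command rather than describing how the SageData portal performs a rollback.

```python
from typing import List


def build_revert_command(commit_sha: str, is_merge_commit: bool = False) -> List[str]:
    """Build the git command that undoes `commit_sha` by adding a new commit.

    Hypothetical helper for illustration; it does not run anything itself.
    """
    if not commit_sha:
        raise ValueError("commit_sha must not be empty")

    # --no-edit keeps git's default revert message instead of opening an editor.
    command = ["git", "revert", "--no-edit"]
    if is_merge_commit:
        # For a merge commit, -m 1 treats the first parent (the branch that
        # was merged into) as the mainline to return to.
        command += ["-m", "1"]
    command.append(commit_sha)
    return command


if __name__ == "__main__":
    # "abc1234" is a placeholder commit hash, not a real one.
    print(" ".join(build_revert_command("abc1234")))
```

To execute the command, pass the returned list to `subprocess.run(..., check=True)` from inside a clone of your repo. Because a revert adds a new commit instead of rewriting history, the change being undone stays visible in the log.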

Our Data Modeling workflow allows our clients to run integrated Development and Production Airflow instances with CI/CD. At a high level, the deployment process looks something like the Gitflow chart below.

A less polished version of it, showing the two Airflow containers, may look like this:

The standard workflow that most users prefer goes like this:

  1. A data feature is developed locally by an analyst. Once complete, it is pushed to the Git repo and an MR/PR is raised against the Develop branch.
  2. SageData CI/CD automatically picks up the files and tests them on the Dev instance of Airflow.
  3. If the test is successful, the MR/PR is marked green and can be merged into the Prod branch.

If the test on the Dev instance fails, an error is displayed and the data analyst should fix the issue.
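
The gating rule in the workflow above can be sketched as a small state model. This is an illustration only, not SageData's CI/CD implementation: the names `MergeRequest`, `MergeRequestStatus`, `record_dev_test` and `can_merge_to_prod`, as well as the branch name and error text, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MergeRequestStatus(Enum):
    PENDING = "pending"  # MR/PR raised, Dev test has not finished yet
    GREEN = "green"      # Dev test passed, merge into Prod is allowed
    FAILED = "failed"    # Dev test failed, the analyst must fix the issue


@dataclass
class MergeRequest:
    feature_branch: str
    status: MergeRequestStatus = MergeRequestStatus.PENDING
    error: Optional[str] = None


def record_dev_test(mr: MergeRequest, passed: bool, error: Optional[str] = None) -> None:
    """Store the outcome of the test run on the Dev Airflow instance."""
    if passed:
        mr.status = MergeRequestStatus.GREEN
        mr.error = None
    else:
        mr.status = MergeRequestStatus.FAILED
        mr.error = error or "Dev test failed"


def can_merge_to_prod(mr: MergeRequest) -> bool:
    """Only an MR/PR marked green may be merged into the Prod branch."""
    return mr.status is MergeRequestStatus.GREEN


if __name__ == "__main__":
    mr = MergeRequest(feature_branch="feature/orders-model")  # hypothetical branch
    record_dev_test(mr, passed=False, error="SQL syntax error")
    print(mr.status.value, can_merge_to_prod(mr))
    record_dev_test(mr, passed=True)  # the analyst pushed a fix and the test re-ran
    print(mr.status.value, can_merge_to_prod(mr))
```

The point of the model is that a failed MR/PR is not a dead end: once a fix is pushed and the Dev test passes, the same MR/PR turns green and becomes mergeable.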

The link to your Git repo is available on the ETL page of your SageData portal.

Git is also used to fully recover your ETL/ELT system if the instance ever goes down.

Please contact the SageData team if you need to add users to your Git repository.

We recommend that you and your team follow a standard Git approval process to ensure that your code is accurate and properly reviewed:

Everything that is merged into the Prod branch is automatically deployed to your ETL/ELT instance within minutes.