Data Warehouse – Naming Conventions

At SageData, we understand the importance of proper data management. As such, we believe that data warehouse naming conventions are a critical aspect of this process. In this article, we will provide you with best practices for naming your data warehouse components, including tables, columns, and schemas, to ensure consistency and accuracy in your data management.

Tables vs. Schemas vs. Columns

Tables

Tables are a fundamental component of a data warehouse. Naming conventions for tables should be clear, concise, and descriptive. The name should reflect the content and purpose of the table. Ideally, the name should be easily recognizable by someone who is not familiar with the data warehouse.

For instance, a table that contains customer information should be named “customer” or “customer_info.” The name should not be too long, as long names can be difficult to read and may lead to errors. Additionally, the table name should be lowercase, and underscores should be used to separate words.

Columns

Columns are another critical component of a data warehouse. Naming conventions for columns should follow a similar approach to tables. The name should reflect the content of the column, and abbreviations should be avoided. Furthermore, column names should be lowercase and separated by underscores.

For example, a column that contains a customer’s last name should be named “last_name.” Similarly, a column that contains a product’s price should be named “product_price.” These conventions make it easier for users to understand the contents of the table.

Schemas

Schemas are a logical grouping of database objects. Naming conventions for schemas should reflect the purpose of the schema. The name should be descriptive and provide a clear understanding of the contents of the schema.

For example, a schema that contains sales data should be named “sales” or “sales_data.” Schemas should be lowercase and an underscore should be used to separate words.

So, without much ado, I present to you our… (drum roll)… SAGEDATA NAMING CONVENTION

SageData Data Warehouse Table and Entity Naming Convention

SageData uses the following naming convention within the DWH production schema (dwh):

  • Use singular forms rather than plural (user not users).
  • Keep the entity names relatively short. Abbreviate any words longer than 9 characters (“conn” not “connection”) but minimum 3 letters.
  • Prepend entity name with the entity category definition:
  • dim_ for dimensions
  • fact_ for fact tables
  • view_ for views
  • agg_ for aggregations
  • brt_ for bridge table that links two tables together
  • tableau_ for tables that are specific data sources for tableau
  • rep_ reporting tables or tables that have been created with the only purpose to provide data for a report or small set of reports
  • func_ for functions or procedures that are created in the database
  • Append column names with data type, where possible
  • _ts – columns that contain timestamps
  • _dt – columns that contain dates
  • _id – unique identifier, such as user_id. Only use this in tables, where id is not the main key. For example, id of the user in table dim_user will be named id but in table fact_transactions the same column will be called user_id.

You might be interested in -> Architecture of a (good) Data Warehouse [How To]

 

Naming Conventions for your Data Warehouse

Dimension column names can repeat parts of table names, so a dimension table containing countries can have the following DDL, where column name country is repeated from table name dim_country:

CREATE TABLE dwh.dim_country(
id INT
,country varchar(124)
);

Update:

We started adding _name where possible

CREATE TABLE dwh.dim_country(
id INT
,country_name varchar(124)
);
  • Separate words in entity name with  (example: dim_user)
  • Avoid spaces in object names
  • Avoid using SQL and database engine-reserved keywords as identifiers (i.e., names of databases, tables, indexes, columns, aliases, views, stored procedures, partitions, tablespaces, and other objects.)

If such must be used, please use it with an underscore, for example:

SELECT COUNT(*) AS count_ FROM my_table

That is all folks! Learn it, live it, and love it!

Clear and consistent naming conventions are the foundation of good data management and can lead to increased productivity, accuracy, and better decision-making.
Kirill Andriychuk
Data Evangelist
Comments are closed.