ELT vs ETL Explained
More than a simple transposition of the letters T and L, choosing a data integration approach based on ELT or ETL can make a huge difference to your data integration strategy. This is true when building an Operational Data Store (ODS), an Enterprise Data Warehouse (EDW), or really any solution involving ‘data movement’ and/or ‘transformations’.
It’s well known that most of the time required to build a Data Warehouse (DW) is spent on ELT/ETL development efforts. There are many great tools available for general data integration, but without adherence to key development standards and predetermined integration strategies, these efforts can consume far too much time in the development cycle. A strategy based on “ELT patterns” can help you avoid many of these time-consuming pitfalls and place you in better alignment with client expectations.
An ELT-patterned approach breaks down into two main components, in order:
EL (the dataflows)
T (the transformations)
Extract & Load
When populating a Data Warehouse, you generally Extract data from operational systems, preferably with the least possible overhead on the source system. On the other end of the dataflow, raw tables are Loaded into a staging schema in the DW. This simple data pipeline is something that many products handle almost effortlessly, and it forms an easily reproduced ‘pattern’ that can be duplicated quickly for each new source.
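A minimal sketch of this EL step, using Python’s sqlite3 with two in-memory databases standing in for the source system and the DW. The table names (`orders` in the source, `stg_orders` in staging) are hypothetical examples; a real pipeline would use your own connections and a bulk loader:

```python
import sqlite3

# Two in-memory databases simulate the source system and the DW.
source = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")

source.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'acme', 100.0), (2, 'globex', 250.0);
""")

# The staging table mirrors the source columns -- no transformation yet.
dw.execute("CREATE TABLE stg_orders (order_id INTEGER, customer TEXT, amount REAL)")

# Extract: a plain SELECT keeps overhead on the source system low.
rows = source.execute("SELECT order_id, customer, amount FROM orders").fetchall()

# Load: bulk-insert the raw rows into the DW staging schema.
dw.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)
dw.commit()
```

The same extract-then-load shape repeats for every source table, which is what makes it such an easy pattern to duplicate.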
Next (and last) are the Transformations. For a Data Warehouse, this often means populating a “Kimball style” dimensional model, which mostly consists of two types of tables: Fact and Dimension. Albeit more difficult to create as ‘patterns’ for rapid development (for this you’ll want to leverage metadata along with ‘code generators’), the most important takeaway is that all the required transformation logic can be handled directly in the database. From an orchestration perspective, these ‘Transformation’ steps are managed simply with ‘Execute SQL’ tasks, which perform best when run directly on the DW.
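To make the idea concrete, here is a hedged sketch of the T step as a single set-based statement run on the DW, again with sqlite3 and hypothetical table names (`stg_orders`, `dim_customer`). SQLite has no MERGE, so its upsert syntax stands in for it; note the `WHERE true` SQLite requires to parse an upsert after INSERT … SELECT:

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
    CREATE TABLE stg_orders (order_id INTEGER, customer TEXT, amount REAL);
    INSERT INTO stg_orders VALUES (1, 'acme', 100.0), (2, 'globex', 250.0);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY,
                               customer_name TEXT UNIQUE);
""")

# The entire transformation is one set-based SQL statement on the DW.
# SQLite lacks MERGE, so its upsert clause stands in; re-running the
# statement is a no-op for names already in the dimension.
dw.execute("""
    INSERT INTO dim_customer (customer_name)
    SELECT DISTINCT customer FROM stg_orders WHERE true
    ON CONFLICT (customer_name) DO NOTHING
""")
dw.commit()
```

Because the statement is idempotent, the orchestrator can simply re-run it on every load.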
So now you have a complete Data Warehouse ELT solution using only two generic tasks:
Dataflows – “source to destination” table movement for DW staging
Transformations – native SQL statements to populate Fact and Dimension tables
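In orchestration terms, the whole solution reduces to an ordered list of these two generic task types. A minimal sketch, with hypothetical step and table names, of what such an orchestrator boils down to:

```python
import sqlite3

src = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")
src.executescript(
    "CREATE TABLE orders (id INTEGER, customer TEXT);"
    "INSERT INTO orders VALUES (1, 'acme');"
)

def dataflow(stg_table, ddl, rows):
    """Generic EL task: land raw source rows in a staging table."""
    dw.execute(ddl)
    dw.executemany(f"INSERT INTO {stg_table} VALUES (?, ?)", rows)

def execute_sql(statement):
    """Generic T task: run a native SQL transformation on the DW."""
    dw.execute(statement)

# The whole load is just these two task types, run in order.
dataflow("stg_orders",
         "CREATE TABLE stg_orders (id INTEGER, customer TEXT)",
         src.execute("SELECT id, customer FROM orders").fetchall())
execute_sql("CREATE TABLE dim_customer AS "
            "SELECT DISTINCT customer AS customer_name FROM stg_orders")
```

Every new source table or transformation is just another entry in the ordered list, not new tool-specific development.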
ELT Increases Speed & Flexibility
To be honest, I often get challenged for ignoring the rich ETL functionality of products such as SSIS, Informatica, or similar, where you can place all transformations inside a dataflow pipeline. However, here are my reasons for performing transformations natively in the database:
Nothing is going to work faster for set based operations than code in the database layer
Database logic can be encapsulated in stored procedures, with the added benefit of providing ‘where used’ (i.e., table dependency) capability
Much of the transformation logic, especially for dimension tables, will be based on the SQL ‘MERGE’ statement. These statements can be auto-generated by leveraging your database’s metadata (more on this topic soon)
Code changes are easier in the database
More developers have experience with ‘SQL’ than ‘ETL’ tools
‘SQL’ centric solutions are more ‘cloud adaptable’
ETL Promotes Vendor Lock
Finally, it seems like all the major ‘ETL’ products encourage developers to place transformations inside their pipelines, which also happens to ensure “vendor lock”. If instead you design your dataflows (‘EL’) separately from your transformations (‘T’), you will have a solution centered on plain ‘SQL’, which is the true “lingua franca” of today’s popular databases.
In summary, an ELT solution to populate your Data Warehouse should provide both a superior “time to market” and improved manageability of your data integration processes. By adding “metadata management”, along with some automated pattern capabilities, dramatic acceleration is possible for both constructing and maintaining your Data Warehouse projects.