More than a simple transposition of the letters ‘T’ and ‘L’, choosing a data integration approach based on ELT or ETL can make a huge difference with your SSIS development. If you’re building a Data Warehouse, using ELT (preferably ‘pattern based’) is a critical decision for delivery of your data integration projects.
It’s well known that most of the time required to build a Data Warehouse (DW) is spent on ELT/ETL development efforts. SSIS is a great tool for general data integration, but without adherence to key development standards, and predetermined integration strategies, SSIS efforts can consume far too much time in the DW development cycle. A strategy based on ‘ELT’ can help to avoid some SSIS time-consuming pitfalls.
An ELT approach breaks down into 2 components, in order:
‘EL’ (the dataflows)
‘T’ (the transformations)
When populating a Data Warehouse, you generally ‘Extract’ data from operational systems, preferably using the least possible overhead on the source system. On the other end of the dataflow, raw tables are ‘Loaded’ to a specific schema in the DW. This simple data pipeline approach is something that SSIS does almost effortlessly, and is an easily reproduced ‘pattern’, which can be duplicated very quickly (with or without help from Biml) .
Next (and last) are the ‘Transformations’. For a Data Warehouse, this often means populating a “Kimball style” dimensional model, which mostly consists of 2 types of tables: Fact and Dimension. Albeit more difficult to create as ‘patterns’ for rapid development (for this you’ll want to leverage Biml and Metadata), the most important take away is that all the required transformation logic can usually be handled directly in the database. From the perspective of orchestration, these ‘Transformation’ steps are managed by simply using the ‘Execute SQL Task’ in SSIS, which offers predictably fast run-time performance.
So you have a complete Data Warehouse ELT solution by only using two SSIS tasks:
To be honest, I often get challenged at this point for ignoring the rich ETL functionality of SSIS, and for moving all transformations into the database layer. And while I’m sure there are good opposing viewpoints, and some of course out of necessity, here is my reasoning for heartily promoting the ELT strategy:
Nothing is going to work faster for set based operations than code in the database layer
Database logic can be encapsulated in stored procedures, with the added benefit of providing ‘where used’ (i.e., table dependency) capability
Much of the transformation logic, especially for dimension tables will be based on the SQL ‘Merge’ statement. These statements can be auto-generated by leveraging the metadata in your data (more on this soon in a new blog post for just this topic)
Code changes are easier in the database
It’s more common for developers to have skills with SQL than with SSIS
If and when needed, the end result of an ELT solution is more ‘cloud adaptable’
In summary, though not applicable for all scenarios, an ELT solution to populate your Data Warehouse should provide both a superior “time to market”, as well improved manageability of your data integration processes. By adding “metadata management”, along with some simple Biml capabilities (more to come on this topic), dramatic acceleration is suddenly possible for both constructing, and maintaining your Data Warehouse projects.