Accelerated ETL using BI Markup Language (Biml)
A common complaint in the construction of a Business Intelligence (BI) solution is the length of time, and the associated cost with building the Extract, Transform, Load (ETL) routines to populate the associated data repositories (i.e., DataMart’s, Data Warehouse). Some estimates place this ETL development at up to 80% of the overall BI development effort.
There are a few reasons for this ETL conundrum that include:
Consolidation of multiple source (operational) systems
Efficiencies with data pulls (full vs. incremental)
Proper interpretation of each systems’ business rules
Maintaining historical perspectives
None of which is addressed in this article…
But perhaps an equally challenging problem with the construction of a Data Warehouse occurs with the process of coding the ETL logic, which is really an issue inherent to the ETL developer’s toolkit. In short, most ETL tools are centric to the graphical user interface (think very repetitive ‘drag and drop’, ’point and click’). These interfaces create a nice visual representation of data mapping, but at the same time this method generally abandons decades of proven software development techniques, such as ‘code reuse’.
Microsoft recognized this issue several years ago, and sought to address with project Vulcan (the remnant of this now defunct project can be viewed here: https://vulcan.codeplex.com/). Fortunately, a key member of project Vulcan (Scott Currie) has formed a company called Varigence to continue this effort, which has given the BI community an important XML dialect called Biml, along with related products and technologies (Mist, BimlScript, BimlOnline, BimlExpress) which form a healthy part of the growing ‘Biml ecosystem’.
It should be mentioned that while most of the work leveraging Biml is centric to the Microsoft product line, nothing in the Biml language precludes developing emitters for other platforms. The beauty of Biml is it’s a highly readable XML declaration of Business Intelligence assets. As a side note, while database, cube, and other definitions are also a part of Biml, what appears to be of most interest in the BI community, is its application for ETL processes. Consider the following Biml file:
Adhering to the design intent of XML, this Biml file is both a human readable document, as well as fully functional. When emitted using Microsoft Integration Services, this script will create an executable project, containing 2 data connections, 2 packages, 2 execute SQL statements, and 2 Data flow transformations.
Approaches for ETL Acceleration
Now that we have a way to author ETL packages using just XML tags, code acceleration can be achieved by automating how these Biml tags are built. The most common approach is to use BimlScript, which introduces C# or VB.Net code nuggets that generate Biml:
These code nuggets (delimited with <# and #>) are generally used to access database metadata, and yield reusable patterns that could be used for perhaps hundreds of tables. For more information on BimlScript usage and techniques please see: http://www.bimlscript.com/walkthrough/Details/3115
Whereas BimlScript ‘code nuggets’ is a common approach to Biml tag automation, there are many other approaches that can be used (refer to: http://www.bimlscript.com/walkthrough/Details/3114 for a list of Biml generation alternatives).
Of particular interest to this author is the use of SQL Server scripts to automate Biml tag creation. One reason for this is most all the metadata that you’ll want to reference is natively available in SQL (e.g., Information Schema Views). In addition, SQL statement can use the ‘FOR XML’ clause, which then provides a perfect arrangement for generating pure XML (Biml) files directly in SQL Server. For a working example of this concept, please see the blog entry:
Does it really work?
Yes. Many developers who have waded into the Biml waters have found 2x, 5x, and even 10x productivity improvements with ETL development. In addition, beyond rapid development, since Biml packages are code generated, the quality is better, and the re-factoring of even hundreds of existing packages is now practical.
Are there drawbacks?
Yes. Biml is a pretty big shift in how ETL developers generally create packages. It requires significant re-thinking in how to construct packages in code (Biml), which was formally performed by hand (on the ETL design canvas). Like all big changes, it’s best done methodically, accommodating the learning curve, and having someone in house to champion the cause.
In summary, Biml, along with its growing ecosystem, has the potential to revolutionize delivery times for any project that requires numerous ETL packages. We can boast of many happy customers, due directly to the adoption of Biml for SSIS package development. Even more exciting, the ability to leverage many of the time tested software development techniques using a Biml/SQL approach, keeps this ETL guy amazed at all the possibilities.