ELTSnap Desktop

 

Introduction

 

EltSnap is a Windows-based application which allows users to orchestrate and control data integration processes, including the ELT used for populating a Data Warehouse. We focus on the ELT paradigm, representing Extract, Load, and Transform activities which leverages the elasticity available in modern, cloud-based, data warehouses. EltSnap uses the common Project/Package model to define data integration configurations and activities. EltSnap comes with a custom run-time engine which supports both On-prem and Cloud-based database platforms. ELTSnap includes an intuitive windows front-end application, a back-end SQL database, and a cross-platform Python runtime engine supporting parallel operations.

How ELTSnap works

 

ELTSnap controls the complete data integration process by allowing the user to define projects, packages, connections, parameters and environments (e.g., Dev, Test & Prod). These 'build resources' (projects, connections and packages), once defined, can be reused and included into multiple ELTSnap projects. Several types of automated package patterns are accessible and can be created by simply filling out application forms. Once defined, these packages can be run (or scheduled) with just a few clicks. Only basic SQL programming skills are required.

 

Project

 

Creating a project lets user organize connections, parameters, packages and environments within one unit. A project will be created after entering the project name and the project template group. Specific template group allows the user to add features to all existing projects and packages. These features are fully integrated with our complimentary ELT runtime framework.

 

The following template groups are supported:

 

  1. No Framework - the minimalist template and can be used when specific features are not required.

  2. Framework Logging – provides the following framework features:

    • Project restart ability

    • Package row count

    • Package runtime

    • Package error logging.

  3. Framework Logging with Alert Emails - provides all features along with receiving alert emails.

  4. Framework Logging with Alerts and Informational Emails - provides all features and alerts along with additional information being sent via email.

 

Note that based on the template group selection, certain connection and parameter resources will be added automatically to the created ELTSnap projects. Template group value can be changed at any time.

 

Connections can be: edited, deleted and cloned. Every connection has specific dependencies which user should take into consideration when editing or deleting the specific connection.

 
Connection

 

Creating a connection means defining all the necessary information needed for data warehouse integration. It connects the user to a location where data is stored or to which the data needs to be extracted. Depending on the project pattern, a package may require one, two or even more predefined connections except Execute Process Pattern Package which does not require a connection.

 

Every specific connection is driven by the parameters created for that specific connection. After specifying the desired connection type and successfully creating the connection, it will be linked to a specific project. Note that, once created, a connection can be reused later for other packages.

 

Note that, once created, the connection can always be edited, cloned or deleted.

 

An overview of all available connection types is provided in the following segments.

 

 

OleDB Connections

 

To create a new OleDb connection, the user should choose the Add a new OleDB connection from the Connections submenu at the main menu.

 

The user should provide the following information:

  • Connection Name - it is recommended to make the connection name as descriptive as it can;

  • Server Name – the actual server name for the specific connection;

  • Database Name – the existing database name for the specified server;

  • Custom Connect String – the expression that uses the SQL standard security to have bigger control over the connection string. This argument is optional.

SMTP Connections

 

To create a new SMTP connection, the user should choose the Add a new SMTP connection from the Connection submenu at the main menu.

 

To add SMTP connection, the user should provide the following information:

  • Connection name- it is recommended to make the connection name as descriptive as it can be

  • Server Name- actual server name for the specific connection;

  • SMTP Username- Valid SMTP username. This value is optional

  • SMTP Password- Valid SMTP password. This value is optional

  • Server name Expression- the column value will be compute automatically depended on provided parameters

 

 
Flat Files Connections

 

Flat file connections are used to access flat files that are stored somewhere. To create a new flat file connection, the user should choose the Add a new Flat File connection from the Connection submenu at the main menu.

 

The user should provide the following information:

  • Connection Name - it is recommended to make the connection name as descriptive as it can be

  • File path - the physical path to the location where the desired file is stored

  • File format - the format of the file that is stored. This value can be chosen from a drop-down list of existing custom formats.

 

After the correct information is provided, file path and format expressions are automatically computed. User does not need to provide data for these fields.

 

 

Flat File Formats

 

User can also add flat file formats, which are later added to the drop-down list used for flat files connections. The flat file format is used to define a format of flat file columns. It is used to map the flat file columns into the database tables and make the creation of new tables in the destination database easier.

 

In order to be able to work correctly, the flat file connection needs to have a valid flat file format, so its correct configuration is very important and should be done with caution.

 

To create a new flat file format, the user should choose the Add a new Flat File format from the Flat File Connection submenu at the main menu.

 

User should provide the following information:

  • File format – the name of the user-defined file format

  • Code page – by default ‘1252’

  • Column delimiter – by default ‘Comma’

  • Row delimiter – by default ‘CRLF’

  • Text qualifier – by default ‘_x0022_’

  • Metadata server – the server on which the metadata is stored

  • Metadata database – the database in which the metadata is located

  • Metadata schema – the schema of the metadata

  • Metadata table – the table of the metadata

 

 

Parameters

 

Based on the type of the created connection, parameters will be automatically created and added to the box ‘Parameters in Project’, which means they are automatically bonded to the project that is being worked on. An option to create standalone parameters also exists.

It is possible to edit, delete or clone parameters and to check dependencies on packages in the project.

 

 
Packages

 

Creating a package means defining the configuration and handling the execution of the desired data integration actions. To create a package user should populate logically ordered fields and define what kind of data transformation or movements he wants.

 

All packages added to the project will at first have the sequence number set to zero, but it can be changed at any time. This number is used to determine the package execution order. If two packages have the same sequence number, they will be run in parallel, otherwise the packages will be run sequentially (from lower to higher numbers).

 

In order to make the package definition process easier, a set of predefined package pattern exists. These patterns will be explained in the following paragraphs.

 

 

Data Flow Package Pattern

 

This package type allows the user to move data from a specific location to the wanted destination, and it is based on the principle source-destination movement. For this purpose, the user creates two kinds of connections:

  • The source connection- the starting point from which the data is transferred and

  • Source Query- valid SQL query or valid SQL Expression

  • The destination connection- the endpoint at which the newly moved data will be stored

  • Schema- the schema of the destination table

  • Table- the destination table to write the content into

 

If the Source Query is expression than the checkbox ‘Source Query is Expression’ should be checked and ‘Evaluate Expression’ button is enabled. This is very helpful because it allows the user to check expression validity before running the package.

 

 
Execute SQL Package Pattern

 

The Execute SQL package runs SQL statements or stored procedures. User should define:

  • The source connection- the location where the query is going to be executed

  • Query- specific SQL query, another case is Expression.

 

Same as in the Data Flow Package Pattern, if the Source Query is expression than the checkbox ‘Source Query is Expression’ should be checked and ‘Evaluate Expression’ button is enabled. This is very helpful because it allows the user to check expression validity before running the package.

 

This package pattern has four different code generators:

 
  1. Query Config- Dim Table Merge (Standard)

 

The generated Merge statement conforms to the Kimball ‘Type 1’ and ‘Type 2’ dimension model. Use the SnapMart metadata application to specify the dimension type to be used for each attribute The MERGE statement is used when you want to apply changes that include inserts and updates of data based on the contents of the associated ‘staging’ table. In order to do that by using the Code generator ‘Query Config – Dim Table Merge’, you will need to provide the Metadata Database, which is the database parameter for the specific OleDb Connection. After that, you will need to provide the Merge Source (Staging Table) Schema, Table and Database Parameter and Merge Destination (Dimension Table) Schema, Table and Database Parameter, and the Added Dimension Column Naming Schema. The Added Dimension Column Naming Schema can be chosen from the dropdown as predefined ‘Standard’ or can be added by clicking ‘Open Added Column Naming Form’ button.

  2. Query Config- Fact Table Partition (Standard)

Very large tables can be difficult to manage because of their size and the amount of time it takes to maintain. ELTSnap includes ‘code generators’ which can be used to ‘partition’ these large tables into manageable segments. Before using the table partitioning code generators, you must first manually create the tables’s associated partition function and scheme. For more information on this task, see:

https://docs.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-2017

In addition, for the ELTSnap code generators to work, the partitioning column must be defined as a ‘date’ datatype, and all indexes for the table need to be ‘partitioned aligned’.

 

 

  3. Query Config- Fact Table Merge (Basic)

 

The generated Merge statement conforms to the Kimball ‘Type 1’ and ‘Type 2’ dimension model. The MERGE statement is used when you want to apply changes that include inserts and updates of data based on the contents of the associated ‘staging’ table. 

 

In order to do that by using the Code generator ‘Query Config – Fact Table Merge’, you will need to provide the Metadata Database, which is the database parameter for the specific OleDb Connection. After that, you will need to provide the Merge Source (Staging Table) Schema, Table and Database Parameter and Merge Destination (Fact Table) Schema, Table and Database Parameter, and the Added Dimension Column Naming Schema. 

 

The Added Dimension Column Naming Schema can be chosen from the dropdown as predefined ‘Standard’ or can be added by clicking ‘Open Added Column Naming Form’ button.

 

  4. Query Config- Fact Table Switch (Standard)

 

One of the benefits of the table partitioning is that you can speed up loading the data by using partition switching. Partition switching moves entire partition between tables almost instantly. To switch the partitions between tables, you need to follow the following requirements:

  • The source and destination partitions must have identical columns, indexes and use the same partition column, scheme and function

  • The source and destination partitions must exist on the same filegroup

  • The ‘switch-to’ destination partition must be empty.

 

 

Foreach Data Flow Package Pattern

 

By using the foreach loop container, the user can create a looping structure that iterates through a collection of objects or data values and take actions specific to each of those members. User should define:

  • Foreach Connection

  • Foreach Query Expression

  • Source connection

  • Source Query Expression

  • Destination Connection

  • Destination Schema

  • Destination table

 

 

Foreach Execute SQL Package Pattern

 

The Execute Process package pattern allows the user to use their own executable application to run. In that case, the user can run an application for data manipulation. User should define:

  • Foreach Connection

  • Foreach Query Expression

  • Source connection

  • Query Expression

 

 

Execute Process Package Pattern

 

The Execute Package Pattern allows the user to run the manually defined process.

 

User should define:

  • Process- in the form of the ‘Expression’

  • Arguments- for the process execution in the form of scripts/ command functions defined by Expression

  • Working Directory- this is optional

 

 

Json Table Transfer Package Pattern

 

The Json Table Transfer Package Pattern allows:

  • Json Table Transfer from Source Connection to Destination Connection

  • Json Table Transfer Flat File source to Destination Connection

  • Json Table Transfer from Source Connection to Flat File Connection.

 

User has three options regarding table selection for the transfer:

  • Include All Tables

  • Include Only ‘Table List’ Selections

  • Include All Except ‘Table List’ Selection, where user select/ unselect wanted tables from the list.

 

User has also an option to use a Flat File Connection, which will automatically provide specific Flat File Source which consists of:

  • File Path

  • File Format.

 

 

Environments

 

Environments are used to set parameter value for a specific environment. They can be manually added. The user creates an environment by Add new selection and from the drop-down list, select specify Parameter Name and Parameter Value.

 

 

Run

 

The run option is used to run packages using the custom ELT Runtime Framework.

It is possible to choose one of two options:

  • Run project - run the selected project as a unit.

  • Run single package - run the selected package.

 

Change connection

 

On the ELTSnap main screen, user can change the connection to a specific server and the Database. There are two Authentications available:

  • Windows authentication

  • SQL Server Authentication

In the case of SQL Server Authentication user needs to enter SQL Server Credentials:

  • Username

  • Password.

 
 
 
 
 
 
 

2019 BI Tracks, LLC

  • Facebook - White Circle
  • Twitter - White Circle
  • LinkedIn - White Circle