Extract Transform Load (ETL) – What it’s All About

Overview

Extract Transform Load (ETL) is a three-step process that extracts data from a variety of operational applications (such as Customer Databases, Sales Databases, and Product Databases), transforms and improves the consistency of the data, and ultimately loads the finished data into a Data Warehouse for use. The ETL process provides meaningful insights from data by consolidating it in ways in which it will be able to be analyzed. These analytics will provide insights for decision-makers, giving opportunities to anticipate changes in markets, analyze consumer behavior, and make inferences that benefit the business.

The ETL Process

Extract

The first step is to collect data from an external source or sources. Usually, this raw data is unstructured or stored in different file formats depending on the source’s Database Management System, Operating System before extraction. Therefore, extraction may be the most complicated and time-consuming step in the ETL process. The extracted data will need to be transferred to a staging area where it can be manipulated and transformed.

There are two methods of extracting data: Full Extraction and Incremental Extraction. Full extraction collects every piece of data available from a source system, regardless of its data type. The advantages of this method are that it provides every data available and is completely up-to-date.

Transform

Data within a warehouse needs to be in a consistent format to make sense to its users. If half a company’s data is collected using the metric system and the other half the imperial system, it would be a tough time drawing inferences from this data! Often, extracted data is unstructured and will provide little meaning or value.

To meet its destination Data Warehouse’s storing requirements, it must go through a transformation process. The transformation process allows for the data to joined with the existing data in the data warehouse. The transformation process may be complex because data can be extracted from multiple different sources but needs to end up in an identical format.

The transformation stage creates value from the data as it is cleaned and formatted to meet its end-users requirements.

Load

The final step is to load the cleaned data into a data warehouse. The data warehouse is designed to be a useful tool for end-users who are attempting to analyze the data. Whereas the initial raw data collected is unstructured and stored in multiple locations, the data at this step will be cleaned, and in a consistent format. Only now will it be ready to be used by business intelligence tools such as Tableau or Microsoft Power BI.

Advances in Data Warehousing technologies, such as Cloud Computing, have allowed for Data Warehouses to store massive amounts of data. This data is useless to its end-users if it is not formatted correctly and presented in a way that is easy to interpret.

ETL Use Cases

Data Warehouse

ly loaded into a Data Warehouse. The data stored here is structured and homogeneous, enabling relatively simple analysis across multiple metrics using data from multiple sources. Decision-makers can make sense of the data and use it to their advantage.

Data Transformation in MySQL

MySQL is a relational database management system. It uses a query language to draw inferences from data. Therefore, it is not included with extraction tools. This data needs to be taken from outside sources and put into tables within MySQL.

To import data and provide a place for it within MySQL, you need to write in a query language known as Data Definition Language (DDL). Examples of DDL include: CREATE, ALTER, and DROP code lines. DDL changes

To transform the data or structure of the data holding, you need to use a query language known as Data Manipulation Language (DML). Examples of DML include: SELECT, INSERT, UPDATE, and ROLLBACK. DML allows you to manipulate data to be stored in ways that are meaningful.

Big Data

Corporations have adopted the advantages and insights that Big Data can give them. Massive amounts of data is collected daily; to provide any significance this data needs to be cleaned and put into a format that makes sense in a universal database. When a company is attempting to draw inferences from Big Data, they are dealing with potentially terabytes of data in multiple mediums.

Managing this massive amount of data can be aata managers such as Apache Hadoop have been developed. Hadoop is an open sourced software that stores data on a collection of clustered servers and utilizes an ELT process. This allows for the data to be distributed efficiently on potentially thousands of nodes, is scalable, and can save money on data storage costs.

Hadoop is useful for businesses because it utilizes a “no-schema-on-write” system, meaning the data does not need to be defined before entering Hadoop. If a company needs to collect large amounts of data in real time, such as data collected from a Twitter feed, Hadoop may be a solution as it stores the data directly after being extracted.

ETL vs. ELT

An ELT system will load the data before transforming it into a consistent structure. This allows for the data to be accessed immediately and unlimitedly after extraction. Although this structure may gain traction in the future, currently, tools supporting ETL are not sophisticated enough to handle massive amounts of data.

Bibliography

docs.oracle.com, Paul Lane, “Database Data Warehousing Guide.”, 2005

srmuniv.acin, Lakshmi Priya and Sultana Razia, “ETL Process in Data Warehouse.”

sas.com, SAS Institute Inc., “ETL – What it is and why it matters.”

blogs.technet.microsoft.com, Chadrowe, “Data Warehouse – Anatomy of Extract, Transform, Load (ETL).”, 2009

searchdatamanagement.techtarget.com, Mark Whitehorn, “The ETL process and MySQL

searchdatamanagement.techtarget.com, Jill Dyché, “ETL tools and EDR tools: What’s the difference?”

etl-tools.info, ETL Tools Info, “Data warehouse.”

software.intel.com, Intel® Software, “Extract, Transform, and Load Big Data with Apache Hadoop.”

docs.oracle.com, Oracle Help, “Oracle® Database SQL Reference 10g Release 1 (10.1).”

blog.panoply.io, Roi Avinoam, “ETL vs ELT: The Difference is in the How.” , 2016

dataconomy.com, Tony Branson, “The 5 Best Reasons to Choose MySQL – and its 5 Biggest Challenges.”, 2017

data-warehouses.net, “ETL Process – Guide to Data Warehousing and Business Intelligence.”

dummies.com, Judith Hurwitz, “The Role of Traditional ETL in Big Data.”

 

References

Lane, Paul, et al. “Database Data Warehousing Guide.” Oracle Help Center, Oracle, Dec. 2005, docs.oracle.com. Accessed 10 Sept. 2017.

Priya, Lakshmi , and Razia Sultana. “ETL Process in Data Warehouse.” SRM Unviersity , SRM University, www.srmuniv.ac.in. Accessed 10 Sept. 2017.

“ETL – What it is and why it matters.” SAS, SAS Institute Inc., www.sas.com. Accessed 10 Sept. 2017.

Chadrowe. “Data Warehouse – Anatomy of Extract, Transform, Load (ETL).” The Official System Center Service Manager Blog, Microsoft, 3 June 2009, blogs.technet.microsoft.com. Accessed 11 Sept. 2017.

Whitehorn, Mark. “The ETL process and MySQL.” SearchDataManagement, TechTarget, searchdatamanagement.techtarget.com. Accessed 10 Sept. 2017.

Dyché, Jill. “ETL tools and EDR tools: What’s the difference?” SearchDataManagement, TechTarget, searchdatamanagement.techtarget.com. Accessed 11 Sept. 2017.

“Data warehouse.” ETL Tools Info, ETL Tools Info, etl-tools.info. Accessed 10 Sept. 2017.

“Extract, Transform, and Load Big Data with Apache Hadoop.” Intel® Software, Intel, software.intel.com. Accessed 10 Sept. 2017.

“Oracle® Database SQL Reference 10g Release 1 (10.1).” Oracle Help , Oracle, docs.oracle.com. Accessed 10 Sept. 2017.

Avinoam, Roi. “ETL vs ELT: The Difference is in the How.” Panoply.io Blog , Panoply, 6 Nov. 2016, blog.panoply.io. Accessed 10 Sept. 2017.

Branson, Tony, et al. “The 5 Best Reasons to Choose MySQL – and its 5 Biggest Challenges.” Dataconomy, Dataconomy, 6 Apr. 2017, dataconomy.com. Accessed 10 Sept. 2017.

“ETL Process.” The ETL Process in Data Warehousing – An Architectural Overview, Data-Warehosues.net, data-warehouses.net. Accessed 10 Sept. 2017.

Hurwitz, Judirh , et al. “The Role of Traditional ETL in Big Data.” Dummies, Dummies, www.dummies.com. Accessed 10 Sept. 2017.

 

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *