Glossary: ETL

ETL (short for extract, transform, and load) is a process in which data is moved from multiple sources into a centralized repository, such as a data warehouse.

ETL, as its name implies, involves three steps:

  1. Extract: The data is pulled from various sources (databases, flat files, APIs, and other data repositories) with the goal of gathering all relevant data needed for analysis.
  2. Transform: In this step, the extracted data is cleaned, reformatted, and prepared for loading. This can involve operations such as filtering out duplicate or invalid data, sorting, aggregating, and applying business rules so that the data has the format and quality required for analysis.
  3. Load: The transformed data is loaded into the destination data repository, such as a data warehouse or data lake.
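
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the sample data, the `sales` table, and the function names (`extract`, `transform`, `load`, `run_etl`) are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV export and records returned by an API.
CSV_SOURCE = """id,name,amount
1,alice,10.50
2,bob,not-a-number
3,carol,7.25
"""
API_SOURCE = [
    {"id": "3", "name": "carol", "amount": "7.25"},  # duplicate of a CSV row
    {"id": "4", "name": "dave", "amount": "3.00"},
]


def extract():
    """Extract: pull raw rows from every source into one list of dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(API_SOURCE)
    return rows


def transform(rows):
    """Transform: drop invalid and duplicate rows, normalize types, sort."""
    seen = set()
    cleaned = []
    for row in rows:
        try:
            record = (int(row["id"]), row["name"].strip().title(), float(row["amount"]))
        except (KeyError, ValueError):
            continue  # skip rows that fail validation
        if record[0] in seen:
            continue  # skip duplicates, keyed on id
        seen.add(record[0])
        cleaned.append(record)
    return sorted(cleaned)


def load(records, conn):
    """Load: write the cleaned records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl():
    """Run the three steps in order against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract()), conn)
    return conn


conn = run_etl()
print(conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall())
# → [(1, 'Alice', 10.5), (3, 'Carol', 7.25), (4, 'Dave', 3.0)]
```

The row with the unparseable amount is rejected during the transform step, and the record that appears in both sources is loaded only once.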

The end result of ETL is a more complete collection of data, gathered from a wide array of sources, that provides a unified view and supports more effective data analysis, insights, and decision-making.
