Azure Data Factory: Building ETL Pipelines

Azure Data Factory (ADF) is Microsoft's cloud ETL and data-integration service. It lets you move and transform data at scale without managing infrastructure.

Core Components

  • Pipelines: Logical groupings of activities
  • Activities: Individual tasks, such as copying data, transforming it, or running a stored procedure
  • Datasets: References to data (tables, files)
  • Linked Services: Connection information for data stores and compute
  • Triggers: When to run (schedule or event)
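These components nest inside one another as JSON definitions that ADF stores and the visual designer edits. A minimal sketch of how they reference each other, using plain Python dicts that mirror that JSON (all names such as BlobLS and DemoPipeline are illustrative, not from the article):

```python
# Sketch of how ADF components nest, as plain dicts mirroring the JSON
# ADF stores. All names (BlobLS, InputCsv, DemoPipeline) are placeholders.

linked_service = {
    "name": "BlobLS",                        # Linked Service: how to connect
    "properties": {"type": "AzureBlobStorage"},
}

dataset = {
    "name": "InputCsv",                      # Dataset: what data to point at
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {"referenceName": "BlobLS",
                              "type": "LinkedServiceReference"},
    },
}

pipeline = {
    "name": "DemoPipeline",                  # Pipeline: groups activities
    "properties": {
        "activities": [
            {"name": "CopyStep", "type": "Copy"}   # Activity: one task
        ]
    },
}

trigger = {
    "name": "DailyTrigger",                  # Trigger: when the pipeline runs
    "properties": {
        "type": "ScheduleTrigger",
        "pipelines": [{"pipelineReference": {
            "referenceName": "DemoPipeline", "type": "PipelineReference"}}],
    },
}
```

Note the direction of the references: datasets point at linked services, pipelines contain activities, and triggers point at pipelines.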

Simple Copy Pipeline

To copy data from Blob Storage to Azure SQL Database:

  1. Create Linked Services for Blob and SQL
  2. Create Datasets for source (CSV) and sink (table)
  3. Create Pipeline with Copy activity
  4. Map columns
  5. Add trigger
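The steps above correspond to JSON definitions that ADF stores behind the designer. A hedged sketch of steps 2 through 5 as Python dicts (connection details, container, table, and column names are placeholders, and the BlobLS/SqlLS linked services from step 1 are assumed to exist):

```python
# Sketch of the copy pipeline's stored JSON. Container, file, table, and
# column names are placeholders; "BlobLS"/"SqlLS" are assumed linked services.

# Step 2: source (CSV in Blob) and sink (SQL table) datasets
source_dataset = {
    "name": "SourceCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {"referenceName": "BlobLS",
                              "type": "LinkedServiceReference"},
        "typeProperties": {
            "location": {"type": "AzureBlobStorageLocation",
                         "container": "input", "fileName": "data.csv"},
            "firstRowAsHeader": True,
        },
    },
}

sink_dataset = {
    "name": "SinkTable",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {"referenceName": "SqlLS",
                              "type": "LinkedServiceReference"},
        "typeProperties": {"schema": "dbo", "table": "Sales"},
    },
}

# Step 3: the Copy activity wiring source to sink
copy_activity = {
    "name": "CopyCsvToSql",
    "type": "Copy",
    "inputs": [{"referenceName": "SourceCsv", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SinkTable", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},
        "sink": {"type": "AzureSqlSink"},
        # Step 4: explicit column mapping from CSV headers to table columns
        "translator": {
            "type": "TabularTranslator",
            "mappings": [
                {"source": {"name": "id"}, "sink": {"name": "Id"}},
                {"source": {"name": "amount"}, "sink": {"name": "Amount"}},
            ],
        },
    },
}

# Step 5: a schedule trigger recurrence, e.g. once a day
trigger_recurrence = {"frequency": "Day", "interval": 1,
                      "startTime": "2024-01-01T00:00:00Z"}
```

If you skip the explicit translator, the Copy activity maps columns by name, which is often enough when CSV headers match the table's column names.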

Data Flows

For transformations beyond a straight copy, use Mapping Data Flows, a visual ETL designer that runs on Spark under the hood. No code is required for common transforms such as joins, filters, aggregates, and derived columns.
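A data flow is invoked from a pipeline through an Execute Data Flow activity. A minimal sketch of that activity's JSON as a Python dict (the data flow name TransformSales and the compute sizing are illustrative assumptions):

```python
# Sketch: pipeline activity that runs a Mapping Data Flow on Spark compute
# managed by ADF. "TransformSales" and the compute settings are placeholders.

execute_data_flow = {
    "name": "RunTransform",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataFlow": {"referenceName": "TransformSales",
                     "type": "DataFlowReference"},
        # Spark cluster ADF provisions for the run; size affects cost and speed
        "compute": {"computeType": "General", "coreCount": 8},
    },
}
```

Because the cluster is provisioned per run, data flows have a startup cost that a plain Copy activity does not; reserve them for genuine transformations.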
