Azure Synapse Analytics’ serverless SQL pool lets you query data lake files with T-SQL. No infrastructure to manage, pay only for queries. Here’s how it works.
Query Data Lake with SQL
-- Query Parquet files directly
SELECT
ProductCategory,
SUM(SalesAmount) as TotalSales
FROM OPENROWSET(
BULK 'https://myaccount.dfs.core.windows.net/data/sales/*.parquet',
FORMAT = 'PARQUET'
) AS sales
GROUP BY ProductCategory
ORDER BY TotalSales DESC;
Create External Tables
CREATE EXTERNAL TABLE Sales (
ProductId INT,
SalesAmount DECIMAL(10,2),
SalesDate DATE
)
WITH (
LOCATION = 'sales/*.parquet',
DATA_SOURCE = MyDataLake,
FILE_FORMAT = ParquetFormat
);
Use Cases
- Ad-hoc exploration: Query without loading
- Data transformation: CETAS for ETL
- Logical data warehouse: Views over lake files
Pricing Model
Pay per TB scanned. Use partitioning and column pruning to minimize costs. Parquet format is highly recommended for cost efficiency.
References
Discover more from C4: Container, Code, Cloud & Context
Subscribe to get the latest posts sent to your email.