Tips and Tricks – Use Window Functions for Running Calculations

Self-joins and subqueries make SQL calculations messy and
slow. Need running totals? Join the table to itself. Need rankings? Nested subqueries. Need previous row values?
More self-joins. Window functions eliminate this complexity, calculating aggregates, rankings, and comparisons
across ordered sets with elegant, performant syntax.

This guide covers production-ready window function patterns
that replace complex queries with single, readable statements. We’ll transform convoluted SQL into clean, efficient
analytics.

Why Window Functions Change SQL

The Problem with Traditional Approaches

Without window functions, you resort to:

  • Self-joins: Join table to itself multiple times, killing performance
  • Correlated subqueries: Row-by-row execution is painfully slow
  • Temporary tables: Complex multi-step processes
  • Application logic: Move calculations out of database
  • Unreadable code: Nobody understands nested queries

Window Function Benefits

  • No self-joins: Single table scan instead of multiple
  • Readable: Clear intent in single query
  • Performant: Optimized execution, often 10-100x faster
  • Powerful: Running totals, rankings, moving averages, lag/lead
  • Composable: Combine multiple window functions easily

Pattern 1: Ranking Functions

ROW_NUMBER, RANK, DENSE_RANK

-- Sample sales data
WITH sales AS (
    SELECT 'Alice' as salesperson, 'Q1' as quarter, 100000 as revenue UNION ALL
    SELECT 'Bob', 'Q1', 120000 UNION ALL
    SELECT 'Alice', 'Q2', 150000 UNION ALL
    SELECT 'Bob', 'Q2', 150000 UNION ALL
    SELECT 'Carol', 'Q1', 80000 UNION ALL
    SELECT 'Carol', 'Q2', 95000
)

SELECT
    salesperson,
    quarter,
    revenue,
    
    -- ROW_NUMBER: Unique sequential number (1,2,3,4...)
    ROW_NUMBER() OVER (ORDER BY revenue DESC) as row_num,
    
    -- RANK: Gaps for ties (1,2,2,4...)
    RANK() OVER (ORDER BY revenue DESC) as rank,
    
    -- DENSE_RANK: No gaps for ties (1,2,2,3...)
    DENSE_RANK() OVER (ORDER BY revenue DESC) as dense_rank,
    
    -- Rank within partition (per quarter)
    RANK() OVER (PARTITION BY quarter ORDER BY revenue DESC) as rank_in_quarter

FROM sales
ORDER BY revenue DESC;

/* Results:
salesperson | quarter | revenue | row_num | rank | dense_rank | rank_in_quarter
Alice       | Q2      | 150000  | 1       | 1    | 1          | 1
Bob         | Q2      | 150000  | 2       | 1    | 1          | 1
Bob         | Q1      | 120000  | 3       | 3    | 2          | 1
Alice       | Q1      | 100000  | 4       | 4    | 3          | 2
*/

Top N Per Group

-- Get top 3 products per category by sales
WITH ranked_products AS (
    SELECT
        category,
        product_name,
        total_sales,
        ROW_NUMBER() OVER (
            PARTITION BY category 
            ORDER BY total_sales DESC
        ) as rank
    FROM product_sales
)

SELECT 
    category,
    product_name,
    total_sales,
    rank
FROM ranked_products
WHERE rank <= 3
ORDER BY category, rank;

-- Common use cases:
-- - Top N customers per region
-- - Latest N orders per customer
-- - Highest N scores per student

Pattern 2: Aggregate Window Functions

Running Totals and Moving Averages

-- Daily sales with running totals and moving averages
SELECT
    sale_date,
    daily_revenue,
    
    -- Running total (cumulative sum)
    SUM(daily_revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as running_total,
    
    -- Grand total for percentage calculation
    SUM(daily_revenue) OVER () as grand_total,
    
    -- Percentage of grand total
    ROUND(100.0 * daily_revenue / SUM(daily_revenue) OVER (), 2) as pct_of_total,
    
    -- 7-day moving average
    AVG(daily_revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as moving_avg_7day,
    
    -- 30-day moving average
    AVG(daily_revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    ) as moving_avg_30day

FROM daily_sales
ORDER BY sale_date;

Running Totals by Group

-- Running total per customer
SELECT
    customer_id,
    order_date,
    order_amount,
    
    -- Running total for this customer
    SUM(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as customer_lifetime_value,
    
    -- Order number for this customer
    ROW_NUMBER() OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) as order_sequence,
    
    -- Average order value so far
    AVG(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as avg_order_value

FROM orders
ORDER BY customer_id, order_date;

Pattern 3: LAG and LEAD Functions

Previous and Next Row Values

-- Compare current row to previous/next
SELECT
    product_id,
    month,
    monthly_sales,
    
    -- Previous month's sales
    LAG(monthly_sales, 1) OVER (
        PARTITION BY product_id
        ORDER BY month
    ) as prev_month_sales,
    
    -- Next month's sales
    LEAD(monthly_sales, 1) OVER (
        PARTITION BY product_id
        ORDER BY month
    ) as next_month_sales,
    
    -- Month-over-month growth
    monthly_sales - LAG(monthly_sales, 1) OVER (
        PARTITION BY product_id
        ORDER BY month
    ) as mom_growth,
    
    -- Month-over-month growth percentage
    ROUND(
        100.0 * (monthly_sales - LAG(monthly_sales, 1) OVER (
            PARTITION BY product_id ORDER BY month
        )) / NULLIF(LAG(monthly_sales, 1) OVER (
            PARTITION BY product_id ORDER BY month
        ), 0),
        2
    ) as mom_growth_pct,
    
    -- Year-over-year comparison (12 months ago)
    LAG(monthly_sales, 12) OVER (
        PARTITION BY product_id
        ORDER BY month
    ) as same_month_last_year

FROM product_monthly_sales
ORDER BY product_id, month;

Identifying Changes

-- Detect when values change
WITH user_status AS (
    SELECT
        user_id,
        status_date,
        status,
        
        -- Previous status
        LAG(status) OVER (
            PARTITION BY user_id
            ORDER BY status_date
        ) as prev_status
    
    FROM user_status_history
)

SELECT
    user_id,
    status_date,
    status,
    prev_status,
    
    -- Flag status changes
    CASE 
        WHEN prev_status IS NULL THEN 'Initial'
        WHEN status != prev_status THEN 'Changed'
        ELSE 'No Change'
    END as change_type

FROM user_status
WHERE prev_status IS NULL OR status != prev_status
ORDER BY user_id, status_date;

Pattern 4: Frame Clauses

ROWS vs RANGE

-- Understanding frame clauses
SELECT
    order_date,
    daily_sales,
    
    -- ROWS: Physical row count
    SUM(daily_sales) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as sum_last_3_rows,
    
    -- RANGE: Logical value range (handles ties)
    SUM(daily_sales) OVER (
        ORDER BY order_date
        RANGE BETWEEN INTERVAL '2 days' PRECEDING AND CURRENT ROW
    ) as sum_last_3_days,
    
    -- Custom frame: Current row and next 2
    AVG(daily_sales) OVER (
        ORDER BY order_date
        ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
    ) as avg_next_3_days,
    
    -- Centered moving average (1 before, current, 1 after)
    AVG(daily_sales) OVER (
        ORDER BY order_date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) as centered_avg_3day

FROM daily_sales
ORDER BY order_date;

Pattern 5: FIRST_VALUE and LAST_VALUE

First and Last in Window

-- Get first and last values in window
SELECT
    employee_id,
    salary_date,
    salary,
    
    -- First salary for this employee
    FIRST_VALUE(salary) OVER (
        PARTITION BY employee_id
        ORDER BY salary_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as initial_salary,
    
    -- Latest salary
    LAST_VALUE(salary) OVER (
        PARTITION BY employee_id
        ORDER BY salary_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as current_salary,
    
    -- Salary growth since hire
    salary - FIRST_VALUE(salary) OVER (
        PARTITION BY employee_id
        ORDER BY salary_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as total_growth,
    
    -- Highest salary ever for this employee
    MAX(salary) OVER (
        PARTITION BY employee_id
        ORDER BY salary_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as peak_salary_to_date

FROM salary_history
ORDER BY employee_id, salary_date;

Pattern 6: Cohort Analysis

User Retention Analysis

-- Calculate user retention by cohort
WITH user_cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', first_login) as cohort_month,
        DATE_TRUNC('month', login_date) as activity_month
    FROM user_logins
),

cohort_sizes AS (
    SELECT
        cohort_month,
        COUNT(DISTINCT user_id) as cohort_size
    FROM user_cohorts
    GROUP BY cohort_month
),

retention_data AS (
    SELECT
        uc.cohort_month,
        uc.activity_month,
        COUNT(DISTINCT uc.user_id) as active_users,
        cs.cohort_size,
        
        -- Months since cohort start
        EXTRACT(MONTH FROM AGE(uc.activity_month, uc.cohort_month)) as months_since_cohort
    
    FROM user_cohorts uc
    JOIN cohort_sizes cs ON uc.cohort_month = cs.cohort_month
    GROUP BY uc.cohort_month, uc.activity_month, cs.cohort_size
)

SELECT
    cohort_month,
    activity_month,
    months_since_cohort,
    active_users,
    cohort_size,
    
    -- Retention rate
    ROUND(100.0 * active_users / cohort_size, 2) as retention_pct,
    
    -- Previous month retention for comparison
    LAG(ROUND(100.0 * active_users / cohort_size, 2)) OVER (
        PARTITION BY cohort_month
        ORDER BY activity_month
    ) as prev_month_retention_pct

FROM retention_data
ORDER BY cohort_month, activity_month;

Pattern 7: Percentiles and Distribution

NTILE and Percentile Functions

-- Divide data into buckets
SELECT
    customer_id,
    total_purchases,
    
    -- Divide into quartiles (4 groups)
    NTILE(4) OVER (ORDER BY total_purchases) as quartile,
    
    -- Divide into deciles (10 groups)
    NTILE(10) OVER (ORDER BY total_purchases) as decile,
    
    -- Divide into percentiles (100 groups)
    NTILE(100) OVER (ORDER BY total_purchases) as percentile,
    
    -- Calculate actual percentile rank (0-1)
    PERCENT_RANK() OVER (ORDER BY total_purchases) as pct_rank,
    
    -- Cumulative distribution
    CUME_DIST() OVER (ORDER BY total_purchases) as cum_dist,
    
    -- Median purchases
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_purchases) 
        OVER () as median_purchases

FROM customer_purchases
ORDER BY total_purchases DESC;

-- Identify high-value customers (top 20%)
WHERE NTILE(5) OVER (ORDER BY total_purchases) = 5;

Pattern 8: Real-World Example – Sales Dashboard

Comprehensive Analytics Query

-- Complete sales analytics with window functions
WITH daily_metrics AS (
    SELECT
        sale_date,
        region,
        SUM(amount) as daily_revenue,
        COUNT(*) as daily_orders,
        COUNT(DISTINCT customer_id) as daily_customers
    FROM sales
    GROUP BY sale_date, region
)

SELECT
    sale_date,
    region,
    daily_revenue,
    daily_orders,
    daily_customers,
    
    -- Running totals
    SUM(daily_revenue) OVER (
        PARTITION BY region
        ORDER BY sale_date
        ROWS UNBOUNDED PRECEDING
    ) as ytd_revenue,
    
    -- Moving averages
    AVG(daily_revenue) OVER (
        PARTITION BY region
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as avg_revenue_7day,
    
    -- Growth metrics
    daily_revenue - LAG(daily_revenue, 1) OVER (
        PARTITION BY region ORDER BY sale_date
    ) as daily_growth,
    
    ROUND(100.0 * (
        daily_revenue - LAG(daily_revenue, 7) OVER (
            PARTITION BY region ORDER BY sale_date
        )
    ) / NULLIF(LAG(daily_revenue, 7) OVER (
        PARTITION BY region ORDER BY sale_date
    ), 0), 2) as wow_growth_pct,
    
    -- Rankings
    RANK() OVER (
        PARTITION BY sale_date
        ORDER BY daily_revenue DESC
    ) as region_rank_daily,
    
    -- Peak performance
    MAX(daily_revenue) OVER (
        PARTITION BY region
        ORDER BY sale_date
        ROWS UNBOUNDED PRECEDING
    ) as peak_daily_revenue,
    
    -- Performance vs peak
    ROUND(100.0 * daily_revenue / MAX(daily_revenue) OVER (
        PARTITION BY region
        ORDER BY sale_date
        ROWS UNBOUNDED PRECEDING
    ), 2) as pct_of_peak

FROM daily_metrics
ORDER BY region, sale_date;

Performance Optimization

Index Strategy

-- Window functions benefit from proper indexes
-- Index on PARTITION BY and ORDER BY columns

-- For: PARTITION BY customer_id ORDER BY order_date
CREATE INDEX idx_orders_customer_date 
ON orders(customer_id, order_date);

-- For: PARTITION BY region, product_id ORDER BY sale_date
CREATE INDEX idx_sales_region_product_date
ON sales(region, product_id, sale_date);

-- Include columns used in SELECT to enable index-only scans
CREATE INDEX idx_orders_customer_date_amount
ON orders(customer_id, order_date) 
INCLUDE (amount, product_id);

Common Performance Pitfalls

-- BAD: Separate window functions with same OVER clause
SELECT
    SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as sum1,
    AVG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as avg1,
    COUNT(*) OVER (PARTITION BY customer_id ORDER BY order_date) as cnt1
FROM orders;
-- Database computes same window 3 times

-- GOOD: Reuse window with WINDOW clause
SELECT
    SUM(amount) OVER w as sum1,
    AVG(amount) OVER w as avg1,
    COUNT(*) OVER w as cnt1
FROM orders
WINDOW w AS (PARTITION BY customer_id ORDER BY order_date);
-- Database computes window once, reuses for all functions

Best Practices

  • Use WINDOW clause: Define window once, reuse across multiple functions
  • Index wisely: Create indexes on PARTITION BY and ORDER BY columns
  • Be explicit with frames: Specify ROWS/RANGE clause to avoid surprises
  • Choose right ranking: ROW_NUMBER for unique, RANK for gaps, DENSE_RANK for no gaps
  • Use CTEs for readability: Break complex window queries into steps
  • Test edge cases: NULLs, ties, empty partitions

Common Pitfalls

  • Missing ORDER BY: Window functions often require ORDER BY
  • Default frame is wrong: Default is RANGE UNBOUNDED PRECEDING, not always what you want
  • Forgot NULLIF in divisions: Causes divide-by-zero errors
  • Not handling NULLs: LAG/LEAD return NULL at boundaries
  • Redundant windows: Same OVER clause computed multiple times
  • Wrong frame type: ROWS vs RANGE matters with ties

Key Takeaways

  • Window functions replace self-joins and correlated subqueries
  • Use ROW_NUMBER for deduplication, RANK for rankings
  • LAG/LEAD essential for time-series analysis and growth calculations
  • Running totals and moving averages are simple with proper frame clauses
  • WINDOW clause improves performance by reusing window definitions
  • Index on PARTITION BY and ORDER BY columns for best performance
  • Always specify frame clause explicitly (ROWS/RANGE)

Window functions are SQL’s secret weapon for analytics. Master them, and you’ll write queries that are clearer,
faster, and more powerful than anything possible with joins and subqueries alone.


Discover more from C4: Container, Code, Cloud & Context

Subscribe to get the latest posts sent to your email.

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.