Hero image for SQL Query Advanced: Window Functions, CTEs & Beyond

SQL Query Advanced: Window Functions, CTEs & Beyond

sql-server database query performance

Prerequisites: Understand basic SQL and joins. See DB 02 Software Layer.

You know SELECT, WHERE, JOIN. Now let’s level up with analytical query superpowers.


Part A: Window Functions — Analytics Without GROUP BY

1. The Problem: Ranking Without Losing Detail

Scenario: Show each employee with their salary AND their rank within their department.

-- Traditional GROUP BY loses individual rows:
SELECT Department, MAX(Salary) FROM Employees GROUP BY Department;
-- Result: Only shows ONE row per department!

What we want: Keep ALL employees, but add a rank column.

2. Solution: Window Functions

SELECT 
    Name,
    Department,
    Salary,
    ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS DeptRank
FROM Employees;

Result:

NameDepartmentSalaryDeptRank
AliceEngineering1500001
BobEngineering1200002
CarolEngineering1000003
DaveSales900001
EveSales850002
graph LR
    subgraph "Traditional GROUP BY"
        GB[One row per group]
    end
    
    subgraph "Window Function"
        WF[All rows + calculated column]
    end
    
    style GB fill:#e74c3c,color:#fff
    style WF fill:#27ae60,color:#fff

3. Key Window Functions

FunctionDescriptionExample Use
ROW_NUMBER()Sequential number (no ties)Pagination
RANK()Ranking with gaps (1,2,2,4)Competition ranking
DENSE_RANK()Ranking without gaps (1,2,2,3)Continuous ranking
NTILE(n)Divide into n bucketsPercentiles
LAG(col, n)Previous row valueCompare to yesterday
LEAD(col, n)Next row valueForecast comparison
SUM() OVERRunning totalCumulative sales
AVG() OVERMoving averageTrend analysis

4. PARTITION BY vs ORDER BY

SELECT 
    Name,
    Department,
    Salary,
    -- Partition = reset the window for each group
    SUM(Salary) OVER (PARTITION BY Department) AS DeptTotal,
    -- Order = running total within partition
    SUM(Salary) OVER (PARTITION BY Department ORDER BY Salary) AS RunningTotal
FROM Employees;
NameDepartmentSalaryDeptTotalRunningTotal
CarolEngineering100000370000100000
BobEngineering120000370000220000
AliceEngineering150000370000370000

5. Practical Examples

Running Total (Cumulative Sum)

SELECT 
    OrderDate,
    Amount,
    SUM(Amount) OVER (ORDER BY OrderDate) AS RunningTotal
FROM Orders;

Compare to Previous Row (LAG)

SELECT 
    OrderDate,
    Amount,
    LAG(Amount, 1) OVER (ORDER BY OrderDate) AS PreviousAmount,
    Amount - LAG(Amount, 1) OVER (ORDER BY OrderDate) AS Difference
FROM Orders;

Window Frames: The Moving Average Gotcha

By default, ORDER BY in a window function uses RANGE UNBOUNDED PRECEDING — meaning it includes ALL rows from the start up to the current row. This is fine for running totals, but wrong for moving averages.

-- ❌ Wrong: This is a "cumulative average", not a 3-day moving average
SELECT Date, Sales,
       AVG(Sales) OVER (ORDER BY Date) AS WrongMovingAvg
FROM DailySales;

-- ✅ Correct: Specify the frame to get a true 3-day moving average
SELECT Date, Sales,
       AVG(Sales) OVER (
           ORDER BY Date 
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW  -- Frame clause!
       ) AS ThreeDayMovingAvg
FROM DailySales;
Frame ClauseMeaningUse Case
ROWS BETWEEN 2 PRECEDING AND CURRENT ROWCurrent row + 2 rows before3-period moving average
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWAll rows from start to currentRunning total (default)
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWINGCurrent row ± 1 rowSmoothing/centered average

Top N per Group

WITH RankedEmployees AS (
    SELECT 
        Name, Department, Salary,
        ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS Rank
    FROM Employees
)
SELECT * FROM RankedEmployees WHERE Rank <= 3;

Part B: Common Table Expressions (CTE)

6. The Problem: Unreadable Subqueries

-- Nested subqueries = nightmare to read:
SELECT * FROM (
    SELECT * FROM (
        SELECT * FROM Orders WHERE Status = 'Active'
    ) AS ActiveOrders
    WHERE Amount > 1000
) AS LargeActiveOrders
WHERE CustomerID IN (SELECT CustomerID FROM VIPCustomers);

7. Solution: CTE

WITH ActiveOrders AS (
    SELECT * FROM Orders WHERE Status = 'Active'
),
LargeActiveOrders AS (
    SELECT * FROM ActiveOrders WHERE Amount > 1000
),
VIPLargeOrders AS (
    SELECT * FROM LargeActiveOrders 
    WHERE CustomerID IN (SELECT CustomerID FROM VIPCustomers)
)
SELECT * FROM VIPLargeOrders;

Benefits:

  • ✅ Readable step-by-step logic
  • ✅ Can reference same CTE multiple times
  • ✅ Self-documenting names

⚠️ Performance Note: CTEs improve readability, not necessarily performance. In SQL Server, a CTE is usually just “syntax sugar” — the query optimizer treats it like a subquery and may re-execute the logic every time it’s referenced. If you reference a complex CTE multiple times, consider using a Temp Table (#Table) instead for better performance.

8. Recursive CTE — The Magic

Scenario: Show organization hierarchy (who reports to whom).

WITH OrgChart AS (
    -- Anchor: Start with CEO (no manager)
    SELECT EmployeeID, Name, ManagerID, 0 AS Level
    FROM Employees
    WHERE ManagerID IS NULL
    
    UNION ALL
    
    -- Recursive: Find employees who report to previous level
    SELECT e.EmployeeID, e.Name, e.ManagerID, oc.Level + 1
    FROM Employees e
    INNER JOIN OrgChart oc ON e.ManagerID = oc.EmployeeID
)
SELECT * FROM OrgChart ORDER BY Level, Name;

Result:

EmployeeIDNameManagerIDLevel
1CEO JohnNULL0
2VP Alice11
3VP Bob11
4Manager Carol22
5Manager Dave22
graph TD
    CEO[CEO John - Level 0]
    CEO --> VP1[VP Alice - Level 1]
    CEO --> VP2[VP Bob - Level 1]
    VP1 --> MGR1[Manager Carol - Level 2]
    VP1 --> MGR2[Manager Dave - Level 2]

🚨 Safety Tip: MAXRECURSION

Recursive CTEs can cause infinite loops if data has cycles (e.g., an employee reports to themselves). Use OPTION (MAXRECURSION n) as a safety net:

SELECT * FROM OrgChart
OPTION (MAXRECURSION 100);  -- Stop after 100 levels (default is 100)
-- Use MAXRECURSION 0 for unlimited (dangerous!)

Part C: Cursors — When You MUST Go Row-by-Row

9. What is a Cursor?

A Cursor processes rows one at a time, like a for-loop.

DECLARE @EmployeeID INT, @Name NVARCHAR(100);

DECLARE employee_cursor CURSOR FOR
    SELECT EmployeeID, Name FROM Employees;

OPEN employee_cursor;
FETCH NEXT FROM employee_cursor INTO @EmployeeID, @Name;

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT 'Processing: ' + @Name;
    -- Do something with each row
    FETCH NEXT FROM employee_cursor INTO @EmployeeID, @Name;
END

CLOSE employee_cursor;
DEALLOCATE employee_cursor;

10. Why You Should AVOID Cursors

AspectSet-BasedCursor
SpeedProcess all rows at onceOne row at a time
PerformanceOptimized by SQL ServerForces sequential processing
MemoryEfficientHolds open connection
Best Practice✅ Preferred⚠️ Last resort

Convert Cursor to Set-Based

-- Cursor approach (SLOW):
DECLARE cursor... 
WHILE...
    UPDATE Employees SET Bonus = Salary * 0.1 WHERE EmployeeID = @ID;

-- Set-based approach (FAST):
UPDATE Employees SET Bonus = Salary * 0.1;  -- All rows at once!

When cursors ARE acceptable:

  • Complex row-by-row logic that can’t be expressed in SQL
  • Calling stored procedures for each row
  • Administrative tasks (rebuilding indexes one by one)

Part D: Subqueries & Correlated Subqueries

11. Subquery in WHERE

-- Find employees earning more than average:
SELECT Name, Salary
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);

12. Correlated Subquery (References Outer Query)

-- Find employees earning more than their department average:
SELECT e.Name, e.Department, e.Salary
FROM Employees e
WHERE e.Salary > (
    SELECT AVG(Salary) 
    FROM Employees 
    WHERE Department = e.Department  -- References outer 'e'
);

Warning: Correlated subqueries run once PER ROW — can be slow!


Summary: Quick Reference

Window Functions

FunctionUse Case
ROW_NUMBER()Pagination, deduplication
RANK() / DENSE_RANK()Competition rankings
LAG() / LEAD()Compare to previous/next row
SUM/AVG OVERRunning totals, moving averages

CTE vs Subquery

FeatureCTESubquery
Readability✅ Excellent❌ Nested mess
Reusability✅ Reference multiple times❌ Repeat code
Recursion✅ Supported❌ Not supported

Best Practices

  1. Prefer Window Functions over self-joins for ranking
  2. Use CTEs for complex multi-step queries
  3. Avoid Cursors — think set-based first
  4. Watch Correlated Subqueries — they run per row

💡 Practice Questions

Conceptual

  1. What is the difference between ROW_NUMBER(), RANK(), and DENSE_RANK()? Provide an example where each would give different results.

  2. Explain what PARTITION BY does in a window function. How is it different from GROUP BY?

  3. What is a Recursive CTE? When would you use one?

  4. Why should you generally avoid cursors in SQL Server?

Hands-on

-- Write a query to find the top 3 highest-paid employees per department using a CTE and window function.

-- Calculate the month-over-month sales growth using LAG():
-- Show: Month, Sales, PreviousMonthSales, GrowthPercentage
💡 View Answer
-- Top 3 per department
WITH RankedEmployees AS (
    SELECT 
        DepartmentID,
        EmployeeName,
        Salary,
        ROW_NUMBER() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS rn
    FROM Employees
)
SELECT DepartmentID, EmployeeName, Salary
FROM RankedEmployees
WHERE rn <= 3;

-- Month-over-month growth
SELECT 
    SalesMonth,
    Sales,
    LAG(Sales) OVER (ORDER BY SalesMonth) AS PreviousMonthSales,
    -- Pro Tip: Use NULLIF to avoid divide-by-zero errors!
    -- NULLIF(x, 0) returns NULL if x = 0, preventing arithmetic errors
    ROUND((Sales - LAG(Sales) OVER (ORDER BY SalesMonth)) * 100.0 
          / NULLIF(LAG(Sales) OVER (ORDER BY SalesMonth), 0), 2) AS GrowthPercentage
FROM MonthlySales;

Scenario

  1. Optimization: A developer wrote a query using a cursor to calculate running totals. The query takes 5 minutes on 1M rows. How would you rewrite it using a window function?

  2. Real-world: You need to display an organization chart showing each employee, their manager chain (up to CEO), and their level in the hierarchy. Which SQL technique would you use and why?