SQL Query Advanced: Window Functions, CTEs & Beyond
Prerequisites: Understand basic SQL and joins. See DB 02 Software Layer.
You know SELECT, WHERE, JOIN. Now let’s level up with analytical query superpowers.
Part A: Window Functions — Analytics Without GROUP BY
1. The Problem: Ranking Without Losing Detail
Scenario: Show each employee with their salary AND their rank within their department.
-- Traditional GROUP BY loses individual rows:
SELECT Department, MAX(Salary) FROM Employees GROUP BY Department;
-- Result: Only shows ONE row per department!
What we want: Keep ALL employees, but add a rank column.
2. Solution: Window Functions
SELECT
Name,
Department,
Salary,
ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS DeptRank
FROM Employees;
Result:
| Name | Department | Salary | DeptRank |
|---|---|---|---|
| Alice | Engineering | 150000 | 1 |
| Bob | Engineering | 120000 | 2 |
| Carol | Engineering | 100000 | 3 |
| Dave | Sales | 90000 | 1 |
| Eve | Sales | 85000 | 2 |
graph LR
subgraph "Traditional GROUP BY"
GB[One row per group]
end
subgraph "Window Function"
WF[All rows + calculated column]
end
style GB fill:#e74c3c,color:#fff
style WF fill:#27ae60,color:#fff
3. Key Window Functions
| Function | Description | Example Use |
|---|---|---|
| ROW_NUMBER() | Sequential number (no ties) | Pagination |
| RANK() | Ranking with gaps (1,2,2,4) | Competition ranking |
| DENSE_RANK() | Ranking without gaps (1,2,2,3) | Continuous ranking |
| NTILE(n) | Divide into n buckets | Percentiles |
| LAG(col, n) | Previous row value | Compare to yesterday |
| LEAD(col, n) | Next row value | Forecast comparison |
| SUM() OVER | Running total | Cumulative sales |
| AVG() OVER | Moving average | Trend analysis |
4. PARTITION BY vs ORDER BY
SELECT
Name,
Department,
Salary,
-- Partition = reset the window for each group
SUM(Salary) OVER (PARTITION BY Department) AS DeptTotal,
-- Order = running total within partition
SUM(Salary) OVER (PARTITION BY Department ORDER BY Salary) AS RunningTotal
FROM Employees;
| Name | Department | Salary | DeptTotal | RunningTotal |
|---|---|---|---|---|
| Carol | Engineering | 100000 | 370000 | 100000 |
| Bob | Engineering | 120000 | 370000 | 220000 |
| Alice | Engineering | 150000 | 370000 | 370000 |
5. Practical Examples
Running Total (Cumulative Sum)
SELECT
OrderDate,
Amount,
SUM(Amount) OVER (ORDER BY OrderDate) AS RunningTotal
FROM Orders;
Compare to Previous Row (LAG)
SELECT
OrderDate,
Amount,
LAG(Amount, 1) OVER (ORDER BY OrderDate) AS PreviousAmount,
Amount - LAG(Amount, 1) OVER (ORDER BY OrderDate) AS Difference
FROM Orders;
Window Frames: The Moving Average Gotcha
By default, ORDER BY in a window function uses RANGE UNBOUNDED PRECEDING — meaning it includes ALL rows from the start up to the current row. This is fine for running totals, but wrong for moving averages.
-- ❌ Wrong: This is a "cumulative average", not a 3-day moving average
SELECT Date, Sales,
AVG(Sales) OVER (ORDER BY Date) AS WrongMovingAvg
FROM DailySales;
-- ✅ Correct: Specify the frame to get a true 3-day moving average
SELECT Date, Sales,
AVG(Sales) OVER (
ORDER BY Date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW -- Frame clause!
) AS ThreeDayMovingAvg
FROM DailySales;
| Frame Clause | Meaning | Use Case |
|---|---|---|
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW | Current row + 2 rows before | 3-period moving average |
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | All rows from start to current | Running total (default) |
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING | Current row ± 1 row | Smoothing/centered average |
Top N per Group
WITH RankedEmployees AS (
SELECT
Name, Department, Salary,
ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS Rank
FROM Employees
)
SELECT * FROM RankedEmployees WHERE Rank <= 3;
Part B: Common Table Expressions (CTE)
6. The Problem: Unreadable Subqueries
-- Nested subqueries = nightmare to read:
SELECT * FROM (
SELECT * FROM (
SELECT * FROM Orders WHERE Status = 'Active'
) AS ActiveOrders
WHERE Amount > 1000
) AS LargeActiveOrders
WHERE CustomerID IN (SELECT CustomerID FROM VIPCustomers);
7. Solution: CTE
WITH ActiveOrders AS (
SELECT * FROM Orders WHERE Status = 'Active'
),
LargeActiveOrders AS (
SELECT * FROM ActiveOrders WHERE Amount > 1000
),
VIPLargeOrders AS (
SELECT * FROM LargeActiveOrders
WHERE CustomerID IN (SELECT CustomerID FROM VIPCustomers)
)
SELECT * FROM VIPLargeOrders;
Benefits:
- ✅ Readable step-by-step logic
- ✅ Can reference same CTE multiple times
- ✅ Self-documenting names
⚠️ Performance Note: CTEs improve readability, not necessarily performance. In SQL Server, a CTE is usually just “syntax sugar” — the query optimizer treats it like a subquery and may re-execute the logic every time it’s referenced. If you reference a complex CTE multiple times, consider using a Temp Table (#Table) instead for better performance.
8. Recursive CTE — The Magic
Scenario: Show organization hierarchy (who reports to whom).
WITH OrgChart AS (
-- Anchor: Start with CEO (no manager)
SELECT EmployeeID, Name, ManagerID, 0 AS Level
FROM Employees
WHERE ManagerID IS NULL
UNION ALL
-- Recursive: Find employees who report to previous level
SELECT e.EmployeeID, e.Name, e.ManagerID, oc.Level + 1
FROM Employees e
INNER JOIN OrgChart oc ON e.ManagerID = oc.EmployeeID
)
SELECT * FROM OrgChart ORDER BY Level, Name;
Result:
| EmployeeID | Name | ManagerID | Level |
|---|---|---|---|
| 1 | CEO John | NULL | 0 |
| 2 | VP Alice | 1 | 1 |
| 3 | VP Bob | 1 | 1 |
| 4 | Manager Carol | 2 | 2 |
| 5 | Manager Dave | 2 | 2 |
graph TD
CEO[CEO John - Level 0]
CEO --> VP1[VP Alice - Level 1]
CEO --> VP2[VP Bob - Level 1]
VP1 --> MGR1[Manager Carol - Level 2]
VP1 --> MGR2[Manager Dave - Level 2]
🚨 Safety Tip: MAXRECURSION
Recursive CTEs can cause infinite loops if data has cycles (e.g., an employee reports to themselves). Use
OPTION (MAXRECURSION n)as a safety net:SELECT * FROM OrgChart OPTION (MAXRECURSION 100); -- Stop after 100 levels (default is 100) -- Use MAXRECURSION 0 for unlimited (dangerous!)
Part C: Cursors — When You MUST Go Row-by-Row
9. What is a Cursor?
A Cursor processes rows one at a time, like a for-loop.
DECLARE @EmployeeID INT, @Name NVARCHAR(100);
DECLARE employee_cursor CURSOR FOR
SELECT EmployeeID, Name FROM Employees;
OPEN employee_cursor;
FETCH NEXT FROM employee_cursor INTO @EmployeeID, @Name;
WHILE @@FETCH_STATUS = 0
BEGIN
PRINT 'Processing: ' + @Name;
-- Do something with each row
FETCH NEXT FROM employee_cursor INTO @EmployeeID, @Name;
END
CLOSE employee_cursor;
DEALLOCATE employee_cursor;
10. Why You Should AVOID Cursors
| Aspect | Set-Based | Cursor |
|---|---|---|
| Speed | Process all rows at once | One row at a time |
| Performance | Optimized by SQL Server | Forces sequential processing |
| Memory | Efficient | Holds open connection |
| Best Practice | ✅ Preferred | ⚠️ Last resort |
Convert Cursor to Set-Based
-- Cursor approach (SLOW):
DECLARE cursor...
WHILE...
UPDATE Employees SET Bonus = Salary * 0.1 WHERE EmployeeID = @ID;
-- Set-based approach (FAST):
UPDATE Employees SET Bonus = Salary * 0.1; -- All rows at once!
When cursors ARE acceptable:
- Complex row-by-row logic that can’t be expressed in SQL
- Calling stored procedures for each row
- Administrative tasks (rebuilding indexes one by one)
Part D: Subqueries & Correlated Subqueries
11. Subquery in WHERE
-- Find employees earning more than average:
SELECT Name, Salary
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);
12. Correlated Subquery (References Outer Query)
-- Find employees earning more than their department average:
SELECT e.Name, e.Department, e.Salary
FROM Employees e
WHERE e.Salary > (
SELECT AVG(Salary)
FROM Employees
WHERE Department = e.Department -- References outer 'e'
);
Warning: Correlated subqueries run once PER ROW — can be slow!
Summary: Quick Reference
Window Functions
| Function | Use Case |
|---|---|
| ROW_NUMBER() | Pagination, deduplication |
| RANK() / DENSE_RANK() | Competition rankings |
| LAG() / LEAD() | Compare to previous/next row |
| SUM/AVG OVER | Running totals, moving averages |
CTE vs Subquery
| Feature | CTE | Subquery |
|---|---|---|
| Readability | ✅ Excellent | ❌ Nested mess |
| Reusability | ✅ Reference multiple times | ❌ Repeat code |
| Recursion | ✅ Supported | ❌ Not supported |
Best Practices
- Prefer Window Functions over self-joins for ranking
- Use CTEs for complex multi-step queries
- Avoid Cursors — think set-based first
- Watch Correlated Subqueries — they run per row
💡 Practice Questions
Conceptual
-
What is the difference between ROW_NUMBER(), RANK(), and DENSE_RANK()? Provide an example where each would give different results.
-
Explain what PARTITION BY does in a window function. How is it different from GROUP BY?
-
What is a Recursive CTE? When would you use one?
-
Why should you generally avoid cursors in SQL Server?
Hands-on
-- Write a query to find the top 3 highest-paid employees per department using a CTE and window function.
-- Calculate the month-over-month sales growth using LAG():
-- Show: Month, Sales, PreviousMonthSales, GrowthPercentage
💡 View Answer
-- Top 3 per department
WITH RankedEmployees AS (
SELECT
DepartmentID,
EmployeeName,
Salary,
ROW_NUMBER() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS rn
FROM Employees
)
SELECT DepartmentID, EmployeeName, Salary
FROM RankedEmployees
WHERE rn <= 3;
-- Month-over-month growth
SELECT
SalesMonth,
Sales,
LAG(Sales) OVER (ORDER BY SalesMonth) AS PreviousMonthSales,
-- Pro Tip: Use NULLIF to avoid divide-by-zero errors!
-- NULLIF(x, 0) returns NULL if x = 0, preventing arithmetic errors
ROUND((Sales - LAG(Sales) OVER (ORDER BY SalesMonth)) * 100.0
/ NULLIF(LAG(Sales) OVER (ORDER BY SalesMonth), 0), 2) AS GrowthPercentage
FROM MonthlySales;
Scenario
-
Optimization: A developer wrote a query using a cursor to calculate running totals. The query takes 5 minutes on 1M rows. How would you rewrite it using a window function?
-
Real-world: You need to display an organization chart showing each employee, their manager chain (up to CEO), and their level in the hierarchy. Which SQL technique would you use and why?