I am sure you have all been there....
I was working on a report that stitched together customer orders, inventory, shipping statuses, and regional averages. It had nested subqueries so deep debugging it was like playing 3D chess in a hurricane. One small change would break something five levels down. I needed a better way...
That's when I learned about Common Table Expressions (CTEs). Once I understood how to break my query into logical, readable steps, everything changed. The queries made sense again. They ran faster. My team could actually read my SQL without swearing.
CTEs are one of those tools that once you start using them, you wonder how you ever survived without them. Here's what I wish someone had shown me earlier:
CTEs, or Common Table Expressions, provide a way to define temporary named result sets within a query, improving readability and performance by allowing you to pre-filter data and reference complex subqueries multiple times. They're particularly useful for avoiding repeated subqueries, eliminating scalar functions in WHERE clauses, and creating modular query components. Check out the following two examples.
SELECT o.order_id, o.customer_id, o.order_date, o.total_amount
FROM orders o
WHERE o.customer_id IN (
SELECT customer_id
FROM customers
WHERE registration_date >= '2023-01-01'
AND country = 'USA'
)
AND o.order_date >= '2023-06-01'
AND o.total_amount > (
SELECT AVG(total_amount)
FROM orders o2
WHERE o2.customer_id = o.customer_id
AND o2.order_date < o.order_date
)
ORDER BY o.order_date DESC;
WITH recent_usa_customers AS (
SELECT customer_id
FROM customers
WHERE registration_date >= '2023-01-01'
AND country = 'USA'
),
customer_orders_with_history AS (
SELECT
o.customer_id,
o.order_id,
o.order_date,
o.total_amount,
COALESCE(
AVG(o.total_amount) OVER (
PARTITION BY o.customer_id
ORDER BY o.order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
), 0
) as avg_previous_amount
FROM orders o
INNER JOIN recent_usa_customers c ON o.customer_id = c.customer_id
)
SELECT order_id, customer_id, order_date, total_amount
FROM customer_orders_with_history
WHERE order_date >= '2023-06-01'
AND total_amount > avg_previous_amount
ORDER BY order_date DESC;
Improved readability - Break complex queries into logical, named components that are easier to understand and maintain
Eliminates code duplication - Reference the same subquery multiple times without rewriting it, reducing redundancy and potential errors
Better performance - Pre-filter data early in CTEs to reduce the dataset size for subsequent operations, and avoid expensive correlated subqueries
Replaces scalar functions in WHERE clauses - Use window functions and joins instead of row-by-row function calls that hurt performance
Modular query design - Build queries step-by-step with intermediate results, making debugging and modification easier
Recursive capabilities - Handle hierarchical data structures like organizational charts or bill-of-materials with recursive CTEs
Simplified maintenance - Changes to business logic only need to be made in one place within the CTE rather than multiple subqueries
Enhanced debugging - Test and validate individual CTEs independently before combining them in the final query
Temporary result sets - Create virtual tables that exist only for the duration of the query without needing actual temporary tables
Find your ugliest nested subquery - the one nobody wants to touch - and refactor it using CTEs. You'll be amazed by how much faster, cleaner, and easier it is to maintain. Bonus points if you check the execution plan and measure the difference.
Recommended reading time: ~20 min for basics, 45 min for advanced.
WITH amazing_day AS (
SELECT 'Have an amazing day!' as message
),
farewell AS (
SELECT 'That''s all folks.' as closing
)
SELECT closing, message
FROM farewell
CROSS JOIN amazing_day;