PostgreSQL has been quietly accumulating superpowers for decades. Most developers use it like a fancy MySQL -- and leave enormous capability on the table. Here are gems that will change how you write migrations, queries, and maintenance scripts.
## ALTER COLUMN ... TYPE ... USING

The problem: You stored time as a packed integer (BIGINT) -- say, 12345678 meaning 12:34:56.78 -- and now you need an actual TIME column. Normally you'd add a new column, backfill it, drop the old one. Three steps, extra risk, more downtime.
The treasure: ALTER COLUMN ... TYPE ... USING lets you apply an arbitrary expression to every row at the moment the type changes -- in a single statement.
```sql
ALTER TABLE events
  ALTER COLUMN recorded_at TYPE TIME
  USING MAKE_TIME(
    (recorded_at / 1000000)::int,        -- hours
    ((recorded_at / 10000) % 100)::int,  -- minutes
    (recorded_at % 10000) / 100.0        -- seconds.centiseconds
  );
```
Another common use -- converting a VARCHAR price column to NUMERIC:
```sql
ALTER TABLE products
  ALTER COLUMN price TYPE NUMERIC(10,2)
  USING REPLACE(price, '$', '')::NUMERIC(10,2);
```
The USING clause is evaluated row-by-row during the rewrite. You have access to the full expression language: math, string functions, CASE, even subqueries. No temp columns, no backfill scripts, no cleanup.
When to reach for it: Any time you're converting encoded or packed values to a proper type -- epoch integers to TIMESTAMPTZ, cents integers to NUMERIC, formatted strings to DATE. One atomic DDL statement beats a multi-step migration every time.
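Two more sketches of the same idea, using hypothetical tables and column names (`audit_log.created_epoch` as Unix seconds, `line_items.price_cents` as integer cents):

```sql
-- Epoch seconds to TIMESTAMPTZ
ALTER TABLE audit_log
  ALTER COLUMN created_epoch TYPE TIMESTAMPTZ
  USING TO_TIMESTAMP(created_epoch);

-- Integer cents to NUMERIC dollars
ALTER TABLE line_items
  ALTER COLUMN price_cents TYPE NUMERIC(12,2)
  USING (price_cents / 100.0)::NUMERIC(12,2);
```

Same pattern each time: the old representation goes in, the expression transforms it, and the new type comes out in one statement.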
## CASE-Based Pivots to Eliminate Redundant Joins

Normally, PostgreSQL's query planner treats a WITH clause as a soft optimization hint -- it can inline or re-execute it if it thinks that's faster. Add MATERIALIZED and you give the planner an explicit contract: evaluate this CTE exactly once, store the result, and reuse it.
```sql
WITH base AS MATERIALIZED (
  SELECT ...  -- runs once, result is cached in memory/temp storage
)
SELECT * FROM base WHERE ...;
-- the CTE body is not re-executed; every reference reads the cached result
```
This matters most when:
- The CTE is expensive and referenced more than once in the outer query
- The CTE calls volatile or side-effecting functions that must run exactly once
- You want to stop the planner from pushing outer predicates down into the CTE
### CASE-Based Pivot Trick

Instead of joining the same table twice for two different periods, scan it once and pivot using conditional aggregation:
```sql
WITH active_users AS MATERIALIZED (
  -- Expensive subquery runs exactly once
  SELECT user_id, region
  FROM get_active_users($start_date, $end_date)
),
sales_pivot AS (
  SELECT
    user_id,
    MAX(CASE WHEN period = 'current'  THEN revenue END) AS curr_revenue,
    MAX(CASE WHEN period = 'current'  THEN units   END) AS curr_units,
    MAX(CASE WHEN period = 'previous' THEN revenue END) AS prev_revenue,
    MAX(CASE WHEN period = 'previous' THEN units   END) AS prev_units
  FROM sales_summary
  WHERE period IN ('current', 'previous')
    AND user_id IN (SELECT user_id FROM active_users)
  GROUP BY user_id
)
SELECT
  u.user_id,
  u.region,
  sp.curr_revenue,
  sp.prev_revenue,
  ROUND((sp.curr_revenue - sp.prev_revenue) / NULLIF(sp.prev_revenue, 0) * 100, 2) AS pct_change
FROM active_users u
LEFT JOIN sales_pivot sp ON sp.user_id = u.user_id
ORDER BY sp.curr_revenue DESC NULLS LAST;
```
Why this is faster than two joins:
| Approach | Table scans | Rows processed |
|---|---|---|
| Two separate joins (current + previous period) | 2 | 2× filtered set |
| CASE-based pivot with GROUP BY | 1 | 1× filtered set |
The single scan also plays nicely with the materialized CTE -- the user_id IN (...) filter is applied once against the cached result set, not re-evaluated per join.
## LIKE INCLUDING ALL + Atomic Swap

The problem: A queue or log table has grown to millions of rows. You need to keep only the most recent N records. DELETE WHERE id < X is slow, holds locks, generates massive WAL, and your application is hammering inserts the whole time.
The treasure: Build a new table as a structural clone, populate it offline, then swap with a millisecond lock window.
```sql
CREATE TABLE app.event_log_new (LIKE app.event_log INCLUDING ALL);
```
INCLUDING ALL copies:

- Column definitions and their defaults (a SERIAL column's default is copied as nextval() pointing to the existing sequence -- no new sequence is created)
- NOT NULL and CHECK constraints
- Indexes, including those backing primary key and unique constraints
- Identity and generated columns, comments, statistics targets, and storage parameters

Note: LIKE ... INCLUDING ALL does not copy foreign key constraints. If your table has outgoing FK constraints, re-add them manually on the new table before the swap.
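A minimal sketch of re-adding an outgoing FK, assuming a hypothetical user_id column referencing app.users:

```sql
-- Re-create the foreign key on the clone before the swap
ALTER TABLE app.event_log_new
  ADD CONSTRAINT event_log_user_fk
  FOREIGN KEY (user_id) REFERENCES app.users (user_id);
```

Incoming FKs (other tables pointing at this one) need the same treatment after the rename, since they still reference the old table.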
```sql
INSERT INTO app.event_log_new
SELECT * FROM app.event_log
ORDER BY event_log_id DESC
LIMIT 10000;
```
This runs while inserts keep flowing into the original table. No locks, no drama.
```sql
BEGIN;
LOCK TABLE app.event_log IN ACCESS EXCLUSIVE MODE;

-- Capture rows inserted between phase 1 and the lock.
-- No LIMIT here: we want every row newer than what phase 1 copied.
INSERT INTO app.event_log_new
SELECT * FROM app.event_log
WHERE event_log_id > (SELECT MAX(event_log_id) FROM app.event_log_new);

-- Advance the sequence to the current max so no IDs are reused after the swap
SELECT setval(
  'app.event_log_event_log_id_seq',
  (SELECT MAX(event_log_id) FROM app.event_log)
);

ALTER TABLE app.event_log RENAME TO event_log_old;
ALTER TABLE app.event_log_new RENAME TO event_log;

-- Critical: transfer sequence ownership BEFORE dropping the old table
-- or the sequence will be dropped along with it
ALTER SEQUENCE app.event_log_event_log_id_seq
  OWNED BY app.event_log.event_log_id;

COMMIT;
```
The lock is held only for metadata operations -- three ALTER statements and a setval. These are catalog updates, not data rewrites. Typical lock window: under 10 milliseconds.
```sql
DROP TABLE app.event_log_old;
```
DROP TABLE doesn't zero out data on disk -- it simply unlinks the heap and index files from the filesystem. Effectively instantaneous regardless of table size.
The OWNED BY detail you cannot skip: If you drop event_log_old while the sequence still thinks it's owned by that table, the sequence goes with it -- and your next insert explodes. The ALTER SEQUENCE ... OWNED BY inside the transaction re-parents the sequence to the new table before the drop.
## DISTINCT ON -- The Clause That Replaces a Subquery

Most developers reach for a subquery or window function to get the latest row per group. Postgres has dedicated syntax for it:
```sql
-- Get the most recent order for each customer
SELECT DISTINCT ON (customer_id)
  customer_id, order_date, total_amount, status
FROM orders
ORDER BY customer_id, order_date DESC;
```
DISTINCT ON (expr) keeps exactly one row per unique value of expr -- the first row in the ORDER BY sequence. The rule: the DISTINCT ON columns must appear leftmost in ORDER BY.
Compare the alternatives:
```sql
-- Subquery version (verbose, often slower)
SELECT o.*
FROM orders o
JOIN (
  SELECT customer_id, MAX(order_date) AS max_date
  FROM orders GROUP BY customer_id
) latest ON o.customer_id = latest.customer_id
        AND o.order_date = latest.max_date;

-- Window function version (requires a wrapper query)
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
  FROM orders
) t WHERE rn = 1;

-- DISTINCT ON version (one scan, no wrapper)
SELECT DISTINCT ON (customer_id)
  customer_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC;
```
With an index on (customer_id, order_date DESC), the DISTINCT ON version can satisfy the query with a pure index scan and zero sort. The others can't.
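The supporting index is a one-liner (index name is illustrative):

```sql
CREATE INDEX idx_orders_customer_date
  ON orders (customer_id, order_date DESC);
```

With this in place, the rows arrive already grouped and sorted, and DISTINCT ON just takes the first row of each group as it streams past.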
## GENERATED ALWAYS AS -- Computed Columns That Maintain Themselves

Stop writing triggers to keep derived columns in sync. Postgres 12+ supports stored computed columns natively:
```sql
CREATE TABLE contacts (
  id SERIAL PRIMARY KEY,
  first_name TEXT NOT NULL,
  last_name TEXT NOT NULL,
  full_name TEXT GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED,
  email TEXT,
  email_domain TEXT GENERATED ALWAYS AS (
    CASE WHEN email LIKE '%@%'
         THEN LOWER(SPLIT_PART(email, '@', 2))
    END
  ) STORED
);
```
The column is recomputed automatically on every INSERT or UPDATE. You can index it, filter on it, and join on it like any other column -- with zero application-layer maintenance.
Practical uses: full-name search columns, normalized/lowercased lookups, fiscal year derivations from a date column, or combining fields into a composite key. The expression must be immutable (no NOW(), no subqueries), but anything deterministic from other columns in the same row is fair game.
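A sketch of the fiscal-year case, assuming a hypothetical orders table with a DATE column order_date and a fiscal year that starts July 1 (EXTRACT on a plain DATE is immutable, so it qualifies):

```sql
-- Orders from July onward count toward the next fiscal year
ALTER TABLE orders
  ADD COLUMN fiscal_year INT GENERATED ALWAYS AS (
    EXTRACT(YEAR FROM order_date)::int
    + CASE WHEN EXTRACT(MONTH FROM order_date) >= 7 THEN 1 ELSE 0 END
  ) STORED;
```

Note the same expression on a TIMESTAMPTZ column would be rejected: time-zone-dependent extraction is only stable, not immutable.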
## Partial Indexes

A full index on a 50-million-row table when you only ever query a small, well-defined subset is wasteful in size, memory, and write overhead. Partial indexes let you index just the rows that matter:
```sql
-- Only index unprocessed jobs -- the hot query path
CREATE INDEX idx_jobs_pending ON jobs (created_at, priority)
WHERE status = 'pending';

-- Only index non-deleted rows for a soft-delete pattern
CREATE INDEX idx_users_active_email ON users (email)
WHERE deleted_at IS NULL;
```
The index is smaller, fits in shared_buffers more easily, and is faster to update because completed or deleted rows never touch it. The planner uses it automatically when the query's WHERE clause matches the index predicate.
The trap: the planner must be able to prove that your query's WHERE clause implies the index predicate. WHERE status = 'pending' uses the index; WHERE status != 'done' does not -- even if the two are logically equivalent given your data, the planner won't reason about which values the column can actually hold.
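The contrast, against the idx_jobs_pending index defined above:

```sql
-- Uses the partial index: the predicate matches exactly
SELECT * FROM jobs
WHERE status = 'pending'
ORDER BY created_at;

-- Falls back to a full scan: the planner cannot prove
-- that status != 'done' implies status = 'pending'
SELECT * FROM jobs
WHERE status != 'done';
```

If your application genuinely queries by exclusion, either rewrite the query to use the positive condition or define the index predicate with the same exclusion.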
## COPY with ON_ERROR ignore -- Fault-Tolerant Bulk Loads

Importing a large CSV where a handful of rows have bad data? Instead of failing the entire load or pre-scrubbing the file, tell Postgres to skip bad rows and continue (Postgres 17+):
```sql
COPY staging.raw_imports (id, name, price, created_at)
FROM '/data/exports/products.csv'
WITH (FORMAT csv, HEADER true, ON_ERROR ignore, LOG_VERBOSITY verbose);
```
Bad rows are skipped and logged rather than aborting the transaction. LOG_VERBOSITY VERBOSE captures exactly which rows were rejected and why, so you can fix them separately.
For older versions, staging the file through file_fdw and validating with SQL before inserting gets you partway there -- but native ON_ERROR ignore in COPY is cleaner by a mile.
## EXPLAIN (ANALYZE, BUFFERS) + explain.dalibo.com

You know EXPLAIN ANALYZE. But adding BUFFERS reveals whether your query is hitting shared cache or reading from disk -- often the difference between fast and catastrophic:
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';
```
Look for:
- Buffers: shared hit=N read=M -- hit is from cache, read is from disk. A high read count means your working set isn't fitting in shared_buffers.
- Rows Removed by Filter: N -- the scan is reading many rows only to throw them away. A large value usually means a more selective index would help.
- actual loops=N on a nested loop -- a nested loop running 50,000 times is a common hidden bottleneck.

For complex plans, dump FORMAT JSON and paste it into explain.dalibo.com for an interactive visualization of your query plan. Free, no signup, invaluable.
## pg_stat_statements -- Find Your Worst Queries Without a Profiler

The pg_stat_statements extension tracks execution statistics for every distinct query shape that has run since it was last reset. It's the fastest way to find your top offenders:
```sql
-- Requires shared_preload_libraries = 'pg_stat_statements' (server restart),
-- then enable once per database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 queries by total time spent
SELECT
  LEFT(query, 80) AS query_snippet,
  calls,
  ROUND(total_exec_time::numeric, 2) AS total_ms,
  ROUND(mean_exec_time::numeric, 2) AS avg_ms,
  ROUND(stddev_exec_time::numeric, 2) AS stddev_ms,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```
total_exec_time shows you where wall-clock time is actually going -- not just the slowest single query, but the query called 200,000 times that's 2ms each. That's where your optimization ROI lives.
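A complementary cut of the same view, sorted by call count, surfaces exactly that high-frequency query:

```sql
-- Most frequently called query shapes -- a cheap 2 ms query
-- executed 200,000 times outranks a slow one run twice
SELECT
  LEFT(query, 80) AS query_snippet,
  calls,
  ROUND(mean_exec_time::numeric, 2) AS avg_ms
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
```

Checking both orderings keeps you from tuning a dramatic-looking one-off while the death-by-a-thousand-cuts query goes unnoticed.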
Reset stats after a tuning session with SELECT pg_stat_statements_reset();.
## Advisory Locks

Need to ensure only one worker processes a job at a time? Most teams build a locked_at column and a loop. Postgres has a built-in mutex primitive that lives entirely in memory:
```sql
-- Try to acquire lock for job_id 12345 (non-blocking)
SELECT pg_try_advisory_lock(12345);
-- Returns TRUE if acquired, FALSE if another session holds it

-- Do the work...

-- Release when done
SELECT pg_advisory_unlock(12345);
```
Advisory locks are:
- Held in shared memory -- no table rows, no dead tuples, no vacuum overhead
- Keyed by application-defined integers (a single BIGINT, or two INTs for namespacing)
- Session-scoped by default, with pg_advisory_xact_lock variants that release automatically at transaction end
Use pg_try_advisory_lock (non-blocking) in job workers so they skip already-locked items rather than queuing up. Use pg_advisory_lock (blocking) when you need a strict serialization point.
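A minimal sketch of the blocking, transaction-scoped variant (the key 42 is arbitrary):

```sql
-- Serialize a critical section across all sessions
BEGIN;
SELECT pg_advisory_xact_lock(42);  -- blocks until acquired
-- ... work that must not run concurrently ...
COMMIT;  -- lock is released automatically; no explicit unlock call
```

The xact-scoped form is often the safer default: you cannot forget to unlock, and a crashed session cannot strand the lock.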
```sql
-- Pattern: claim the next available job
UPDATE jobs
SET status = 'processing', started_at = NOW()
WHERE id = (
  SELECT id FROM jobs
  WHERE status = 'pending'
    AND pg_try_advisory_lock(id)  -- filter out jobs another worker has locked
  ORDER BY priority DESC, created_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING *;
```

Combine with FOR UPDATE SKIP LOCKED for a queue worker with no application-layer coordination. One caveat: pg_try_advisory_lock may be evaluated for candidate rows beyond the one you ultimately claim, and session-level advisory locks do not release at COMMIT -- so have the worker call pg_advisory_unlock for each lock it acquired once the job finishes.
| Technique | The Old Way | The Postgres Way |
|---|---|---|
| Type conversion with transform | Add column → backfill → drop | ALTER ... USING expr |
| Multi-period aggregation | Join same table N times | One scan + CASE pivot + MATERIALIZED CTE |
| Pruning a live table | DELETE with downtime | Clone → atomic swap → instant drop |
| Latest row per group | Subquery or window function | DISTINCT ON |
| Derived/computed columns | Triggers | GENERATED ALWAYS AS ... STORED |
| Selective indexing | Full index on everything | CREATE INDEX ... WHERE condition |
| Fault-tolerant bulk load | Pre-scrub CSV or ETL pipeline | COPY ... ON_ERROR ignore |
| Query performance debugging | EXPLAIN ANALYZE alone | EXPLAIN (ANALYZE, BUFFERS) + dalibo |
| Finding slow queries | Application logs, guesswork | pg_stat_statements |
| Distributed job locking | locked_at column + polling | pg_advisory_lock + SKIP LOCKED |