Divyansh Gupta

Posted on Jun 25

Database optimization best practices

#postgressql #sql #database #productivity

Imagine your database as a wild animal sanctuary: some queries lumber like tortoises, while others sprint like cheetahs. Your job as a DBA is to coax every query into channeling its inner cheetah: fast, efficient, and resource-savvy. In this KB article, you’ll discover practical techniques, vibrant code examples, ASCII-art execution plans, and Mermaid flowcharts that transform sluggish SQL into scalpels of performance.

1. Measure Twice, Cut Once: EXPLAIN & ANALYZE

Before refactoring, know your enemy. Use:

EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;

to expose hidden bottlenecks: estimated vs. actual row counts, per-node timings, and buffer hits vs. reads.

                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Hash Join  (cost=150.00..500.00 rows=1000 width=64) (actual time=12.345..45.678 rows=950 loops=1)
   Hash Cond: (t1.id = t2.foreign_id)
   Buffers: shared hit=2000 read=1500

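Read it in three passes. Compare the estimated rows with the actual rows to spot bad cardinality estimates. Compare hit with read under Buffers to see how many pages came from outside shared buffers, a sign of I/O pressure. Then check the actual time per node: a slow node with few reads points to CPU-bound work.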
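To put numbers on the sample plan above, the arithmetic can be done in SQL itself. This is only a sketch using the figures from that output; the resulting ratios describe this one example and are not general thresholds.

```sql
-- Figures copied from the sample plan: hit=2000, read=1500, estimated rows=1000, actual rows=950.
SELECT
  round(100.0 * 2000 / (2000 + 1500), 1)  AS buffer_hit_pct,     -- 57.1
  round(100.0 * abs(1000 - 950) / 950, 1) AS estimate_error_pct; -- 5.3
```

Here the row estimate is close to reality, while more than 40% of the pages had to be read from outside shared buffers, so I/O is the more likely suspect.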

2. Indexing Mastery: More Than Just B‑Trees

2.1 Partial & Expression Indexes

Target hot filter patterns without bloating the index with rows you never query:

CREATE INDEX idx_active_users ON users((lower(email)))
 WHERE status = 'active';
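The planner only considers this index when a query repeats the indexed expression and its WHERE clause implies the index predicate. A minimal sketch of queries that do and do not qualify; the email values are made up for illustration:

```sql
-- Can use idx_active_users: same expression, and the filter implies status = 'active'.
SELECT email FROM users
WHERE lower(email) = lower('Ada@Example.com') AND status = 'active';

-- Cannot use it: compares the raw column, not lower(email).
SELECT email FROM users
WHERE email = 'ada@example.com' AND status = 'active';

-- Cannot use it: rows with other statuses are not in the partial index.
SELECT email FROM users
WHERE lower(email) = 'ada@example.com';
```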

2.2 BRIN for Time-Series

Massive append-only tables? Try BRIN:

CREATE TABLE logs (
  ts TIMESTAMPTZ,
  event JSONB
) PARTITION BY RANGE (ts);
CREATE INDEX ON logs USING BRIN (ts);

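A BRIN index stores only a small summary (such as the minimum and maximum value) per range of table blocks, so it stays tiny even on very large tables. It works best when ts correlates with the physical row order, as it does for append-only data.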
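As written, logs has no partitions yet, so inserts would fail until one exists. A hedged sketch of the next step, assuming PostgreSQL 11 or later (where new partitions pick up indexes defined on the parent); the partition name and date range are hypothetical:

```sql
-- Hypothetical monthly partition; it gets its own copy of the BRIN index on ts.
CREATE TABLE logs_2025_06 PARTITION OF logs
  FOR VALUES FROM ('2025-06-01') TO ('2025-07-01');

-- A range predicate on ts lets the planner prune partitions first,
-- then use the BRIN summaries to skip block ranges inside the partition.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM logs
WHERE ts >= '2025-06-10' AND ts < '2025-06-11';
```

Whether the planner actually picks the BRIN index depends on table size and statistics, so check the plan rather than assume it.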

3. Encapsulate Complexity: Stored Functions & Views

Rather than embedding 10 JOINs in every API call, wrap logic in a function or view:

CREATE OR REPLACE FUNCTION daily_sales_summary(day DATE)
RETURNS TABLE(user_id UUID, total DECIMAL) AS $$
BEGIN
  RETURN QUERY
  SELECT s.user_id, SUM(s.amount)
  FROM sales s
  -- A half-open range on ts can use a plain index on ts; date_trunc('day', s.ts) cannot.
  WHERE s.ts >= day AND s.ts < day + 1
  GROUP BY s.user_id;
END;
$$ LANGUAGE plpgsql STABLE;

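Wrapping the logic gives callers one interface and gives you one place to tune. Keep in mind that the planner treats a PL/pgSQL body as a black box: it is planned on its own and is not merged into the calling query, so a function is not automatically faster than ad-hoc SQL.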
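For comparison, a single-SELECT function written in LANGUAGE sql and marked STABLE is a candidate for inlining into the calling query, which a PL/pgSQL body is not. A minimal sketch under stated assumptions: the function name daily_sales_summary_sql and parameter p_day are hypothetical, and sales.amount is assumed to be numeric and sales.user_id to be uuid.

```sql
-- Hypothetical SQL-language variant of the summary function.
CREATE OR REPLACE FUNCTION daily_sales_summary_sql(p_day date)
RETURNS TABLE(user_id uuid, total numeric)
LANGUAGE sql STABLE AS $$
  SELECT s.user_id, SUM(s.amount)
  FROM sales s
  WHERE s.ts >= p_day AND s.ts < p_day + 1   -- half-open one-day range
  GROUP BY s.user_id;
$$;

-- Compare this plan with the PL/pgSQL version to see whether the body was inlined.
EXPLAIN
SELECT *
FROM daily_sales_summary_sql(DATE '2025-06-25')
WHERE total > 100;
```

If the plan shows the scan on sales directly instead of a Function Scan node, the body was inlined.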

4. Aggregations & Windows: Tricks of the Trade

4.1 Materialized Aggregates

For metrics dashboards, precompute:

CREATE MATERIALIZED VIEW mv_user_errors AS
SELECT user_id, COUNT(*) AS error_count
FROM events
WHERE error_flag
GROUP BY user_id;
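-- CONCURRENTLY requires a unique index on the materialized view.
CREATE UNIQUE INDEX mv_user_errors_user_id_idx ON mv_user_errors (user_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_user_errors;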

4.2 Window Functions vs. GROUP BY

When you need both raw rows and aggregates:

SELECT
  order_id,
  amount,
  SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer
FROM orders;

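An index that leads with customer_id, such as (customer_id, amount), can feed rows to the window already in partition order, so the planner may skip the sort step.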
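To make the contrast concrete, here is a small sketch of the GROUP BY counterpart and of the index mentioned above. The index name is illustrative.

```sql
-- GROUP BY collapses the result to one row per customer; the raw order rows are gone.
SELECT customer_id, SUM(amount) AS total_per_customer
FROM orders
GROUP BY customer_id;

-- Illustrative index: rows come back grouped by customer_id, which suits PARTITION BY customer_id.
CREATE INDEX idx_orders_customer_amount ON orders (customer_id, amount);
```

The window version keeps every order row and repeats the customer total on each one, which is what you want when a report shows both.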

5. Partitioning & Parallelism: Scale Out Safely

5.1 Declarative Partitioning

Split by time or key:

CREATE TABLE metrics (
  ts DATE,
  value DOUBLE PRECISION
) PARTITION BY RANGE (ts);
CREATE TABLE metrics_2025_q1 PARTITION OF metrics
 FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
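The payoff of partitioning is pruning: a filter on the partition key lets the planner skip partitions that cannot match. A minimal sketch, assuming PostgreSQL 11 or later; the extra partition names are hypothetical.

```sql
-- Hypothetical second quarter, plus a catch-all for rows outside every defined range.
CREATE TABLE metrics_2025_q2 PARTITION OF metrics
  FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
CREATE TABLE metrics_default PARTITION OF metrics DEFAULT;

-- The plan should scan only metrics_2025_q1.
EXPLAIN
SELECT avg(value)
FROM metrics
WHERE ts >= '2025-02-01' AND ts < '2025-03-01';
```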

5.2 Harness Parallel Queries

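Parallel query is on by default since PostgreSQL 10, with two workers per Gather node. Raise the per-query limit in postgresql.conf:

max_parallel_workers_per_gather = 4

Large scans can then be split across up to four workers, subject to max_parallel_workers and max_worker_processes.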
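The setting can also be tried per session before touching the server configuration. A sketch using the metrics table from section 5.1 as a stand-in for any large table:

```sql
SHOW max_parallel_workers_per_gather;

-- Session-level override; no restart or reload needed.
SET max_parallel_workers_per_gather = 4;

-- Look for Gather and Parallel Seq Scan nodes in the output.
EXPLAIN
SELECT count(*) FROM metrics;
```

Small tables will not get a parallel plan regardless of this setting, because the planner also applies size thresholds such as min_parallel_table_scan_size.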

6. Housekeeping: VACUUM, ANALYZE & Maintenance

6.1 Autovacuum Tuning

Ensure autovacuum thresholds fit your workload. For high-churn tables:

ALTER TABLE big_table
SET ( autovacuum_vacuum_scale_factor = 0.05,
      autovacuum_analyze_scale_factor = 0.02 );
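Autovacuum vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. The query below is a rough sketch of that formula for big_table, with the 0.05 scale factor from the ALTER TABLE above hard-coded and the server-wide base threshold assumed to apply.

```sql
SELECT c.relname,
       c.reltuples::bigint AS estimated_rows,
       s.n_dead_tup        AS dead_rows_now,
       (current_setting('autovacuum_vacuum_threshold')::int
         + 0.05 * c.reltuples)::bigint AS dead_rows_to_trigger_vacuum
FROM pg_class c
JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE c.relname = 'big_table';
```

reltuples is an estimate and can be -1 on a table that has never been analyzed, so treat the result as a guide only.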

6.2 Fillfactor for Write-Heavy Tables

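Leave free space in each page so updated row versions can stay on the same page (HOT updates), which cuts index maintenance. This helps update-heavy tables, not append-only ones, and it must be set on regular tables or leaf partitions rather than on a partitioned parent:

-- Applies to pages written from now on; existing pages change only when the table is rewritten.
ALTER TABLE big_table SET (fillfactor = 70);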
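One hedged way to check whether the reserved space pays off is the share of HOT (heap-only tuple) updates reported by the statistics views. The query lists the most-updated tables; nothing in it is specific to this article's schema.

```sql
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC
LIMIT 10;
```

A low percentage on a heavily updated table suggests a lower fillfactor may be worth testing.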

7. Real‑World Case Study: 80% Speedup

Scenario: A nightly report took 10 minutes. By applying:

Partial index on status
Function-based view
Partition pruning on date
Autovacuum tuning

we tracked its execution plan changes:

-- Before:
Seq Scan on orders  (time: 600s)
-- After:
Index Only Scan using idx_status_date  (time: 120s)

From 10 minutes to 2 minutes: an 80% cut in runtime (a 5x speedup), and a success story to inspire your own triumphs.

Key Takeaways

Measure first with EXPLAIN (ANALYZE, BUFFERS).
Index smartly: partial, expression, BRIN.
Encapsulate complex logic in functions/views.
Precompute heavy aggregates with materialized views.
Partition & parallelize for scale.
Maintain: VACUUM, ANALYZE, and fillfactor.

8. Beyond the Basics: Advanced Techniques

8.1 Finding Expensive Queries with pg_stat_statements

Track your most expensive statements:

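-- Requires pg_stat_statements in shared_preload_libraries (server restart needed).
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Column names for PostgreSQL 13+; older versions use total_time and mean_time.
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;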

Use this insight to prioritize optimizations.
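Total time is one ranking; block reads are another, and they tie back to the Buffers figures from section 1. A sketch that ranks statements by pages read from outside shared buffers, using columns that pg_stat_statements exposes across versions:

```sql
SELECT query,
       calls,
       shared_blks_hit,
       shared_blks_read,
       round(100.0 * shared_blks_hit
             / nullif(shared_blks_hit + shared_blks_read, 0), 1) AS hit_pct
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 5;
```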

8.2 Plan Stability with Prepared Statements

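Prepared statements skip repeated parsing and, often, repeated planning. PostgreSQL plans the first five executions with the actual parameter values, then may switch to a cached generic plan if it costs about the same. That helps when one plan suits every parameter, and hurts when it does not: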

PREPARE fast_search(text) AS
SELECT * FROM products WHERE description ILIKE $1;
EXECUTE fast_search('%widget%');
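Whether a prepared statement reuses one plan is controlled by the plan cache. A minimal sketch, assuming PostgreSQL 12 or later, where the plan_cache_mode setting is available:

```sql
-- Show the plan chosen for this particular parameter value.
EXPLAIN EXECUTE fast_search('%widget%');

-- Always re-plan with the actual parameter values.
SET plan_cache_mode = force_custom_plan;

-- Or always reuse one parameter-independent plan.
SET plan_cache_mode = force_generic_plan;

-- Back to the default heuristic (auto).
RESET plan_cache_mode;
```

For a pattern search like this one, where selectivity swings with the pattern, force_custom_plan is usually the safer experiment.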

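8.3 Faster Writes with UNLOGGED Tables

Scratch data that can be rebuilt can skip the write-ahead log. Unlogged tables still live on disk, not in RAM; they are truncated after a crash and are not replicated to standbys: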

CREATE UNLOGGED TABLE temp_hits AS
SELECT ...;
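A table's persistence can be inspected and changed later. The sketch below uses a hypothetical staging_hits table with made-up columns, so it does not depend on the elided query above; ALTER TABLE ... SET LOGGED needs PostgreSQL 9.5 or later.

```sql
-- Hypothetical scratch table.
CREATE UNLOGGED TABLE staging_hits (
  hit_id  bigint,
  payload jsonb
);

-- relpersistence is 'u' for unlogged and 'p' for permanent tables.
SELECT relname, relpersistence
FROM pg_class
WHERE relname = 'staging_hits';

-- Promote to a regular, crash-safe table once the data matters.
ALTER TABLE staging_hits SET LOGGED;
```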

8.4 Smart Caching Layers

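Pair an external cache such as Redis with PostgreSQL's own shared buffers. The pg_prewarm extension loads a relation into the buffer cache, for example after a restart:

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- Returns the number of blocks loaded.
SELECT pg_prewarm('hot_table');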

Creative Corner: Visualizing Data Flow

Bring your diagrams to life—they guide both your brain and your team.

Final Thoughts: The Art of Performance

Optimizing SQL is equal parts science and art. It’s a continuous journey: measure, tweak, observe, and repeat. With these techniques—from core index strategies to creative caching and plan management—you’re equipped to turn any tortoise into a cheetah.
Remember: the fastest query is the one you never run. Cache wisely, precompute where it counts, and let your database shine.
