How we saved 40GB of Postgres space wasted on indexes

Slack alert about rising db write latency led us down an unexpected rabbit hole. The real bottleneck wasn’t Postgres, traffic spikes, or AWS - it was our own "future-proof" indexing decision.

Lazy summer afternoon and there was a Slack alert about increased write-throughput latency on our production db hosted on AWS RDS Postgres.

At first, we assumed it was just temporary traffic growth. CPU would spike every few minutes, connections counts increased and then dropped. So we got to digging deeper.

Normally, I’d manually go through:

slow query logs
bad N+1 production queries
heavy cron jobs running at intervals
suspicious background tasks

But now, I had AI and the first thing it did was find the list of tables and space occupied by them. In retrospect, this is such an obvious first move haha..

table_name | total_size | estimated_rows
------------------------+------------+----------------
notifications | 43 GB | 5.7M
audit_versions | 22 GB | 18.8M
marketplace_items | 14 GB | 369K
contracts | 11 GB | 2.0M
profiles | 9.7 GB | 543K
chat_rooms | 8.8 GB | 726K

What caught my attention is the size of notifications table given the number of rows. It didn't make sense. Because the other table contracts is also quite a big table but it's taking 1/4th the space.

So the next thing we ran was this query

SELECT
 pg_size_pretty(pg_total_relation_size('notifications')) AS total,
 pg_size_pretty(pg_relation_size('notifications')) AS table_only,
 pg_size_pretty(pg_indexes_size('notifications')) AS indexes;

And the output was like

 total | table_only | indexes 
-------+------------+---------
 43 GB | 2595 MB | 41 GB

Yep, indexes were to blame.

Mind that this notifications table had a json field too, and I was kinda afraid if that might be an issue, but apparently not.

So the next logical step is to find the unused indexes and get rid of them. And maybe optimise some indexes or business logic such that we don't have to waste so much space for simple things.

SELECT
 indexrelname AS index_name,
 pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
 idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'notifications'
ORDER BY pg_relation_size(indexrelid) DESC;

This query gave us an insight into space taken by different indexes

| index_name | index_size | idx_scan |
|----------------------|------------|-----------|
| event | 11 GB | 64 |
| event_like | 11 GB | 261 |
| unread | 9224 MB | 0 |
| unseen | 6853 MB | 0 |
| recipient_id | 1692 MB | 32245493 |
| event_id | 725 MB | 1224333 |
| primary_key | 260 MB | 1763881 |

The first four indexes were unused. We updated the business logic such that unseen stuff was handled with cache. And got rid of the unused indexes.

Bottom line is that it was my own premature optimization to add indexes on fields that were never queried. I just assumed we may need that.

And you know, this all took less than 30 mins to find and resolve, all thanks to AI. It pointed me in the right direction. Gave me those SQL queries which I'd have struggled to figure out on my own, squandering through stackoverflow pre-AI.

TLDR: Don't randomly add indexes to your db tables. You never know when they could be a bottleneck.