Cover photo credit: Diogo Nunes (nature draws indexes pretty well, huh?)
In Postgres 18 you’ll now see “Index Searches” lines in EXPLAIN ANALYZE output. If like me you were wondering what those mean exactly, you’re in the right place.
The simple case
The standard case is “Index Searches: 1” which means a single descent of the index. This may be very efficient, if everything we need is in the same area of the index. It may also be very inefficient, if the entries we need are not colocated, and therefore lots of the entries scanned through do not meet our conditions. More on this later!
What about when Index Searches > 1?
In Postgres 17, there was a nice optimisation to “allow btree indexes to more efficiently find a set of values, such as those supplied by IN clauses” (release notes, commit). This built on previous work from way back in Postgres 9.2 to “teach btree to handle ScalarArrayOpExpr quals natively” (release notes, commit), as well as some even older work that applied only to Bitmap Index Scans.
The docs include an example, showing a Bitmap Index Scan reporting that it searched the index four times, once for each item in the IN list. Here’s the output from Postgres 18, so that we see the Index Searches field:
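A sketch of that example follows; this is illustrative rather than the exact docs output. The docs use the regression-test table tenk1, whose thousand column holds the values 0 to 999, and the index name below is an assumption:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tenk1
WHERE thousand IN (1, 500, 700, 999);

-- Abbreviated plan shape (timings and buffer counts omitted):
--  Bitmap Heap Scan on tenk1
--    Recheck Cond: (thousand = ANY ('{1,500,700,999}'::integer[]))
--    ->  Bitmap Index Scan on tenk1_thous_tenthous
--          Index Cond: (thousand = ANY ('{1,500,700,999}'::integer[]))
--          Index Searches: 4
```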
In this case, a single index search (of the same index) would have needed to scan many more buffers, since it would have had to also scan through the pages containing the 995 other values between 1 and 999 that are not listed.
Previously, we couldn’t tell for sure from the EXPLAIN ANALYZE output whether the optimisation was being used. We got clues, like reduced timing and reduced buffers, but not an explicit Index Searches count. You could, however, see them in a couple of system views; for example, pg_stat_user_indexes has an idx_scan column that counts these individual descents.
In Postgres 18, as well as adding Index Searches to EXPLAIN output, more work was done to add support for “skip scans” of btree indexes (release notes, commit).
Once again, the docs include a nice example, showing an Index Only Scan reporting that it searched the index three times, once for each value in a range:
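Sketching that example (table and index names are assumptions here; tenk1’s four column holds only the values 0 to 3, and the index is on (four, unique1)):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT four, unique1 FROM tenk1
WHERE four BETWEEN 1 AND 3 AND unique1 = 42;

-- Abbreviated plan shape (illustrative, not exact output):
--  Index Only Scan using tenk1_four_unique1_idx on tenk1
--    Index Cond: ((four >= 1) AND (four <= 3) AND (unique1 = 42))
--    Heap Fetches: 0
--    Index Searches: 3
```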
Note that despite 1, 2, and 3 being consecutive values of the column “four”, their entries for the value unique1=42 would (very likely) not be near one another in an index on “four, unique1” (in that order). As such, three separate descents are a much more efficient way to get them from an index defined this way. The overhead of descending multiple times is much lower than the inefficiency of scanning many tuples where unique1 <> 42. Naturally, this becomes less true as the number of descents increases, so this optimisation has the most impact when there are relatively few distinct values in the first column, and when our WHERE condition is very selective.
Something I love about optimisations like these is that they have the potential to speed up existing queries, using existing indexes, without us needing to change anything!
Are more Index Searches good or bad?
In general, the most efficient possible scan would involve a single descent of an optimal index. This would result in the fewest possible buffers being read.
But, having an optimal index for each query does not scale well, as there are prices to be paid for each additional index. These include (but are not limited to) write amplification, the loss of HOT updates (for previously unindexed columns), and increased competition for space in shared_buffers.
As such, if you’re optimising an important query, and are willing to create and maintain an index for it, Index Searches > 1 likely implies there is a more efficient index available.
A simple example
Here’s a setup that I thought would be the simplest way to demonstrate this:
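Something along these lines should do it (the table and column names here are my own; the setup just needs to match the description below):

```sql
CREATE TABLE example (flag boolean, num integer);

-- 100k rows: flag evenly split between TRUE and FALSE,
-- num a random integer between 0 and 10,000
INSERT INTO example
SELECT i % 2 = 0, (random() * 10000)::integer
FROM generate_series(1, 100000) AS i;

-- Low-cardinality boolean column first: the column order is important
CREATE INDEX ON example (flag, num);

VACUUM ANALYZE example;
```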
So we created a two-column table and inserted 100k rows, with one column very low cardinality (boolean values evenly split), and the other much higher cardinality (random integers from 0 to 10,000).
We added an index on both columns, with the boolean column first (the column order is important). Finally, we ran VACUUM ANALYZE to update the visibility map and gather statistics.
If we now run a query that filters only on our second column in the index, we would expect a much more efficient query plan in Postgres 18, using a skip scan.
If we run it on Postgres 17 first, here is the query plan we get:
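Here is a sketch of the two plans, using the hypothetical table example(flag, num) and its index on (flag, num). The buffer and search counts are the ones discussed below; everything else is illustrative:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM example WHERE num = 123;

-- Postgres 17: num is not the leading column, so the whole index is scanned
--  Index Only Scan using example_flag_num_idx on example
--    Index Cond: (num = 123)
--    Buffers: shared hit=168

-- Postgres 18: a skip scan descends once per distinct value of flag
-- (plus boundary/NULL checks), reading far fewer pages
--  Index Only Scan using example_flag_num_idx on example
--    Index Cond: (num = 123)
--    Index Searches: 4
--    Buffers: shared hit=9
```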
Three things to notice here:
The buffers are much lower, 9 instead of 168
The execution time is lower (thanks to fewer buffer reads)
Index Searches: 4
So this is a great optimisation, allowing for a more efficient use of the index!
But wait a second, why do we get four index searches? Like me, you may have been expecting it to be two: one descent for TRUE, and one for FALSE. I was stumped for a while, so I ended up asking on the performance mailing list. I’m grateful to Peter Geoghegan for the explanation. It turns out that for the general case, boundary conditions and NULLs (of course!) always need to be considered, so you can get one or two extra Index Searches when these can’t be ruled out.
Since I knew these optimisations were very flexible, I wondered if I could get two index searches by explicitly filtering to “only” the values that were TRUE or FALSE:
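Something like this (again using the hypothetical names example.flag and example.num):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM example
WHERE flag IN (TRUE, FALSE) AND num = 123;
```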
And finally, here are those last four query plans saved and visualised via pgMustard.
Are we using Index Searches for any tips yet?
So far we aren’t using the Index Searches directly for any pgMustard tips. We do display them under “Operation detail”, for cases where they can be helpful.
When index scans are particularly inefficient, you will still see a “Read Efficiency” tip when there are a lot of buffer reads compared to rows returned, and/or an “Index Efficiency” tip when Postgres reports a high proportion of the rows are being filtered.
Once we see how commonly issues come up around this in practice, and how much optimisation potential there tends to be, we may add something more specific!
Some practical advice in the meantime
Firstly, if you see Index Searches > 1 while optimising an important query, there may be a more optimal index definition for that query.
My main advice would be to still focus on all the usual things, like rows filtered, buffers, and timing.
If you think some of your less important (or less optimised) queries might benefit from these improvements, consider upgrading to (or at least testing) Postgres 18.
If you want to, you may also now be able to get away with fewer indexes. Perhaps start by expanding your search for redundant or overlapping indexes to also include those with the same columns but in a different order. You may be able to drop an index or two with acceptable impact on read latencies.
Further reading, watching, and listening
A while ago, Nikolay Samokhvalov and I were lucky to do an interview with Peter Geoghegan about this work on our podcast, Postgres FM.
Lukas Fittl has also written and spoken well about it, on 5 minutes of Postgres and a recent webinar on Postgres 18.
Finally, I’ve also now added Index Searches to our EXPLAIN glossary, which I’m in the process of updating for Postgres 18 (a fun task I set myself each year).