Why learn about the golden-section search



I learned about the golden-section search algorithm at university and immediately dismissed it as impractical. It's a little more complex to implement than a simple bisection, and it converges a little worse too. So what's the point?

20 years later I came back to the same university to teach and didn't even want to add the golden-section algorithm to the syllabus. But then I reluctantly relearned the algorithm and noticed one implementation detail that immediately made it worth learning. You see, while the golden-section search does indeed converge worse than the bisection search, it also has a cheaper iteration, so computation-wise it is still more efficient! This very paradox, which is not at all paradoxical if you know both algorithms, makes learning both of them well worth your time.

Bisection search

Let's start with the bisection search. The idea is this: you have a function and an interval [a, b], and the function has one and only one minimum somewhere on that interval. You want to compute this minimum up to some predefined margin of error.

So what do you do with the bisection? You split the interval in half, determine which half the minimum is in, and start over with the reduced interval. If the minimum is in the left half, you restart the algorithm on the interval [a, (a+b)/2]. If it's in the right half, on [(a+b)/2, b]. Pretty simple!

But how do you decide in which half the minimum lies? You do a kind of numeric differentiation. You take a point (a+b)/2 + σ, where σ is some small value less than your target precision, and compute the function there. Now if this value is lower than the value of the function at (a+b)/2 (with no σ), then the function is decreasing, and the minimum should be in the right half-interval. If the value with the σ is greater, then the function is increasing and the minimum should be somewhere in the left half-interval.

Of course, there is a third variant. If the values are strictly equal, then the minimum is somewhere in between the two points, and you can stop the algorithm right there. Since the σ is less than the target precision by design, any value in between (a+b)/2 and (a+b)/2 + σ lies within the margin of precision of both (a+b)/2 and (a+b)/2 + σ, and even of (a + b + σ)/2.

But normally, you stop the algorithm when the length of the [a, b] interval becomes smaller than the requested precision times two. You know that no matter which half you keep, the minimum always stays between a and b, so if (b - a) is smaller than precision * 2, then (a+b)/2 ± precision is your answer.
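
Here is a minimal Python sketch of the bisection search described above. The function, interval, and precision are illustrative assumptions of mine, not taken from the original interactive demo.

```python
# A minimal sketch of the bisection search for the minimum of a unimodal
# function on [a, b]. The test function below is an illustrative assumption.

def bisection_minimum(f, a, b, precision=1e-6):
    sigma = precision / 4                  # small step for the slope test
    while (b - a) >= 2 * precision:
        m = (a + b) / 2
        fm, fm_sigma = f(m), f(m + sigma)  # two evaluations per iteration
        if fm_sigma < fm:                  # f is decreasing at the midpoint,
            a = m                          # so the minimum is in the right half
        elif fm_sigma > fm:                # f is increasing at the midpoint,
            b = m                          # so the minimum is in the left half
        else:                              # equal values: the minimum is
            return m + sigma / 2           # somewhere in between, stop early
    return (a + b) / 2

# The minimum of (x - 1)^2 on [0, 3] should come out close to 1.
print(bisection_minimum(lambda x: (x - 1) ** 2, 0, 3))
```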


Golden-section search

With golden-section search, you split the interval not into 2 subintervals but into 3. This means that on the 1st iteration you have to compute the target function twice. The good news: you don't have to mess with the σ anymore. Let's say you split your interval [a, b] into these three: [a, a1], [a1, b1], [b1, b].

Now, if the function value at a1 is less than the value at b1, then the minimum is guaranteed to be somewhere in the [a, b1] subinterval. If the value at b1 is less than the value at a1, then it's in [a1, b].

There are some marginal cases, like what if the function values at a1 and b1 are exactly the same, or what if the value at a or b is less than at a1 or b1. These cases are interesting but not important for the core idea of the algorithm.
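
To make the narrowing rule concrete, here is a one-step sketch in Python. Placing a1 and b1 at the thirds of the interval is my arbitrary illustrative choice; the special golden-section placement is what the rest of the article is about.

```python
# One narrowing step of the three-subinterval scheme. Here a1 and b1 are
# placed at the thirds of [a, b] just for illustration; the golden-section
# placement is discussed below.

def narrow_once(f, a, b):
    a1 = a + (b - a) / 3
    b1 = b - (b - a) / 3
    if f(a1) < f(b1):
        return a, b1      # the minimum is guaranteed to be in [a, b1]
    else:
        return a1, b      # the minimum is guaranteed to be in [a1, b]

# One step on (x - 1)^2 over [0, 3] keeps the subinterval that contains x = 1.
print(narrow_once(lambda x: (x - 1) ** 2, 0, 3))  # prints (0, 2.0)
```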

So let's say on each iteration we narrow our interval to [a1, b] or [a, b1]. The proportion (b - a1) / (b - a) is exactly the same as (b1 - a) / (b - a); it's a constant and it's roughly 0.62 by design, which is more than 0.5. This means that the algorithm narrows down the interval pretty much like the bisection algorithm does but... slower. That's the only difference.

Both bisection search and golden-section search are said to have linear convergence. This means that they both reduce the interval linearly: the proportion between the interval length on the (i+1)-th iteration and on the i-th is always the same. But bisection has a better rate and therefore converges faster.
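
To put rough numbers on that, here is a quick back-of-the-envelope check in Python. The rates are the 0.5 of bisection and the roughly 0.62 mentioned above; the millionfold shrink factor is just an illustrative target.

```python
# How many iterations each method needs to shrink the interval a millionfold,
# given its per-iteration reduction factor.

from math import ceil, log

shrink_by = 1e-6          # shrink the interval to a millionth of its length
bisection_rate = 0.5      # bisection keeps half of the interval
golden_rate = 0.618       # golden-section keeps roughly 0.62 of the interval

print(ceil(log(shrink_by) / log(bisection_rate)))  # 20 iterations
print(ceil(log(shrink_by) / log(golden_rate)))     # 29 iterations
```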

So why do we need to learn about the golden-section search then?

There is a trick. And the trick is almost magical. Magical as in performance art, not as in miracles: you have to watch the magician's hands very carefully. With bisection, you have to compute your target function twice on every iteration: once in the middle of the interval, and once in the middle of the interval + σ. With golden-section search, you also have to compute the target function twice but... drum roll... only on the first iteration!

On every iteration after that, one of the points you have already computed the target function for lands exactly where the next iteration needs it, no matter which way the interval shrinks. This means that you only have to compute your target function once per iteration, not twice. And this makes the golden-section search more computationally efficient in practice even if, mathematically speaking, it has an undeniably worse convergence rate.
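
Here is a minimal Python sketch of that trick. It uses the golden proportion, which is explained right below; the test function and precision are illustrative assumptions.

```python
# Golden-section search with the point-reuse trick: after the first iteration,
# only one new function value is computed per iteration.

def golden_section_minimum(f, a, b, precision=1e-6):
    inv_phi = (5 ** 0.5 - 1) / 2              # 1/phi, roughly 0.618
    a1, b1 = b - (b - a) * inv_phi, a + (b - a) * inv_phi
    fa1, fb1 = f(a1), f(b1)                   # two evaluations, only here
    while (b - a) >= 2 * precision:
        if fa1 < fb1:                         # the minimum is in [a, b1]
            b, b1, fb1 = b1, a1, fa1          # the old a1 becomes the new b1
            a1 = b - (b - a) * inv_phi
            fa1 = f(a1)                       # the only new evaluation
        else:                                 # the minimum is in [a1, b]
            a, a1, fa1 = a1, b1, fb1          # the old b1 becomes the new a1
            b1 = a + (b - a) * inv_phi
            fb1 = f(b1)                       # the only new evaluation
    return (a + b) / 2

# The minimum of (x - 1)^2 on [0, 3] should come out close to 1.
print(golden_section_minimum(lambda x: (x - 1) ** 2, 0, 3))
```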

But how come we get to keep one of the points? Glad you asked. That's exactly what we need the golden section for.

The golden ratio is by definition the number that connects the proportion of a part of an interval (b) to the whole interval (a) such that:

a / b = b / (a - b) = φ

There is one and only one such positive number, and it's approximately 1.618033.

Well, you were probably expecting to see the 0.62 from before. No problem, that's just the inverse of the golden ratio:

1 / φ = φ - 1 ≈ 0.618

Here you go.
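
And here is a quick numeric check, in Python, of the identity that makes the point reuse work: 1/φ² = 1 - 1/φ, which means the surviving interior point of the old interval sits exactly at the 1/φ fraction of the shrunken interval, precisely where the next iteration needs it. The snippet is my own supplementary illustration.

```python
# The golden ratio and the identity behind the point-reuse trick.

phi = (1 + 5 ** 0.5) / 2
print(phi)                          # about 1.618033
print(1 / phi, phi - 1)             # both about 0.618, the 0.62 from before
print(1 / phi ** 2, 1 - 1 / phi)    # both about 0.382, the reuse identity
```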


Conclusion

So the reason why we want to learn about the golden-section search is that it teaches us two important things:

1. The convergence analysis of an algorithm can be deceptive. You always have to take the cost of an iteration into account. You'll see the same with Newton's method vs. quasi-Newton methods, or even Nelder-Mead vs. Powell's method. Convergence is important but the computational cost is importanter.

2. Geometry can be pragmatic. Without the golden-ratio trick, the algorithm would have made no sense. It's only due to this one special proportion that we can save one computation per iteration and overtake the bisection search in terms of computational cost.