Increase MAX_MOVES to prevent buffer overflow and stack corruption by ZealanL · Pull Request #4558 · official-stockfish/Stockfish


I'd like to emphasize that Stockfish is not Linux, V8, or any other system program. It is OK for SF to crash, and it is tolerable to neglect possible vulnerabilities that depend on extreme luck. Every means of securing the control flow comes at a cost (speed, memory, code readability, etc.). Such changes are redundant as long as the bug is not harmful.

Buffer overflows are a serious issue and make up a significant portion of software vulnerabilities. Stockfish is open source, free, and built by the community - but it is also an extremely popular program that should take responsibility for a security issue within its internal processing that stems from arguably legitimate user input. A service running Stockfish on servers will probably think to validate the FEN, but I doubt anyone considered the possibility of a buffer overflow from generating too many moves. Stockfish doesn't crash because it recognizes invalid data; the system terminates the executable because it is working with corrupted pointers. That's not good.

By your same standard, one can claim that any PGO-enabled compiler is "vulnerable", for it will execute whatever arbitrary code it compiled, yet so far I haven't seen anyone whine about them.

What??? Does profile-guided optimization have a buffer overflow too?

Have you found proper ROP gadgets in Stockfish or system libraries?

ROP gadgets are not difficult to find; each one is just a few instructions that can be used as part of a ROP chain. Obviously, which compiler is used matters. This open-source gadget finder tool found 13825 in Stockfish's binary. That is pretty much unavoidable if you want a fast application; the thing to avoid is allowing malicious stack access in the first place.

Are you sure the return address is "close" enough to moveList + MAX_MOVES so that it can be overwritten?

Return addresses are pushed onto the stack from EIP/RIP by the call instruction, and the move list is created on the stack by this code:

  for (const auto& m : MoveList<LEGAL>(pos)) // Buffer overflow here
      if (   limits.searchmoves.empty()
          || std::count(limits.searchmoves.begin(), limits.searchmoves.end(), m))
          rootMoves.emplace_back(m);

The crash happens a while later, which is good news for the exploit, because it means SF continues running for quite some time with that corrupted data on the stack. The EBP register is pushed before the buffer overflow takes place with push ebp at the start of the function (not the actual C++ function because of heavy inlining, but a parent function), and is restored after with pop ebp, which is where the register is actually filled with escaped bytes from the buffer.
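
To make the failure mode concrete, here is a minimal sketch of a fixed-capacity move list that refuses to write past its end instead of silently corrupting neighbouring stack memory. It is not Stockfish's code: `Move`, `BoundedMoveList`, and `rejected_when_generating` are hypothetical names invented for this illustration, and the real generator writes through a raw pointer with no such check.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an engine's move type; not Stockfish's definition.
struct Move {
    int from;
    int to;
};

// Fixed-capacity list that lives wherever it is declared (typically the stack).
// push() refuses to write past the end and reports the failure to the caller.
template <std::size_t Capacity>
class BoundedMoveList {
public:
    // Returns false (and stores nothing) once the list is full.
    bool push(Move m) {
        if (count_ == Capacity)
            return false;
        moves_[count_++] = m;
        return true;
    }

    std::size_t size() const {
        assert(count_ <= Capacity);  // invariant maintained by push()
        return count_;
    }

    const Move* begin() const { return moves_.data(); }
    const Move* end() const { return moves_.data() + count_; }

private:
    std::array<Move, Capacity> moves_{};
    std::size_t count_ = 0;
};

// Tries to append `n` dummy moves and returns how many were rejected.
template <std::size_t Capacity>
std::size_t rejected_when_generating(std::size_t n) {
    BoundedMoveList<Capacity> list;
    std::size_t rejected = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (!list.push(Move{0, 0}))
            ++rejected;
    }
    return rejected;
}
```

With a capacity of 256, a position that yields 300 moves would have 44 of them rejected rather than written over whatever follows the buffer. The per-move branch is exactly the kind of speed cost the objection above is worried about, which is why simply enlarging the buffer is the cheaper fix.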


I am not a skilled stack exploiter or anything, so it is taking me some time to investigate this, try out different positions, and learn why they crash (which can be difficult because of inlining). Different compilers and compiler settings will probably also result in different crashes for different reasons. On top of that, you can control how far the buffer is overflowed through the number of moves generated over the limit.
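
The relationship between move count and overflow size is simple arithmetic, sketched below. The capacity of 256 entries comes from the limit discussed in this thread; the 8-byte entry size is an assumption (a 4-byte move plus a 4-byte score) and should be checked against the actual build. `kMaxMoves`, `kEntryBytes`, and `overflow_bytes` are names made up for this example.

```cpp
#include <cstddef>

// Assumed values for illustration; verify them against the build under test.
constexpr std::size_t kMaxMoves = 256;  // buffer capacity in entries
constexpr std::size_t kEntryBytes = 8;  // ASSUMED size of one move-list entry

// Bytes written past the end of the buffer for a position that generates
// `generated` moves. Zero when the position fits.
constexpr std::size_t overflow_bytes(std::size_t generated) {
    return generated > kMaxMoves ? (generated - kMaxMoves) * kEntryBytes : 0;
}
```

Under these assumptions, every extra move past the limit pushes the write another 8 bytes beyond the buffer, so the choice of position sets how far the overflow reaches.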

The simple fact that there are many >256-move positions means that there are many different combinations of moves that can escape the buffer. A different position, with different moves escaping the buffer, will crash with different corrupted memory. Perhaps I can create a table of which positions crash with what corrupted memory, so I can try to target specific things. Sure, this exploit isn't just some super easy buffer overflow with user-defined bytes, but it certainly has the potential to be very dangerous and should be taken seriously. Many different services run Stockfish on their cloud servers, and I have yet to find any service that doesn't accept my >256-move FENs.
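
For services that forward FENs to an engine, a cheap material check on the piece-placement field is one possible pre-filter. The sketch below is an assumption about what such a filter could look like, not something any service or Stockfish itself is known to do, and `plausible_material` is a hypothetical name. It only rejects positions with impossible material (more than 16 pieces or 8 pawns per side, or not exactly one king each); it does not prove that the move count stays under the limit, and it does not check that the letters are valid piece types.

```cpp
#include <cctype>
#include <string>

// Conservative material check on the piece-placement field of a FEN string.
// Returns true only if each side has exactly one king, at most 8 pawns,
// and at most 16 pieces in total.
bool plausible_material(const std::string& fen) {
    int white = 0, black = 0;
    int whiteKings = 0, blackKings = 0;
    int whitePawns = 0, blackPawns = 0;

    for (char c : fen) {
        if (c == ' ')
            break;  // end of the piece-placement field
        const unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isalpha(uc))
            continue;  // digits and '/' carry no piece
        const bool isWhite = std::isupper(uc) != 0;
        const char kind = static_cast<char>(std::tolower(uc));
        if (isWhite) {
            ++white;
            if (kind == 'k') ++whiteKings;
            if (kind == 'p') ++whitePawns;
        } else {
            ++black;
            if (kind == 'k') ++blackKings;
            if (kind == 'p') ++blackPawns;
        }
    }
    return whiteKings == 1 && blackKings == 1
        && whitePawns <= 8 && blackPawns <= 8
        && white <= 16 && black <= 16;
}
```

The standard starting position passes this check, while a board stuffed with sixteen white queens does not. A filter like this narrows the input space but is no substitute for fixing the buffer size in the engine.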

If I had the ability to target a specific address, and perhaps knew the addresses of some critical system API functions on a victim machine/server, you can hopefully see how this access violation crash is no joke. Some of the most effective code execution exploits rely on a secondary exploit to obtain memory addresses of applications running on the target OS.

If I manage to cause some malicious execution from this, there won't be a PoC to share because it will just be a FEN (or more likely a large series of FENs corresponding to different target addresses).

I also plan to investigate other ways in which Stockfish crashes from bad user input (like invalid FEN parsing).