Transactional Synchronization Extensions coming to Intel processors
I found this illustration on his blog easier to follow: http://software.intel.com/en-us/blogs/2012/02/07/coarse-grai...
How much of a performance increase can be seen with transactional processing?
That depends on how often the code goes through the 'get the lock' path when it doesn't actually need to. It will be interesting to see how this extension is carried across the inter-chip buses in multi-CPU motherboards. Transactional region coherence was a feature of the DEC VAX architecture early on; it was implemented such that you simply didn't get to run until the previous thread left the region, sort of like a 'one vehicle lane' where the imposition of a single thread of execution was done externally to the CPU. In the hash table example, programmers could read the memory in the region without being impeded, but as soon as one thread started writing it would block future readers and shoot down the pages in the MMU's lookaside cache. SGI did something similar with a network card that could provide shared memory with transactional semantics across the network; some of that work got folded into the InfiniBand spec, as I recall. I suppose it's just possible that all this stuff has finally fallen out of patent protection, and so Intel feels they can put it into their chips.
IMHO Restricted Transactional Memory (RTM) is something different from simply blocking until you acquire the lock. There are also full software implementations (http://en.wikipedia.org/wiki/Software_transactional_memory).
Basically, the code path is executed optimistically, and in case of a conflict the application has to handle it somehow.
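That optimistic-execution-with-retry idea can be sketched in plain Python. This is a toy, not any real STM library's API; the class and function names are my own, and the version counter stands in for the conflict detection that real implementations do per memory location or cache line:

```python
import threading

class VersionedCell:
    """A shared value guarded by a version counter.

    Callers read the value and version without holding a lock, do
    their work, and then try to commit; the commit only succeeds if
    the version is unchanged, otherwise the caller must retry.
    """
    def __init__(self, value):
        self._lock = threading.Lock()  # held only briefly, at commit
        self._version = 0
        self._value = value

    def read(self):
        # Optimistic read: no lock held while the caller computes.
        return self._value, self._version

    def try_commit(self, new_value, seen_version):
        # Succeeds only if nobody else committed since our read.
        with self._lock:
            if self._version != seen_version:
                return False  # conflict: the caller handles it (retries)
            self._value = new_value
            self._version += 1
            return True

def atomic_update(cell, fn):
    """Retry loop: run fn optimistically until the commit succeeds."""
    while True:
        value, version = cell.read()
        if cell.try_commit(fn(value), version):
            return

cell = VersionedCell(0)
threads = [threading.Thread(target=atomic_update,
                            args=(cell, lambda v: v + 1))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.read()[0])  # 8: every increment committed exactly once
```

The point is the shape of the control flow: the happy path never blocks, and contention shows up only as retries.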
I think there will be performance gains to be had, especially in applications where concurrency has been 'bolted on' (akin to the Big Kernel Lock).
However, I think what makes this interesting is not the raw performance it provides, but the functionality that it exposes. As far as I can tell, TSX will allow sets of operations to be executed, then "rolled back" in case of conflicts. This could greatly improve performance of Java code within synchronized blocks, for instance, or provide a much faster hardware implementation of the software transactional memory model in Clojure.
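As a toy illustration of that "execute, then roll back" model (nothing here reflects the actual TSX interface, which tracks reads and writes at the cache-line level in hardware), here is a buffered-writes transaction over a dict:

```python
class Transaction:
    """Buffers writes to a dict; nothing is visible until commit().

    abort() simply discards the buffer, which is the 'roll back'
    step that hardware performs on a conflict.
    """
    def __init__(self, store):
        self._store = store
        self._writes = {}

    def get(self, key):
        # Reads see the transaction's own pending writes first.
        return self._writes.get(key, self._store.get(key))

    def set(self, key, value):
        self._writes[key] = value  # buffered, not yet visible to others

    def commit(self):
        self._store.update(self._writes)

    def abort(self):
        self._writes.clear()  # discard: the shared store is untouched

store = {"balance": 100}

tx = Transaction(store)
tx.set("balance", tx.get("balance") - 30)
tx.abort()                 # pretend a conflict was detected: roll back
print(store["balance"])    # 100: the aborted write never landed

tx2 = Transaction(store)
tx2.set("balance", tx2.get("balance") - 30)
tx2.commit()
print(store["balance"])    # 70
```

A runtime like the JVM could use the hardware version of this to elide the lock on a synchronized block entirely, falling back to really taking the lock only when a transaction aborts.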
I believe the biggest benefit of this will be making multi-threaded programming easier to get right, with decent performance to boot. And if these constructs are supported natively in languages and frameworks, everyone will benefit from having 4-, 8-, or 16-core machines.
I'd expect the PyPy folks are keeping a close eye on this.