With Ivy Bridge set to debut in just a few short months, Intel has begun to talk about what we can expect to see in Haswell. Haswell is the 22nm successor to Ivy Bridge and is currently expected to debut in 2013. It will include support for Intel’s Transactional Synchronization Extensions (TSX), a new set of instructions designed to improve multi-core efficiency.
One of the most basic challenges of multi-threading is the need to ensure that the available CPU cores don’t interfere with each other and tie virtual knots in the underlying programs. The simplest way to visualize the problem is to imagine a pair of cooks working off the same recipe at the same stove. If tasks are divided properly, food preparation time can be substantially reduced. If responsibilities aren’t clearly delineated, some tasks may end up being done twice, while other steps are left out altogether due to miscommunication over what’s been done and what hasn’t.
Multi-threaded software deals with this problem by locking shared data to protect it from modification by other threads until the current operation has been completed. A coarse-grained thread lock is a lock that applies to an entire data structure. In this case, it may help to think of a standard Excel spreadsheet. Try opening two copies of the same spreadsheet, and Excel will immediately notify you that the file is open in another program and can only be viewed in read-only mode. Coarse-grained thread locks are simple and easy to implement, but aren’t very efficient.
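To make the coarse-grained case concrete, here is a minimal sketch in Python. The `CoarseTable` class and the `fill` helper are hypothetical names invented for this illustration; nothing here is specific to Intel hardware. A single lock guards the whole table, so writers queue up behind one another even when they touch different columns.

```python
import threading


class CoarseTable:
    """Hypothetical table guarded by ONE lock that covers every column."""

    def __init__(self, columns):
        self._lock = threading.Lock()              # single lock for the whole structure
        self._cells = {name: 0 for name in columns}

    def add(self, column, amount):
        # Every writer waits on the same lock, even if it touches a different column.
        with self._lock:
            self._cells[column] += amount

    def snapshot(self):
        # Readers take the same lock so they never see a half-finished update.
        with self._lock:
            return dict(self._cells)


def fill(table, columns, increments):
    """Start one writer thread per column and return the final table contents."""

    def worker(column):
        for _ in range(increments):
            table.add(column, 1)

    threads = [threading.Thread(target=worker, args=(c,)) for c in columns]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.snapshot()
```

Calling `fill(CoarseTable(["a", "b"]), ["a", "b"], 1000)` always yields 1000 in each column. The result is correct, but the two writers never actually run their updates side by side.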
Fine-grained thread locks allow for a greater degree of simultaneous access. To continue with our previous example, instead of locking the entire spreadsheet, each thread might be allowed to modify a different column of data. This approach could significantly improve performance, but risks errors. As the above slide shows, problems can occur when two different threads need to write data to the same field. Coarse-grained locking avoids this problem, but leaves other cores idle.
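The fine-grained alternative might look like the sketch below, again using hypothetical names (`FineTable`, `transfer`). Each column gets its own lock, so writers on different columns no longer block each other. The price is extra care: any operation that spans two columns has to take both locks, in a consistent order, or the threads can deadlock.

```python
import threading


class FineTable:
    """Hypothetical table with one lock PER column instead of one for the whole table."""

    def __init__(self, columns):
        self._cells = {name: 0 for name in columns}
        self._locks = {name: threading.Lock() for name in columns}

    def add(self, column, amount):
        # Only writers to the same column contend with each other.
        with self._locks[column]:
            self._cells[column] += amount

    def transfer(self, src, dst, amount):
        if src == dst:
            return  # nothing to move; also avoids taking the same lock twice
        # Always acquire locks in sorted order so two opposing transfers cannot deadlock.
        first, second = sorted((src, dst))
        with self._locks[first], self._locks[second]:
            self._cells[src] -= amount
            self._cells[dst] += amount

    def get(self, column):
        with self._locks[column]:
            return self._cells[column]
```

The lock-ordering rule and the `src == dst` guard in `transfer` are exactly the kind of detail that is easy to get wrong, which is the implementation difficulty fine-grained locking brings with it.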
Programmers tend to err on the side of caution and generally employ more coarse-grained locks than is optimal in order to ensure that data remains accurate. TSX is designed to move this evaluation from software into hardware and give programmers the performance advantages of fine-grained locks without the associated implementation difficulty.
Haswell will be able to determine dynamically whether threads actually need to serialize on a lock, and act accordingly. This isn’t automatic, however; programmers who want to take advantage of TSX will need to use one of two software interfaces, Hardware Lock Elision (HLE) or Restricted Transactional Memory (RTM), to mark specific transactional regions. Intel’s manual states: “If the transactional execution completes successfully, then all memory operations performed within the transactional region will appear to have occurred instantaneously when viewed from other logical processors.” Updates are only made visible to other logical processors once the transaction has completed successfully; Intel refers to this as an “atomic commit.”
Transactional updates are executed optimistically without explicit synchronization. If a transaction fails, its changes are discarded and the thread falls back to the conventional, lock-based path.
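The optimistic pattern can be illustrated in software. The sketch below is only an analogy and does not use TSX instructions: the `OptimisticCell` class, its version counter, and the retry limit are all assumptions made for this example. It does the work without holding a lock, commits only if nobody else committed in the meantime, and falls back to plain locking after repeated conflicts.

```python
import threading


class OptimisticCell:
    """Software ANALOGY of optimistic execution with a lock-based fallback.

    Not Intel's implementation: TSX detects conflicts in hardware, whereas
    this sketch uses a version counter checked at commit time.
    """

    def __init__(self, value=0):
        self.value = value
        self.version = 0                  # bumped on every committed write
        self.aborts = 0                   # how many optimistic attempts were discarded
        self._lock = threading.Lock()     # stands in for the conventional fallback lock

    def _try_commit(self, seen_version, new_value):
        # Commit is all-or-nothing: the new value is published only if no other
        # writer committed since we took our snapshot.
        with self._lock:
            if self.version != seen_version:
                self.aborts += 1          # conflict: throw the speculative result away
                return False
            self.value = new_value
            self.version += 1
            return True

    def update(self, fn, max_attempts=3):
        for _ in range(max_attempts):
            seen_version, seen_value = self.version, self.value  # snapshot, no lock held
            new_value = fn(seen_value)                           # speculative work
            if self._try_commit(seen_version, new_value):
                return
        # Too many conflicts: fall back to doing the whole update under the lock.
        with self._lock:
            self.value = fn(self.value)
            self.version += 1
```

When writers rarely collide, almost every `update` commits on the first attempt and the lock is held only for the brief commit check. Under heavy contention the retry limit keeps the wasted speculative work bounded before the fallback takes over.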
The big picture
Programmers will have to adapt programs to TSX in order to take advantage of it, and it’s not the sort of technology that’s going to make surfing the web or gaming any faster. It is, however, an example of how Intel is continuing to improve total CPU performance even with clock speeds and IPC held relatively constant. Thread locks are major performance bottlenecks and can lead to power consumption inefficiencies if cores are left powered up and waiting for a thread to complete. Intel’s optimistic execution runs the risk of being wrong, but the cost of reversing the transaction in terms of power and time is evidently smaller, on average, than the benefit of speculatively executing in the first place.
Given that power consumption falls as process technology shrinks, the “cost” of such an approach will decline over time while the software benefits increase as more programmers adopt the TSX model. It’s also a small step towards the new approaches DARPA recently called for as the organization struggles towards building an exascale computer. TSX is compatible with current programming techniques and provides a fallback method for using them, which will hopefully make it easier for programmers to transition to the new model.
Intel’s Haswell will include new multi-core enhancements