Memory controller, SMT and SSE4.2
Memory controller
Nehalem's integrated memory controller supports triple-channel DDR3 at 1.33 GT/s, for a maximum bandwidth of 32 GB/s. Each channel operates independently, which benefits overall performance. To take advantage of the four times higher bandwidth, every core now supports up to 10 outstanding data cache misses and 16 outstanding misses in total. A Core 2, by contrast, supports 8 data cache misses and 14 total misses in flight.
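The 32 GB/s figure follows from simple arithmetic. The sketch below reproduces it, assuming the standard 64-bit (8-byte) DDR3 channel width, which the article does not state explicitly. The function and constant names are chosen for illustration only.

```python
# Peak theoretical bandwidth of a multi-channel DDR memory interface.
# Assumption (not stated in the article): each DDR3 channel is 64 bits wide,
# i.e. it moves 8 bytes per transfer.

CHANNELS = 3                      # triple channel
TRANSFERS_PER_SECOND = 1.333e9    # 1.33 GT/s (DDR3-1333)
BYTES_PER_TRANSFER = 8            # 64-bit channel (assumption)


def peak_bandwidth_gb_per_s(channels, transfers_per_second, bytes_per_transfer):
    """Return the theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * transfers_per_second * bytes_per_transfer / 1e9


nehalem_peak = peak_bandwidth_gb_per_s(
    CHANNELS, TRANSFERS_PER_SECOND, BYTES_PER_TRANSFER
)
print(f"Peak bandwidth: {nehalem_peak:.1f} GB/s")  # prints "Peak bandwidth: 32.0 GB/s"
```

This is an upper bound; sustained bandwidth in real workloads is lower.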
The integrated memory controller also reduces memory latency substantially. For example, Nehalem provides a latency of 60 ns, compared to 100 ns on Harpertown.
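Why the number of outstanding misses matters can be estimated with Little's law: the bandwidth a single core can draw is at most the number of misses in flight times the cache line size, divided by the memory latency. The sketch below is a rough illustration rather than a measurement. It assumes a 64-byte cache line, that every miss goes all the way to DRAM, and that the Core 2 miss count can be paired with the Harpertown latency (Harpertown being a Core 2 generation Xeon). The function name is hypothetical.

```python
# Little's law estimate: bandwidth <= (misses in flight * line size) / latency.
# Assumptions (not from the article): 64-byte cache lines, every miss is
# served from DRAM at the quoted latency.

CACHE_LINE_BYTES = 64  # assumption


def max_miss_bandwidth_gb_per_s(outstanding_misses, latency_ns,
                                line_bytes=CACHE_LINE_BYTES):
    """Upper bound on per-core bandwidth; bytes per nanosecond equals GB/s."""
    return outstanding_misses * line_bytes / latency_ns


# Figures quoted in the article: data cache misses in flight and memory latency.
nehalem = max_miss_bandwidth_gb_per_s(outstanding_misses=10, latency_ns=60)
core2 = max_miss_bandwidth_gb_per_s(outstanding_misses=8, latency_ns=100)

print(f"Nehalem core: up to {nehalem:.2f} GB/s")
print(f"Core 2 core:  up to {core2:.2f} GB/s")
```

Under these assumptions, the combination of more misses in flight and lower latency roughly doubles what one core can request from memory.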
SSE4.2 and SMT
The instruction set has also been extended. SSE4.2 adds string comparison, a CRC instruction, and a popcount. Demonstrations under optimal conditions showed a 6- to 18-fold performance increase, but in everyday applications the gain will be much smaller.
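To make clear what two of these instructions compute, here is a plain software reference. It is a sketch of the operations, not of how the hardware implements them. It assumes the CRC instruction uses the CRC-32C (Castagnoli) polynomial and applies the conventional initial value and final inversion, which software normally wraps around the instruction. The function names are chosen for illustration.

```python
# Software reference for two operations that Nehalem executes in hardware.
# A loop like this takes many instructions per byte or word; the hardware
# instructions do the equivalent work in a single operation.

CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected


def popcount(value: int) -> int:
    """Count the set bits of a non-negative integer."""
    count = 0
    while value:
        value &= value - 1  # clear the lowest set bit
        count += 1
    return count


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C with the usual 0xFFFFFFFF init and final inversion."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


print(popcount(0b1011))               # 3
print(hex(crc32c(b"123456789")))      # 0xe3069283, the standard CRC-32C check value
```

The gap between such a loop and a single instruction is why synthetic demonstrations show large speedups, while applications that spend only a small share of their time in these operations gain far less.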
Much more important is the return to simultaneous multithreading (SMT), which Intel first implemented on a 130 nm Pentium 4. SMT demands a lot of bandwidth to realize its potential, because it generates substantially more outstanding misses. Validation is also very complex and requires a design that is fundamentally conceived for SMT. In return, SMT is a very elegant way to get more performance out of one core, because it is very energy efficient.