Thursday, January 27, 2011

Intel Sandy Bridge out of specification 4.0, 4.4 and 4.6 GHz. Updated – HighPerformanceSystems

In this short article I present the CineBench R11.5 results of three high end configurations: three High Performance Systems running out of specification at frequencies of 4 GHz or more. Download the latest version of CineBench R11.5.

The first is a Core i7 930 D0 with triple channel DDR3 at core frequencies of 4.0 / 4.2 GHz, the second an AMD Phenom II X6 clocked at 4 GHz, and the third the brand new Intel Core i7 2600K. The last two machines have a more typical dual channel DDR3 configuration.

[Image: CineBench R11.5.]

For a detailed analysis of the underlying microarchitecture of these new Sandy Bridge processors, I recommend several of my past articles on another of my Blogs: LowLevelHardware:

- Intel Sandy Bridge microarchitecture. Part 1. Updated – LowLevelHardware
- Intel Sandy Bridge preview. Updated – LowLevelHardware
- Intel Sandy Bridge versus Westmere die. Updated – LowLevelHardware
- Detailed micrograph of Intel Sandy Bridge – ProfessionalSAT
- Intel Sandy Bridge. Introduction – ProfessionalSAT

Update January 19, 2011: Added the CineBench R11.5 multithreaded performance of the AMD Phenom II X6 out of specification at 4 GHz, with the Northbridge and 6 MB L3 cache set to 2.82 GHz.

Update January 21, 2011: Added a multithread scaling section and general comments about the AMD Phenom II microarchitecture and future improvements in AMD processors. Some graphical errors fixed.

Intel Nehalem Core i7 930 Quadcore D0 system:

This processor family has populated almost all my High Performance Systems for the last two years. These CPUs have shown excellent IPC and absolute performance, and very high frequency potential since their first commercial steppings (C0 / C1).

In some fortunate hand picked samples I can reach clocks of 4.0 / 4.2 GHz, fully stable in Prime95 X64 SmallFFT, InPlace and Blend (48 h each) and in IntelBurnTest with 200 cycles. In those cases all energy saving techniques are active, including the C1, C3, C6 and C7 states.

The settings are described below:

The memory subsystem consists of three 2 GB DDR3 modules in triple channel configuration at an effective frequency of 1451 MHz with access latencies 7-7-7-14-1N and 1.64 V.

[Image: Intel Nehalem (left) vs. Sandy Bridge (right).]

The core frequency is 4 GHz with a maximum load of 8 threads and up to 4.2 GHz with a single threaded load (both frequencies are sustained and stable with 100% saturation load, Prime95 X64).

The core voltage is 1.30 V without Load Line Calibration, and the Uncore voltage is also 1.30 V. The PLL voltage is nominal at 1.80 V.

The Uncore (memory controllers, other internal buses and the 8 MB L3 cache) is set to 3.439 GHz in the BIOS.

AMD Thuban Phenom II X6 Hexacore system

The Phenom II X6 processor is set at a constant frequency of 4 GHz (from 2.8 GHz nominal) through a 282 MHz bus clock and a X14 multiplier. The AMD Turbo Core BIOS option remains disabled.

The North Bridge (which includes the 6 MB, 48-way L3 cache, the data buses and the memory controllers) is set to 2.82 GHz (from 2 GHz nominal) through a X10 multiplier.

Memory subsystem: two 4 GB DDR3 DIMMs rated at 1600 MHz, for a total of 8 GB, running at 1503 MHz with access latencies of 7-7-7-21 1T at 1.65 V.
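As a quick check of the figures above, the core and North Bridge clocks follow from the bus clock times the multiplier. This is only a minimal sketch: the function name is made up for illustration, and the memory ratio is left out because the article does not state it.

```python
# Values taken from the text: 282 MHz bus clock, X14 core and X10 North Bridge multipliers.
BUS_CLOCK_MHZ = 282
CORE_MULTIPLIER = 14
NB_MULTIPLIER = 10


def derived_clock_mhz(bus_clock_mhz: int, multiplier: int) -> int:
    """Return the clock that results from a bus clock and an integer multiplier."""
    return bus_clock_mhz * multiplier


core_mhz = derived_clock_mhz(BUS_CLOCK_MHZ, CORE_MULTIPLIER)
nb_mhz = derived_clock_mhz(BUS_CLOCK_MHZ, NB_MULTIPLIER)

print(f"Core clock: {core_mhz} MHz")          # 3948 MHz, quoted as 4 GHz
print(f"North Bridge clock: {nb_mhz} MHz")    # 2820 MHz, quoted as 2.82 GHz
```

The core therefore runs slightly below a round 4 GHz, which is worth keeping in mind when comparing per clock results.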

Intel Sandy Bridge Core i7 2600K Quadcore system (HT, 8 threads)

The memory subsystem consists of two 4 GB DDR3 1600 DIMMs configured with 9-9-9-24 latencies (nominal).

[Image: Sandy Bridge 32 nm quad core.]

The core frequency is set to 4.0, 4.4 and 4.6 GHz in turn through BIOS multiplier adjustment. Turbo mode was disabled.

The 8 MB Last Level Cache is synchronous with the core clock (it runs at the same frequency) and is divided into four 2 MB banks, each with 16-way associativity, accessed through a ring bus.
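The cache geometry described above can be worked out with a few lines of arithmetic. This is a rough sketch only: the 64 byte line size is my assumption (the article does not give it), and the variable names are illustrative.

```python
# Figures from the text: four banks of 2 MB each, 16-way associative.
BANKS = 4
BANK_BYTES = 2 * 1024 * 1024
WAYS = 16
LINE_BYTES = 64  # assumption: the cache line size is not stated in the article

# Total capacity is the sum of the banks.
total_mb = BANKS * BANK_BYTES // (1024 * 1024)

# Sets per bank = bank size / (ways * line size).
sets_per_bank = BANK_BYTES // (WAYS * LINE_BYTES)

print(f"Total LLC: {total_mb} MB")
print(f"Sets per bank (with {LINE_BYTES} byte lines): {sets_per_bank}")
```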

Processor cooling

Because of the low voltages used (the lowest that ensure complete stability, thanks to hand picked processors…), heat dissipation is very low and well controlled.

This is no problem for the cooler used, the excellent Scythe Mugen 2 B with two Slip Stream 12 cm fans in push-pull configuration.

Cinebench R11.5 results 

| Designation | Frequency | 1 thread | Multithreaded | Speedup |
| Intel Core i7 2600K 3.4 / 3.8 GHz | 3.4 / 3.8 GHz | 1.55 | 6.96 | 4.49X |
| Intel Core i7 2600K 4.0 GHz | 4.0 GHz | 1.62 | 7.82 | 4.82X |
| Intel Core i7 2600K 4.4 GHz | 4.4 GHz | 1.76 | 8.61 | 4.89X |
| Intel Core i7 2600K 4.6 GHz | 4.6 GHz | 1.86 | 8.91 | 4.83X |
| Intel Core i7 930 D0 4.0 / 4.2 GHz | 4.0 / 4.2 GHz | 1.45 | 6.88 | 4.74X |
| AMD Phenom II X6 1090T 4 GHz | 4.0 GHz | 1.25 | 7.07 | 5.66X |
Nehalem, Sandy Bridge and Thuban out of specification.
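The speedup column appears to be the multithreaded score divided by the single thread score. The sketch below recomputes it from the published scores; the data layout and function name are mine. Recomputed values can differ from the published column in the second decimal, and the 4.6 GHz row comes out nearer 4.79 than the listed 4.83.

```python
# (label, single thread score, multithreaded score) as published in the table.
RESULTS = [
    ("Core i7 2600K 3.4 / 3.8 GHz", 1.55, 6.96),
    ("Core i7 2600K 4.0 GHz", 1.62, 7.82),
    ("Core i7 2600K 4.4 GHz", 1.76, 8.61),
    ("Core i7 2600K 4.6 GHz", 1.86, 8.91),
    ("Core i7 930 D0 4.0 / 4.2 GHz", 1.45, 6.88),
    ("Phenom II X6 4.0 GHz", 1.25, 7.07),
]


def speedup(single: float, multi: float) -> float:
    """Multithreaded score relative to the single thread score."""
    return multi / single


for label, single, multi in RESULTS:
    print(f"{label:30s} {speedup(single, multi):.2f}X")
```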

There is not much need for comment…

The new Intel Sandy Bridge processor is faster in nominal mode than the out of specification Core i7 930 Bloomfield set to 4.0 / 4.2 GHz (the practical high frequency limit at the voltages used for 24 h / 365 day use).

The AMD Phenom II X6 aggressively defends its position thanks to its six physical cores, which provide an excellent multithreaded result that exceeds both the Core i7 930 at 4.0 / 4.2 GHz with Hyper Threading enabled and the new Core i7 2600K in nominal mode.

[Image: Graphic results. CineBench R11.5.]

Note that the i7 930 4.0 / 4.2 GHz system is already extremely fast but surely Sandy Bridge is the new performance king and definitely outshines Nehalem despite having only a dual channel DDR3 1600 memory subsystem compared to the Nehalem 1451 MHz triple channel.

Also remember that in this test Nehalem has an Uncore and L3 cache frequency increased to 3.44 GHz from 2.66 GHz nominal, which helps greatly.
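To put the memory comparison in numbers, the theoretical peak bandwidth of each configuration can be estimated as transfer rate times bytes per transfer times channel count. This sketch assumes standard 64-bit (8 byte) DDR3 channels and treats the quoted MHz figures as effective transfer rates; real sustained bandwidth is lower.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, channels: int,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (assumes 64-bit wide channels)."""
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


# Effective rates and channel counts quoted in the article.
nehalem = peak_bandwidth_gb_s(1451, channels=3)
sandy_bridge = peak_bandwidth_gb_s(1600, channels=2)
phenom = peak_bandwidth_gb_s(1503, channels=2)

print(f"Core i7 930, triple channel 1451: {nehalem:.1f} GB/s")
print(f"Core i7 2600K, dual channel 1600: {sandy_bridge:.1f} GB/s")
print(f"Phenom II X6, dual channel 1503:  {phenom:.1f} GB/s")
```

On paper the Nehalem system has roughly a third more peak bandwidth, yet it still loses in CineBench.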

Multicore / multithread scaling

In this section I analyze the calculation speed increase with thread count.

The AMD Phenom II X6 supports six threads, one per core, … no SMT folks!

Both the Core i7 930 and the new Core i7 2600K support 8 threads, two threads for each physical core (4 cores, 8 threads). The 2600K is currently the one and only Sandy Bridge quad core with SMT enabled.

[Image: Multithreaded performance scaling. Cinebench R11.5.]

Starting with the AMD Phenom II X6 I found a very good scaling coming very close to the theoretical maximum value of 6X, at 5.66X. It says a lot about the excellent work done by AMD engineers with respect to the L3 cache and DDR3 concurrent access by all cores.

I have no doubt that AMD, with an Intel-style SMT implementation, would exceed the 6X factor and could possibly reach around 8X, for a multithreaded score of about 10 points, exceeding the calculation speed of the Core i7 2600K at 4.6 GHz.
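The projection is simple arithmetic: the single thread score times a scaling factor. The sketch below uses the published Phenom II X6 score; the 8X factor is purely the hypothetical case discussed above, not a measurement.

```python
PHENOM_SINGLE_THREAD = 1.25   # published single thread score at 4 GHz
I7_2600K_46_MULTI = 8.91      # published multithreaded score at 4.6 GHz


def projected_score(single_thread_score: float, scaling_factor: float) -> float:
    """Multithreaded score expected for a given scaling factor."""
    return single_thread_score * scaling_factor


measured = projected_score(PHENOM_SINGLE_THREAD, 5.66)     # scaling actually observed
hypothetical = projected_score(PHENOM_SINGLE_THREAD, 8.0)  # hypothetical SMT case

print(f"With the measured 5.66X: {measured:.2f} points")
print(f"With a hypothetical 8X:  {hypothetical:.2f} points")
print(f"Beats the 2600K at 4.6 GHz: {hypothetical > I7_2600K_46_MULTI}")
```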

AMD should focus on its per core IPC, currently much lower than on Intel CPUs (by about 20 - 30%); in particular, the work must concentrate on branch prediction accuracy and on increasing the associativity of the L1 caches.

For the two Intel processors, I must note the effect of Hyper Threading (the two-way SMT implemented in them). It manages to exceed the 4X factor (quad core) and approach 5X. This is also a remarkable result: one extra core almost for free.

I can’t say much more ... the Nehalem age is over …
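A rough way to see the per core gap is to divide the single thread score by the clock at which it was obtained. This is only a proxy for IPC in one benchmark, not a measurement of it; the clocks are the ones quoted in the results table (4.2 GHz for the single threaded Core i7 930 run), and the names are illustrative.

```python
# (single thread score, clock in GHz during the single thread run)
SINGLE_THREAD = {
    "Core i7 2600K": (1.62, 4.0),
    "Core i7 930": (1.45, 4.2),
    "Phenom II X6": (1.25, 4.0),
}


def points_per_ghz(score: float, ghz: float) -> float:
    """CineBench points per GHz, a crude per clock performance proxy."""
    return score / ghz


per_clock = {name: points_per_ghz(s, f) for name, (s, f) in SINGLE_THREAD.items()}

for name, value in per_clock.items():
    # Deficit of the Phenom II X6 relative to each processor, in percent.
    deficit = (1 - per_clock["Phenom II X6"] / value) * 100
    print(f"{name:14s} {value:.3f} points/GHz (Phenom II X6 is {deficit:.1f}% lower)")
```

By this measure the Phenom II X6 is a little over 20% behind Sandy Bridge per clock, and closer to 10% behind Nehalem.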
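Dividing each published speedup by the number of physical cores makes the SMT effect easier to see: values above 1.0 mean the chip scales beyond its core count. A minimal sketch using the speedup figures from the results table (the 4.4 GHz row for the 2600K); the names are mine.

```python
# (published speedup, physical cores)
SCALING = {
    "Phenom II X6 4.0 GHz": (5.66, 6),
    "Core i7 930 4.0 / 4.2 GHz": (4.74, 4),
    "Core i7 2600K 4.4 GHz": (4.89, 4),
}


def scaling_per_core(speedup: float, cores: int) -> float:
    """Speedup obtained per physical core."""
    return speedup / cores


per_core = {name: scaling_per_core(s, c) for name, (s, c) in SCALING.items()}

for name, value in per_core.items():
    print(f"{name:26s} {value:.2f} per physical core")
```

Note that the Core i7 930 runs its single threaded test at 4.2 GHz and the multithreaded one at 4.0 GHz, so its figure slightly understates the real gain.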

Link to my original post in Spanish at ProfessionalSAT from Jan 12 2011
