Thursday, January 27, 2011

Intel Sandy Bridge out of specification 4.0, 4.4 and 4.6 GHz. Updated – HighPerformanceSystems

In this short article I present the CineBench R11.5 results of three high-end configurations: three High Performance Systems running out of specification at frequencies of 4 GHz or more. Download the latest version of CineBench R11.5.

The first is a Core i7 930 D0 with triple-channel DDR3 at core frequencies of 4.0 / 4.2 GHz, the second is an AMD Phenom II X6 clocked at 4 GHz, and the third is the brand-new Intel Core i7 2600K. The last two machines use a more typical dual-channel DDR3 configuration.

CineBench R11.5.

For a detailed analysis of the underlying microarchitecture of these new Sandy Bridge processors, I recommend several of my past articles on my other blogs, LowLevelHardware and ProfessionalSAT:

- Microarquitectura Intel Sandy Bridge. Parte 1. Actualizado – LowLevelHardware

- Previo Intel Sandy Bridge. Actualizado – LowLevelHardware
- Intel Sandy Bridge versus Westmere die. Actualizado – LowLevelHardware
- Micrografía detallada de Intel Sandy Bridge – ProfessionalSAT
- Intel Sandy Bridge. Introducción – ProfessionalSAT

Update January 19, 2011: Added the CineBench R11.5 multithreaded performance of the AMD Phenom II X6 out of specification at 4 GHz, with the Northbridge and 6 MB L3 cache set to 2.82 GHz.

Update January 21, 2011: Added a multithread scaling section and general comments about the AMD Phenom II microarchitecture and future improvements in AMD processors. Fixed some errors in the graphics.

Intel Nehalem Core i7 930 Quadcore D0 system:

This processor family has populated almost all my High Performance Systems for the last two years. These CPUs have shown excellent IPC and absolute performance, and a very high frequency potential since their first commercial steppings (C0 / C1).

With some fortunate hand-picked samples I can reach clocks of 4.0 / 4.2 GHz, fully stable in Prime95 X64 SmallFFT, InPlace and Blend (48 h each) and in IntelBurnTest with 200 cycles. In those cases all power-saving features are enabled, including the C1, C3, C6 and C7 states.

The settings are described below:

The memory subsystem consists of three 2 GB DDR3 modules in triple-channel configuration at an effective frequency of 1451 MHz with access latencies of 7-7-7-14-1N and 1.64 V.

Intel Nehalem (left) vs. Sandy Bridge (right).

The core frequency is 4 GHz under a maximum load of 8 threads and up to 4.2 GHz under single-threaded load (both frequencies are sustained and stable at 100% load in Prime95 X64).

The core voltage is 1.30 V without Load Line Calibration, and the Uncore voltage is also 1.30 V. The PLL voltage is nominal at 1.80 V.

The Uncore (memory controllers, other internal buses and the 8 MB L3 cache) is set to 3.439 GHz in the BIOS.

AMD Thuban Phenom II X6 Hexacore system

The Phenom II X6 processor is set to a constant frequency of 4 GHz (from 2.8 GHz nominal) through a 282 MHz bus clock and an X14 multiplier. The AMD Turbo Core BIOS option remains disabled.

The North Bridge (which includes the 6 MB, 48-way L3 cache, the data buses and the memory controllers) is set to 2.82 GHz (from 2 GHz nominal) through an X10 multiplier.
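As a quick check of the arithmetic behind these settings, here is a minimal Python sketch. It only assumes the usual relation (resulting clock = bus clock × multiplier); the function name is illustrative and the input values are the ones quoted above.

```python
# Resulting clock = reference (bus) clock x multiplier.
# Input values are the ones quoted in the text; the helper name is illustrative.

def effective_clock_mhz(bus_mhz: float, multiplier: float) -> float:
    """Return the resulting clock in MHz for a bus clock and a multiplier."""
    return bus_mhz * multiplier

BUS_MHZ = 282  # Phenom II X6 bus clock used in this configuration

core_mhz = effective_clock_mhz(BUS_MHZ, 14)         # CPU cores, X14 multiplier
northbridge_mhz = effective_clock_mhz(BUS_MHZ, 10)  # North Bridge and L3, X10 multiplier

print(f"Core:         {core_mhz:.0f} MHz")
print(f"North Bridge: {northbridge_mhz:.0f} MHz")
```

The core value lands slightly under the rounded 4 GHz figure, while the North Bridge value matches the 2.82 GHz quoted above.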

Memory subsystem: two 4 GB DDR3 1600 DIMMs, for a total of 8 GB, running at 1503 MHz with access latencies of 7-7-7-21 1T at 1.65 V.

Intel Sandy Bridge Core i7 2600K Quadcore system (HT, 8 threads)

The memory subsystem consists of two 4 GB DDR3 1600 DIMMs configured with nominal 9-9-9-24 latencies.

Sandy Bridge 32 nm quad core.

The core frequency is set to 4.0, 4.4 and 4.6 GHz in turn by adjusting the BIOS multiplier. Turbo mode was disabled.

The 8 MB Last Level Cache is synchronous with the core clock: it runs at the same frequency and is divided into four 2 MB banks, each 16-way associative, accessed through a ring bus.

Processor cooling

Because of the low voltages used (the lowest that ensure complete stability, thanks to hand-picked processors…), heat dissipation is very low and under control.

This poses no problem for the cooler used, the excellent Scythe Mugen 2 B fitted with two Slip Stream 12 cm fans in push-pull configuration.

Cinebench R11.5 results 

| Designation | Frequency | 1 thread | Multithreaded | Speedup |
|---|---|---|---|---|
| Intel Core i7 2600K | 3.4 / 3.8 GHz | 1.55 | 6.96 | 4.49X |
| Intel Core i7 2600K | 4.0 GHz | 1.62 | 7.82 | 4.82X |
| Intel Core i7 2600K | 4.4 GHz | 1.76 | 8.61 | 4.89X |
| Intel Core i7 2600K | 4.6 GHz | 1.86 | 8.91 | 4.83X |
| Intel Core i7 930 D0 | 4.0 / 4.2 GHz | 1.45 | 6.88 | 4.74X |
| AMD Phenom II X6 1090T | 4.0 GHz | 1.25 | 7.07 | 5.66X |
Nehalem, Sandy Bridge and Thuban out of specification.
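The speedup column can be recomputed from the two score columns as multithreaded score divided by single-thread score. The sketch below does that with the published (rounded) scores; the dictionary layout and names are illustrative. Because the inputs are rounded, a couple of rows come out slightly different from the published column (the 4.6 GHz row, for instance, recomputes to about 4.79).

```python
# (single-thread score, multithreaded score) as listed in the results table.
results = {
    "Core i7 2600K 3.4/3.8 GHz": (1.55, 6.96),
    "Core i7 2600K 4.0 GHz": (1.62, 7.82),
    "Core i7 2600K 4.4 GHz": (1.76, 8.61),
    "Core i7 2600K 4.6 GHz": (1.86, 8.91),
    "Core i7 930 D0 4.0/4.2 GHz": (1.45, 6.88),
    "Phenom II X6 1090T 4.0 GHz": (1.25, 7.07),
}

def speedup(single: float, multi: float) -> float:
    """Multithreaded speedup relative to the single-thread score."""
    return multi / single

for name, (single, multi) in results.items():
    print(f"{name:28s} {speedup(single, multi):.2f}X")
```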

There’s little need for comment…

The new Intel Sandy Bridge processor is faster at nominal settings than the out-of-specification Core i7 930 Bloomfield set to 4.0 / 4.2 GHz (the practical frequency ceiling at the voltages used for 24/7 use).

The AMD Phenom II X6 defends its position aggressively thanks to its six physical cores, delivering an excellent multithreaded result that exceeds both the Core i7 930 at 4.0 / 4.2 GHz with Hyper-Threading enabled and the new Core i7 2600K at nominal settings.

Graphic results. CineBench R11.5.

Note that the i7 930 system at 4.0 / 4.2 GHz is already extremely fast, but Sandy Bridge is clearly the new performance king and outshines Nehalem despite having only a dual-channel DDR3 1600 memory subsystem against Nehalem's triple-channel DDR3 at 1451 MHz.

Also remember that in this test Nehalem has its Uncore and L3 cache frequency raised to 3.44 GHz from 2.66 GHz nominal, which helps greatly.
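To put the memory comparison in numbers, the following sketch estimates the theoretical peak bandwidth of both memory subsystems. It assumes that the quoted memory frequencies are effective transfer rates (MT/s) and that each DDR3 channel is 64 bits (8 bytes) wide; these are theoretical peaks, not measurements from these systems.

```python
def peak_bandwidth_gb_s(channels: int, transfer_rate_mt_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak = channels x bus width (bytes) x transfers per second."""
    return channels * bus_bytes * transfer_rate_mt_s * 1e6 / 1e9

# Assumption: 64-bit channels, figures from the text taken as MT/s.
nehalem = peak_bandwidth_gb_s(channels=3, transfer_rate_mt_s=1451)       # triple channel
sandy_bridge = peak_bandwidth_gb_s(channels=2, transfer_rate_mt_s=1600)  # dual channel

print(f"Nehalem, 3 x DDR3 1451:      {nehalem:.1f} GB/s")
print(f"Sandy Bridge, 2 x DDR3 1600: {sandy_bridge:.1f} GB/s")
```

On paper the triple-channel system has the higher peak, which makes the Sandy Bridge results above all the more notable.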

Multicore / multithread scaling

In this section I analyze the calculation speed increase with thread count.

The AMD Phenom II X6 supports six threads, one per core, … no SMT folks!

Both the Core i7 930 and the new Core i7 2600K support 8 threads, two per physical core (4 cores, 8 threads). The 2600K is currently the only Sandy Bridge quad core with SMT enabled.

Multithreaded performance scaling. Cinebench R11.5.

Starting with the AMD Phenom II X6, I found very good scaling: 5.66X, very close to the theoretical maximum of 6X. It says a lot about the excellent work done by AMD engineers on concurrent access to the L3 cache and DDR3 by all cores.

I have no doubt that AMD, with an Intel-style SMT implementation, would exceed the 6X factor and could possibly reach around 8X, for a multithreaded score of about 10 points, exceeding the calculation speed of the Core i7 2600K at 4.6 GHz.
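The projection above is simple arithmetic: single-thread score times the scaling factor. A minimal sketch, where the 8X factor is the hypothetical from the text (not a measurement) and the function name is illustrative:

```python
PHENOM_SINGLE_THREAD = 1.25  # Phenom II X6 at 4 GHz, single-thread score from the table
I7_2600K_46_MULTI = 8.91     # Core i7 2600K at 4.6 GHz, multithreaded score from the table

def projected_score(single_thread: float, scaling: float) -> float:
    """Projected multithreaded score for a given scaling factor."""
    return single_thread * scaling

measured_like = projected_score(PHENOM_SINGLE_THREAD, 5.66)  # close to the measured 7.07
with_smt = projected_score(PHENOM_SINGLE_THREAD, 8.0)        # hypothetical 8X scaling

print(f"At 5.66X: {measured_like:.2f} points")
print(f"At 8X (hypothetical): {with_smt:.2f} points vs {I7_2600K_46_MULTI} for the 2600K at 4.6 GHz")
```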

AMD should focus on its per-core IPC, currently much lower than that of Intel CPUs (by about 20 - 30%); in particular, the work must concentrate on branch prediction accuracy and on increasing L1 cache associativity.
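A rough way to see the per-clock gap in the Cinebench numbers is to divide each single-thread score by the clock used in the single-threaded run. This is only a crude proxy for IPC, based on one benchmark; the clocks below are taken from the configuration descriptions (the Core i7 930 runs single-threaded loads at 4.2 GHz) and the names are illustrative.

```python
# (single-thread Cinebench R11.5 score, clock in GHz during the single-threaded run)
single_thread = {
    "Core i7 2600K": (1.62, 4.0),
    "Core i7 930": (1.45, 4.2),
    "Phenom II X6": (1.25, 4.0),
}

def score_per_ghz(score: float, ghz: float) -> float:
    """Single-thread score normalized by clock: a crude per-clock proxy."""
    return score / ghz

per_clock = {name: score_per_ghz(s, f) for name, (s, f) in single_thread.items()}
baseline = per_clock["Core i7 2600K"]

for name, value in per_clock.items():
    deficit = (1 - value / baseline) * 100  # percent below the 2600K
    print(f"{name:14s} {value:.3f} points/GHz ({deficit:4.1f}% below the 2600K)")

phenom_deficit = (1 - per_clock["Phenom II X6"] / baseline) * 100
```

On this metric the Phenom II X6 comes out roughly 23% below the 2600K at the same 4.0 GHz.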

For the two Intel processors, I must note the effect of Hyper-Threading (the two-way SMT they implement). It manages to exceed the 4X factor (quad core) and approach 5X. This is also a remarkable result: almost one extra core for free.

I can’t say much more ... Nehalem age is over …

Take a look at the website of my new company, a large-scale project I have been preparing for more than a year.

I recommend it for the design of high performance systems.

There you will find a contact form; for questions about this article, please use the comments section below.

There is also my new blog, with very technical and up-to-date content, where you will find my articles on hardware, processors and systems, as well as posts by expert programmers and IT specialists on other current topics.

Link to my original post in Spanish at ProfessionalSAT from Jan 12 2011

Wednesday, December 22, 2010

4 GB DDR3 modules. Updated – HighPerformanceSystems

These days I am running some experiments with 4 GB DDR3 memory modules on a Core i7 930 at 4.2 GHz (181 MHz X23 Turbo Mode), for a total of 12 GB. On ProfessionalSAT, two articles detail some aspects of systems designed with these components (in Spanish; sadly, I'm still in the process of translating them…):

3 DDR3 1333 DIMMs, 4 GB each.

The first and early conclusions are:

  • I can’t configure them as I normally do with their 2 GB counterparts. Timings of 7-7-7-14 1T are unreachable on socket 1366 motherboards for the Core i7 900 series; in this case I only managed 8-8-8-24 2T with full stability.
  • I also notice higher heat dissipation at 100% load. The surface of the memory chips gets really hot, so direct air cooling is advisable.
  • Their maximum frequency with the nominal 9-9-9-27 2T timings is 1500 MHz, reached at 1.64 V.

The chips are much larger in area than their 2 GB counterparts.

This amount of memory (12 GB) helps greatly with Windows 7 X64, especially when compressing large volumes of data with 7-Zip in LZMA2 mode with 8 threads and large dictionary sizes. It is a task I run routinely, and the compression time drops remarkably.
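To illustrate what a configurable LZMA2 dictionary looks like in code, here is a minimal sketch using Python's standard `lzma` module rather than 7-Zip itself. It is not the 7-Zip workflow described above: Python's `lzma` compresses on a single thread, and the helper name, the default dictionary size and the sample data are assumptions for illustration. Larger dictionaries need more memory during compression, which is where the extra RAM helps.

```python
import lzma

def compress_lzma2(data: bytes, dict_size: int = 64 * 1024 * 1024) -> bytes:
    """Compress with the LZMA2 filter and an explicit dictionary size in bytes."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)

# Illustrative sample data; a 1 MiB dictionary keeps this demo light on memory.
sample = b"high performance systems " * 4096
packed = compress_lzma2(sample, dict_size=1024 * 1024)

assert lzma.decompress(packed) == sample  # round trip must restore the input
print(len(sample), "->", len(packed), "bytes")
```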

Detail of one of the chips.

For a user whose software mix does not need this much memory, another possible use is to allocate 4 GB to a RAM disk, leaving 8 GB for the operating system.

Kingston KVR1333D3N9/4G

No doubt we will soon see our systems populated by modules of this capacity, once their price falls relative to the standard 2 GB ones.

When the memory manufacturers reach the next node in manufacturing, thereby reducing voltage and die area, we will be able to increase the frequency of these 4 GB DIMMs without any problems.

Link to my original post in Spanish at LowLevelHardware from Nov 3 2010.

Core i7 4 GHz Liquid Cooling 12 GB AMD HD6850 Corsair Force F60 SSD – HighPerformanceSystems

In this short article I'll look into the characteristics of one of the most affordable liquid cooling systems for high performance microprocessors, the Corsair H50. The test machine has a Core i7 930 processor clocked at 4 GHz with 12 GB of DDR3 in triple channel, a Radeon HD6850 1 GB GDDR5 and a Corsair Force F60 SSD for the operating system, in this case Windows 7 Home Premium X64.

To begin with, I must say I’m not very enthusiastic about liquid cooling: I see few advantages over a good tower-type cooler with multiple heatpipes at a similar cost, provided the heat flows inside the case are handled properly.

Corsair H50 radiator in push-pull configuration.

The one real advantage I see is the huge thermal inertia of water. Its temperature rises rather slowly thanks to its high specific heat of 4.18 kJ/(kg·K). Simply put, it takes a lot of energy (many joules, that is, many watts sustained over time) to raise the temperature of even a small mass of water significantly.
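The thermal inertia argument follows from Q = m · c · ΔT. The sketch below estimates how long a given heat input needs to warm a mass of water if no heat were removed by the radiator. The coolant mass, the CPU heat output and the temperature rise are hypothetical values chosen for illustration, not measurements of the H50; only the specific heat comes from the text.

```python
SPECIFIC_HEAT_WATER = 4180.0  # J/(kg*K), the 4.18 kJ/(kg*K) quoted in the text

def heating_time_s(mass_kg: float, delta_t_k: float, power_w: float) -> float:
    """Seconds needed to warm `mass_kg` of water by `delta_t_k` kelvin when all of
    `power_w` goes into the water and none is removed by the radiator."""
    energy_j = mass_kg * SPECIFIC_HEAT_WATER * delta_t_k  # Q = m * c * dT
    return energy_j / power_w

# Hypothetical values for illustration only (not measured on this system):
coolant_mass_kg = 0.1  # assumed coolant mass in the loop
cpu_power_w = 150.0    # assumed CPU heat output under load
delta_t_k = 10.0       # assumed coolant temperature rise

seconds = heating_time_s(coolant_mass_kg, delta_t_k, cpu_power_w)
print(f"{seconds:.1f} s to warm {coolant_mass_kg} kg of water by {delta_t_k} K at {cpu_power_w} W")
```

In a real loop the radiator removes heat continuously, so the coolant warms far more slowly than this no-cooling bound suggests.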

In this case the liquid cooling system is a compact Corsair H50 Hydro Series:

The simple, factory pre-assembled and sealed Corsair H50.

The most striking design characteristic of this water cooling kit is the water block (the part that makes contact with the processor).

The water block is very well designed and has very high thermal performance. Its base is pure copper without any plating (no zinc or nickel). This is the key to its performance.

Corsair H50 water block, pure copper.

Given the design of the computer case, an Aerocool VS-9 with plenty of ventilation holes, I had several options for mounting the radiator. I chose to position it in the top bay, drawing air in from outside.

With this mounting I got the lowest temperatures (by far) both at full load and at idle.

Aerocool VS-9 tower.

Fresh air enters through the top panel, passes through the Corsair H50 radiator and is exhausted by the second fan inside the tower. With a room temperature of 18 °C outside the tower, the exhausted air rises only 4 °C, to 22 °C, at idle, and peaks at 39 °C under 100% load in Prime95 InPlace with Round off Checking at 4 GHz.

Push-pull configuration. Air entering through top panel.

In fact, thanks to this air inlet and another fan placed at the front intake, I achieve a positive pressure gradient that helps to dissipate the heat of the new AMD GPU:

The HD 6850 with 1 GB GDDR5 manufactured by ASUS

AMD HD 6850 1GB GDDR5.

As you know, the new AMD HD 6850 has one of the best price-performance ratios in the whole GPU market.

12 cm front fan in push configuration.

Thus, the cooling system of this machine is as follows:

  • One 12 cm front input fan.
  • One 12 cm rear output fan.
  • H50 Corsair Kit with two 12 cm fans in push-pull configuration at the top panel pushing air into the tower.
  • The power supply is mounted on the bottom panel, with its fan drawing in fresh air and expelling it through the back panel.
  • GPU: AMD HD 6850 1GB GDDR5 expelling air through the rear panel.

Final pressure balance inside the tower: positive (greater than the outside atmospheric pressure). Since the system has a powerful GPU with high heat output, obtaining positive pressure is absolutely critical. It pushes the GPU's heat out naturally.

System overview. On top of the DVD recorder you can see the Corsair Force F60 SSD.

The operating system and the most used programs are installed on the SSD, with its brutal random-access performance (up to 50,000 IOPS), leaving a conventional 2 TB hard disk for data (suitable for mostly sequential workloads).

Corsair H50, controlled temperatures and low noise.

Thus the CPU temperature at idle is approximately 6 to 8 °C lower than in the same system with a Scythe Yasya with two fans in push-pull. Under load we see the following behavior:

At the very beginning of the stress test (Prime95 InPlace with Round off Checking) we get a very pleasant surprise: temperatures 12 to 15 °C lower than with a high-end tower cooler.

This is because the water is probably still at around 28 to 32 °C at this point. Thanks to the large thermal inertia of water we get a few minutes of surprisingly low temperatures...

Corsair H50 Water Block and the AMD 6850.

As time passes, the coolant temperature goes up and CPU temperatures rise with it. After about 60 - 80 minutes of testing we reach a plateau that stabilizes at 70 °C, with transient peaks of 72 °C. The radiator exhaust air is around 39 °C (18 °C outside the tower).

With a Scythe Yasya with two 12 cm fans in push-pull I get 76 °C (absolute peak after 6 h) under the same conditions. These are also entirely satisfactory temperatures at these frequencies and such a high load.

Excellent results for the Corsair H50.

I also want to emphasize the nearly imperceptible noise level of the water pump in the Corsair H50. It rotates at around 1200 - 1500 rpm and its behavior is entirely satisfactory.

Conclusions:

I must admit, the Corsair H50 has changed my opinion about low-priced liquid cooling in many ways. Its superb thermal design, with temperatures lower than or comparable to those of state-of-the-art heatpipe tower air coolers, has earned my respect and admiration as a very good piece of engineering.

In short, a great and compact design with the best performance for its price as long as it’s mounted the right way.

All photographs were taken with an excellent FujiFilm FinePix HS10.

Link to my original post in Spanish at ProfessionalSAT from Dec 15 2010.