What makes a CPU fast?

Posted: March 19, 2011 by ryanlecocq in Technology

Turbo?  Turbo!

If you don’t understand why a 3.4 gigahertz dual-core Pentium D is much slower than a 1.86 gigahertz Core2Duo, it isn’t your fault. The problem is that the numbers printed most clearly on the box don’t tell you much on their own, and they won’t help you much when choosing a new CPU. The clock frequency tells you how many cycles per second each core runs. It doesn’t tell you how much work each core gets done per cycle, so by itself it says little about real-world performance.

Another figure, in smaller print, is the front-side bus (FSB) speed on Intel chips and the HyperTransport speed on AMD chips. This is the first thing you need to know, because it tells you how fast the processor can actually move data to and from the rest of the motherboard. On Intel chips each core runs at its own rated frequency, but all the cores have to talk to the Northbridge (and through it, to memory) over the shared front-side bus at the FSB’s rated speed. It gets slightly more complicated than that, because the rated figure is a multiple of the actual bus clock: a "1333MHz" FSB is really a 333MHz clock that moves data four times per cycle. But this is the layman’s explanation. It means that a 1.8GHz dual-core CPU on a 1333MHz FSB can move data over the front-side bus about two-thirds (roughly 67%) faster than a CPU with an 800MHz FSB like the Pentium D. Features like Hyper-Threading, which let one core juggle two threads, help keep each core busy, but they don’t get around this limit, since every thread still shares the same bus. This is one big reason a Core2Duo with a 1333MHz front-side bus can beat a Pentium D with an 800MHz front-side bus despite a much lower clock speed. HyperTransport is easier to read, since it is quoted directly as a transfer rate in MT/s (megatransfers per second). So an AM3 motherboard with a 3400MT/s rating can make 3,400 million transfers per second to the Northbridge, no matter how many cores the CPU has or what they are clocked at.
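To make the bus arithmetic concrete, here is a rough Python sketch of peak bus bandwidth. It assumes a 64-bit (8-byte) wide Intel front-side bus. As a labeled assumption, it also treats a HyperTransport link as 16 bits wide in each direction. The function names are made up for this example, and real-world throughput always falls below these peaks.

```python
# Rough peak-bandwidth arithmetic for the buses discussed above.
# Assumptions (not from the article): the Intel FSB is 64 bits (8 bytes) wide,
# and a HyperTransport link is treated as 16 bits (2 bytes) wide per direction.

FSB_WIDTH_BYTES = 8   # 64-bit front-side bus
HT_WIDTH_BYTES = 2    # assumed 16-bit HyperTransport link, one direction


def fsb_transfer_rate(base_clock_mhz: float, transfers_per_clock: int = 4) -> float:
    """Rated FSB speed in MT/s: a quad-pumped bus moves data 4 times per clock."""
    return base_clock_mhz * transfers_per_clock


def bandwidth_gb_per_s(mt_per_s: float, width_bytes: int) -> float:
    """Peak bandwidth = transfers per second x bytes per transfer, in GB/s (10^9 bytes)."""
    return mt_per_s * 1e6 * width_bytes / 1e9


core2_fsb = fsb_transfer_rate(333.33)   # ~1333 MT/s, sold as "1333MHz"
pentium_d_fsb = fsb_transfer_rate(200)  # 800 MT/s, sold as "800MHz"

core2_bw = bandwidth_gb_per_s(core2_fsb, FSB_WIDTH_BYTES)
pentium_d_bw = bandwidth_gb_per_s(pentium_d_fsb, FSB_WIDTH_BYTES)
ht_bw = bandwidth_gb_per_s(3400, HT_WIDTH_BYTES)

print(f"1333 FSB: {core2_bw:.1f} GB/s")
print(f"800 FSB:  {pentium_d_bw:.1f} GB/s")
print(f"FSB advantage: {core2_bw / pentium_d_bw - 1:.0%}")
print(f"3400 MT/s HT link, one direction: {ht_bw:.1f} GB/s")
```

Under these assumptions the 1333 bus comes out around 10.7 GB/s, against 6.4 GB/s for the 800 bus. That ratio is where the roughly two-thirds advantage comes from.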

Once you know the transfer speed your CPU can actually sustain, the next question is which of two processors with the same bus speed is faster. This is where cache size matters. The level 1, level 2 and now level 3 caches on a CPU are small, very fast memories on the chip itself. They keep copies of recently used data and instructions, so the cores can keep working instead of waiting on the bus and main memory. This is how a quad-core processor with 2MB of L2 cache and 6MB of L3 cache can be much, much faster than a dual-core with only 1MB of L2, at least in programs that keep all its cores busy. The quad-core has twice the cores with the same amount of L2 cache per core (512KB each). It also has another, larger level of cache, shared by all its cores, to hold much more of the data a program is working on. So even if both CPUs are limited by the same 3400MT/s bus, the quad-core is pulling the answers to FAQs off the top of its head, while the dual-core is trying to remember High School history.
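Here is a toy model of why cache size matters. The sketch simulates a fully associative cache with least-recently-used (LRU) eviction as a program loops over its working set. This is far simpler than the set-associative caches in real CPUs. The latencies (4 cycles for a hit, 200 for a trip to main memory) are hypothetical round numbers, not figures for any particular chip, and the helper names are invented for illustration.

```python
from collections import OrderedDict


def lru_hit_rate(capacity: int, working_set: int, passes: int) -> float:
    """Simulate a fully associative LRU cache looping over `working_set` blocks."""
    cache = OrderedDict()  # keys are block numbers; order tracks recency of use
    hits = 0
    accesses = 0
    for _ in range(passes):
        for block in range(working_set):
            accesses += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)       # mark as most recently used
            else:
                cache[block] = None            # "fetch" the block from main memory
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used block
    return hits / accesses


def avg_access_cycles(hit_rate: float, hit_cycles: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_cycles + (1 - hit_rate) * miss_penalty


# Hypothetical sizes in cache blocks, and hypothetical latencies in cycles.
small = lru_hit_rate(capacity=512, working_set=600, passes=10)
big = lru_hit_rate(capacity=1024, working_set=600, passes=10)
print(f"small cache: hit rate {small:.2f}, ~{avg_access_cycles(small, 4, 200):.0f} cycles/access")
print(f"big cache:   hit rate {big:.2f}, ~{avg_access_cycles(big, 4, 200):.0f} cycles/access")
```

In this model, a working set just larger than the small cache never hits at all, because LRU throws each block out before it comes around again. The bigger cache hits on every pass after the first. That gap, not extra bus speed, is what a larger cache buys.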

I’m updating this now because when I wrote this I didn’t know much about why Intel chips were faster than AMD ones per core, besides better instruction sets. Raw math performance is often measured in FLOPS, or floating-point operations per second. The floating-point units are the parts of each core that actually do that math. How many operations a core can finish per clock cycle matters as much as the clock speed itself. Intel has stayed ahead in per-core performance by putting significant development into entirely new CPU designs every few years. AMD has taken longer between redesigns. Each of its jumps is about as big as Intel’s, but they come less often, so AMD has fallen behind over the last 6-8 years. Intel’s CPUs are just plain faster at doing math per clock, and therefore do most things better. So even if AMD pushes bus speed, cache size and clock speed above an Intel chip, the Intel chip will still be faster in many cases, because it does the actual calculations more efficiently. So there ya go.
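A common rule of thumb captures this point. Per-core speed is roughly clock speed times instructions per clock (IPC), and theoretical peak FLOPS is cores × clock × floating-point operations per cycle. The sketch below uses made-up IPC and FLOPs-per-cycle values purely to show how a 1.86GHz chip can outrun a 3.4GHz one. They are not measurements of the real Pentium D or Core2Duo.

```python
def relative_performance(clock_ghz: float, ipc: float) -> float:
    """Billions of instructions per second per core: clock speed x instructions per clock."""
    return clock_ghz * ipc


def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak floating-point throughput in GFLOPS."""
    return cores * clock_ghz * flops_per_cycle


# Hypothetical IPC figures chosen only to illustrate the point; not measurements.
pentium_d = relative_performance(clock_ghz=3.4, ipc=1.0)
core2 = relative_performance(clock_ghz=1.86, ipc=2.0)
print(f"Pentium D: {pentium_d:.2f}, Core2Duo: {core2:.2f} billion instructions/s per core")

# Hypothetical quad-core at 3.0GHz whose cores each finish 4 floating-point ops per cycle.
print(f"Peak: {peak_gflops(4, 3.0, 4):.0f} GFLOPS")
```

Real IPC depends heavily on the program being run, so treat this as a way of thinking about the numbers rather than a way of predicting benchmarks.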

So the short (if not simple) answer is this: to have the fastest CPU, you need the fastest bus speed your motherboard supports, and you need to keep that bus, and the cores behind it, supplied with as much of the data they need on hand as possible. First determine your motherboard’s front-side bus or HyperTransport speed. Then consider how much cache each core has to keep it fed. So why is clock speed the first stat on any CPU? Well, first of all, it’s the easiest thing to increase without changing the design at all. Secondly, it does matter: within the same design, a higher clock means the processor does its math faster internally, which cuts down any waiting. Overall, a higher clock speed usually means a better CPU within a series. A CPU at the high end of a model run is going to have more bells and whistles, and the manufacturer will clock it up to make it clearly better. Never compare clock speeds across series, though. A Regor dual-core Athlon II @3.1GHz is faster than a quad-core Phenom I @1.8GHz in most everyday programs, which use only one or two cores. The fact that 2×3.1=6.2 and 4×1.8=7.2 is irrelevant, because adding up clock speeds across cores doesn’t tell you how fast any one program will run. There ya go, now you know more about microchips than 95% of America and at least 30% of China.
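Summing clock speeds misleads because a program can only use as many cores as it has threads. This toy model counts only the cores a program can actually use. It assumes both chips get the same work done per clock (a simplification) and that a program with N threads can use at most N cores. `usable_ghz` is a hypothetical helper, not a real metric.

```python
def usable_ghz(cores: int, clock_ghz: float, threads: int) -> float:
    """Clock cycles per second (in GHz) a program can use, assuming equal work per clock."""
    return min(cores, threads) * clock_ghz


athlon_ii_x2 = (2, 3.1)  # (cores, GHz)
phenom_x4 = (4, 1.8)

for threads in (1, 2, 4):
    a = usable_ghz(*athlon_ii_x2, threads)
    p = usable_ghz(*phenom_x4, threads)
    print(f"{threads} thread(s): Athlon II X2 {a:.1f} vs Phenom X4 {p:.1f}")
```

With one or two threads the Athlon II X2 comes out well ahead (3.1 vs. 1.8, then 6.2 vs. 3.6). The Phenom’s 7.2 total pulls ahead of 6.2 only in a program that keeps all four cores busy.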

EDIT: I posted a reply below in the comments, but here’s an explanation of single- or dual-threaded vs. multi-threaded apps. Many programs and pre-DirectX 11 games are single- or dual-threaded. This means that if you have 3 or more cores, the program can only use one or two of them at a time. The operating system may move its threads from core to core, but the extra cores mostly sit idle or handle background tasks. This is only incrementally faster than having one or two cores. So for these programs you basically want the fastest core speed and the largest cache per core. This is why Intel i-series CPUs outperform AMD CPUs with 2-4 times the cores in these programs. The Intels are faster per core, have more cache per core and have a newer, more effective version of Hyper-Threading that helps in many apps. For gaming, however, DirectX 11 is becoming more and more standard. Its support for multithreaded rendering lets games that use it put more cores to full use. Multi-core DirectX 11 games can also get more out of dual-GPU setups. A game that is bottlenecked on one CPU core sees little benefit from SLI or CrossFire, and even less from hybrid setups like Hybrid Graphics, Optimus or Intel HD hybrid systems. DirectX 11 games, by contrast, can see gains even on AMD’s new Dual Graphics combos of an APU plus a discrete graphics card.
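The diminishing returns from extra cores are usually described with Amdahl’s law, a standard model the post doesn’t mention by name. If a fraction p of a program’s work can run in parallel, the best speedup on n cores is 1 / ((1 - p) + p / n). Here is a minimal Python sketch, where the 90%-parallel program is a hypothetical example:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: best-case speedup on `cores` cores relative to one core."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


for cores in (1, 2, 4, 6):
    single = amdahl_speedup(0.0, cores)  # single-threaded program: nothing runs in parallel
    mostly = amdahl_speedup(0.9, cores)  # hypothetical program with 90% parallel work
    print(f"{cores} cores: single-threaded x{single:.2f}, 90% parallel x{mostly:.2f}")
```

A single-threaded program (p = 0) gets no speedup at all, no matter how many cores you add. Even the hypothetical 90%-parallel program tops out around 4x on six cores. Amdahl’s law also ignores per-core speed and cache, which is why a faster dual-core can still beat a slower hexa-core.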

  1. pvkd44 says:

    Nice article! It makes sense now, kinda!

    Does all the above apply to the performance of a particular program (or script) which may appear to run on a single processor, rather than the performance of the entire CPU?

    To put it another way, for a single process, would a quad core still complete the process faster than a dual core with identical core speed and FSB?

    If so, is this down to more cache or to the quad vs. dual design?

      • ryanlecocq says:

        Sorry, that was a really poorly written and confusing explanation. EDIT:

        For single-threaded or dual-threaded apps you want the fastest individual core stats. You want each core to have the fastest speed and largest cache possible. So for single-threaded apps, an Intel dual-core i5 is much faster than an AMD hexa-core Phenom II X6 1055T. In DirectX 11 games that can use all six cores, the AMD puts every core to work, so the hexa-core is better.
