Barcelona vs Core2

By: Vincent Diepeveen, May 16, 2007 6:35 am
Room: Moderated Discussions
David Kanter on 5/16/07 wrote:
>Vincent Diepeveen on 5/13/07 wrote:
>>If core2 can retire 4 uops per cycle and barcelona can retire 3 uops a cycle i
>>understand, then core2 can blow that barcelona core completely away. That's 33% faster speed.
>Yes and the Concorde can fly faster than a 787...oh wait, no, it doesn't fly anymore : )
>I have yet to find any code with > 2 uops/cycle, so Intel's 4th issue/execute/retire
>slot really doesn't help all that much. Nobody consistently has IPC=3...
>It helps for clearing queuing up, but it really isn't that important. What IPC does your code get on Core2 anyways?

It really doesn't matter which of my own programs I look at, they all profit big time from a 4th execution unit.

Your math model might need a more realistic approach.

The total speed of your software is dominated by the average IPC you can get, yet you don't need to be consistently above 2.0 to already profit from the possibility to execute 4 a cycle.

Let's show in a sample calculation how relevant your remark is that it should be 'consistently higher' than IPC 3.0.

Let's use an example where 80% of a program's code cannot profit from moving from 3 to 4 integer units, which means that 20% does profit.

That 20% gets a speedup of 33%.

Your total program speedup then is:

100% - 80% - (20% * 3 / 4) = 100% - 80% - 15% = 5%

In other words, the program needs only 95% of its original run time.

So even software that hardly needs a 4th unit can already easily get a speedup of 5% from it.
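
For anyone who wants to play with the numbers, here is a minimal sketch of the same calculation in Python. The function name `runtime_after` and its parameters are made up for this post, and it assumes the part of the code that profits scales perfectly with the number of units (3 to 4 units means it runs in 3/4 of the time), which is the assumption of the example above and not a measurement.

```python
def runtime_after(fraction_helped, unit_ratio=4 / 3):
    """Relative run time after adding the extra unit (1.0 = original run time).

    fraction_helped: share of the run time that can use the extra unit
                     (hypothetical input, 0.20 in the example above).
    unit_ratio:      assumed local speedup of that share, 4 units / 3 units.
    """
    untouched = 1.0 - fraction_helped          # part that gains nothing
    helped = fraction_helped / unit_ratio      # part that shrinks to 3/4
    return untouched + helped


if __name__ == "__main__":
    new_time = runtime_after(0.20)
    print(f"relative run time: {new_time:.4f}")
    print(f"run time saved:    {(1.0 - new_time) * 100:.2f}%")
    print(f"throughput gain:   {(1.0 / new_time - 1.0) * 100:.2f}%")
```

With 20% of the code profiting, the run time drops to 95% of the original. Expressed as throughput that is a bit more than 5% (1 / 0.95), so the 5% figure is, if anything, slightly conservative.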

In reality, however, many of Intel's instructions are dead slow, so the observed speedup on chips that can retire 4 instructions a cycle is far bigger than that 5%.

