By: Gipsel (abc.delete@this.def.com), May 22, 2007 5:05 am
Room: Moderated Discussions
dess (dess@nospam.com) on 5/22/07 wrote:
---------------------------
>>>Moreover, AFAIK, AMD's Pack Buffer emits 3 _macro-ops_, whose can contain up to
>>>6 micro-ops, while Intel's path is 4 _uop_. Or is this "uop" a macro-op in AMD's terms?
>>
>>Intel's uops and AMD's micro-ops, macro-ops, etc. are not really comparable directly.
>
>Right, but I would take here what Agner wrote about it: "A macro-operation in AMD
>terminology is somewhat similar to a fused micro-operation in Intel terminology."
Intel more or less changed their definition of a µOp with the introduction of the Pentium M (later renamed to Core, the ones which incorporate µOp fusion). They don't differentiate between fused µOps and non-fused ones. When you look at which µOps can be fused together, you really get the impression that Intel more or less adopted AMD's concept of MacroOps consisting of an ALU µOp and a load/store µOp. Both (fused µOps and MacroOps) get split into two individual µOps later in the pipeline. Agner is absolutely right that they are more or less equivalent.
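To make that equivalence concrete, here is an illustrative sketch (my own notation, not taken from any vendor manual) of how a memory-operand ALU instruction could be tracked as one AMD MacroOp or one Intel fused µOp and then split later in the pipeline:

    add eax, [rsi]        ; one x86 instruction (NASM-style syntax)
      decode:   1 MacroOp (AMD) / 1 fused µOp (Pentium M / Core)
      execute:  load µOp  - reads the value at [rsi]
                ALU µOp   - adds the loaded value to eax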
But I don't think Core 2 is really that much wider than Barcelona. First you have to consider that Core 2 has more restrictions on its 4 decoders (only one complex decoder, smaller fetch bandwidth and a smaller predecode buffer). It's clear that Core 2 is not able to sustain a higher decode rate than Barcelona on all workloads. Additionally, Core 2 has a six-issue execution core, but only three issue ports can accept arithmetic instructions; the other three issue ports only accept memory operations. That means the peak execution rate is also not necessarily higher on Core 2 than on K8/Barcelona. But the 4-µOp execution rate of Core 2 will help if you have some "RISCy" x86 code, that is, separate load instructions followed by register operations instead of load-use operations (but Core 2 should still prefer one load-use operation over two separate ones).
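As an illustration of that last point (again my own example, NASM-style syntax), the same work can be written either as a single load-use instruction or split "RISC-style" into a pure load plus a register operation; the split form costs an extra instruction and an extra µOp, which is where the wider Core 2 execution core can absorb the difference:

    ; load-use form: one instruction, one MacroOp / fused µOp
    add eax, [rsi]

    ; "RISCy" split form: two instructions, two separate ops
    mov ebx, [rsi]        ; pure load
    add eax, ebx          ; pure ALU operation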
>(BTW, Agner is using micro-op all along, also on Intel's CPUs, instead of uop. uop
>is perhaps just a witty abbrevation, isn't it?)
Exactly. Actually it isn't uOp, it's µOp. As you know, "µ" is the Greek letter mu, which stands for "micro". Most keyboards simply lack a key for that letter.