By: David Kanter (dkanter.delete@this.realworldtech.com), April 30, 2012 6:40 pm
Room: Moderated Discussions
hcl64 (mario.smarq@gmail.com) on 4/30/12 wrote:
---------------------------
>David Kanter (dkanter@realworldtech.com) on 4/30/12 wrote:
>---------------------------
>>
>>>Also in that light, SMT can be implemented just as efficiently in a narrower
>>>core, since it is not aimed at heavy computation. Also, if memory access becomes more
>>>important, a "run-ahead" scheme, that is, continuing to execute speculatively on predicted
>>>and stored values and addresses after an L1 or L2 miss (cache miss mitigation), seems
>>>better than SMT or SpMT, and can be done efficiently in a narrower core as well.
>>
>>Run-ahead is a much simpler version of out-of-order. Judging by the results of
>>the POWER6, it's not a particularly attractive design choice. IBM went back to OOOE pretty quickly.
>>
>
>Yes, it is somewhat related to OOOE, but what I was thinking of here is based more on
>the work of James David Dundas, and is more of a dynamic pre-processing of the code
>stream. That is, upon an L1 or L2 *data* miss (L1, L2, or L3, depending on how aggressively
>you want to take it), the processor *doesn't stall waiting* but can continue executing
>instructions based on predicted, speculative *data* addresses and values, without
>waiting for the correct addresses and values to arrive, or for dependencies to be resolved
>(it can take a very lax approach to dependencies).
It seems like that only works when your value predictions are highly accurate. If they aren't... you're going to waste tons of power.
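For anyone not familiar with the scheme, the rough idea is: on a long miss you checkpoint architectural state, mark the missing register as "poisoned", keep issuing instructions purely for their cache-warming side effects, then throw it all away and replay when the miss returns. A toy sketch in C (all names and hooks are invented for illustration; this is not any shipping design):

struct regfile { long val[32]; int poisoned[32]; };

/* Hypothetical hooks into the rest of a pipeline model. */
int  miss_returned(void);
void execute_next_insn_runahead(struct regfile *rf); /* poisoned sources just propagate poison */
void replay_from_miss(struct regfile *rf);

void on_l2_data_miss(struct regfile *rf, int dest_reg)
{
    struct regfile checkpoint = *rf;   /* snapshot architectural state          */
    rf->poisoned[dest_reg] = 1;        /* the missing load's result is unknown  */

    /* Keep issuing instructions purely for their prefetching side effects:
       loads and stores still compute addresses and touch the caches/TLBs. */
    while (!miss_returned())
        execute_next_insn_runahead(rf);

    *rf = checkpoint;                  /* discard everything run ahead...       */
    replay_from_miss(rf);              /* ...and re-execute from the miss       */
}

The power problem is exactly that loop: everything executed in it is thrown away, and layering value prediction on top means even more re-execution when the predictions are wrong.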
>Contrary to forms of SpMT or the scout threads of the Rock chip, this doesn't incur
>the penalties of a context switch (OTOH there are limits on how many instructions can be
>processed this way, usually fewer than a hundred or so per run, depending on the size of the additional buffers, etc.).
I think the real question is how much you can really get done in the shadow of a cache miss. How much could you get done on another thread? How much on an OOOE core?
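(Back of the envelope: a trip to DRAM is on the order of 200 cycles; at 4 issue slots per cycle that miss shadow is roughly 200 * 4 = 800 slots, while a typical OOO window of this generation covers something like 100-170 instructions. That gap is the whole argument for run-ahead/scout schemes, but it is also how many issue slots' worth of power you burn if the run-ahead work turns out to be useless.)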
>I call it pre-processing because many instructions can be wrongly speculated and must
>be re-executed, but many can also be correctly speculated (the large majority of those
>speculated, on average, if the prediction mechanisms are very good), so it can
>be a very good pre-fetch and pre-execution method, warming all the caches and accelerating execution.
"If" the predictions are correct, is a big if. I don't know how expensive it is to make a >90% accuracy value predictor in HW.
>In this light, I think "run-ahead" and SMT can't happen on the same core at the
>same time... but there could be a switch between modes on the same core, in a conditional, subordinated way.
>>>I'm not implying which is better; to me neither is better at all things, not
>>>even now... but I'm very curious about AMD's approach, which is why I asked before
>>>about fusing 2 integer macro-ops on the fly into one XOP.
>>>I'm really curious whether even integer processing can be done to a good extent on a co-processor
>>>(the FlexFPU *is* a co-processor).
>>
>>1. What do you mean by a coprocessor? To me the FPU is just an FPU shared by two
>>cores. It's no more a coprocessor than the FPUs in Sandy Bridge are coprocessors.
>>
>
>Ummm... no, not like SNB at all.
How is it different?
>Co-processor organization
>
>http://techreport.com/r.x/bulldozer-uarch/bulldozer-fpu.jpg
That's a marketing diagram. It's not a definition. What do you think a co-processor is?
>So *perhaps* (I don't know any details) that is one of the beauties of a *module*.
>Co-processors can sit not only on different PCBs, on different dies interposed
>in an MCM, or on the same die on different parts of an integrated xbar, but now also in much closer proximity...
You still haven't told me what a "co-processor" is...
>>2. Speculatively fusing x86 instructions is challenging. What if they aren't adjacent
>>in the code stream? Macro-fusion requires this, and it's common for CMP+JMP. But
>>that's not necessarily true for integer adds.
>>
>>3. How do you handle load/store alignment?
>>
>>4. How do you handle exceptions that occur between the two instructions? You'd have to do a partial register rollback.
>>
>>5. How many x86 integer instructions have been extended with XOP? It's a relatively
>>small number (add, multiply-add, compare).
>>
>>Honestly, I think it would be more productive to try and speculatively fuse FP
>>MUL and ADD, if you had an FMA unit with intermediate rounding.
>>
>
>Good questions, which perhaps you could pose to an AMD representative if you have the chance, Mr. Kanter.
Why would I bother to ask? They aren't doing such a thing.
>If I'm not mistaken, the majority of XOP is integer (3-operand); the total package has more
>than one hundred instructions defined, I think.
>
>Perhaps integer fusing could also be a good thing.
Doubtful. The latency would also kill you.
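As a footnote to the FMA point above: the reason I said "with intermediate rounding" is that a true fused multiply-add rounds only once, so its result can differ in the last bits from a separate MUL followed by an ADD. To fuse existing MUL+ADD pairs transparently, the FMA unit would have to be able to round the product before the add. A quick C illustration (the inputs are just picked to expose the difference; compile with something like cc fma.c -lm):

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* a*b = 1 + 2^-29 + 2^-60 needs more than 53 significand bits,
       so the standalone multiply has to round part of it away. */
    double a = 1.0 + 0x1p-30;
    double b = 1.0 + 0x1p-30;
    double c = -1.0;

    double separate = a * b + c;     /* product rounded, then sum rounded   */
    double fused    = fma(a, b, c);  /* one rounding at the very end        */

    printf("separate = %.20g\n", separate);
    printf("fused    = %.20g\n", fused);
    return 0;                        /* the two differ in the low bits      */
}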
David