By: hcl64 (mario.smarq.delete@this.gmail.com), April 28, 2012 8:38 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton@gmail.com) on 4/28/12 wrote:
---------------------------
>hcl64 (mario.smarq@gmail.com) on 4/28/12 wrote:
>---------------------------
>>Paul A. Clayton (paaronclayton@gmail.com) on 4/20/12 wrote:
>>---------------------------
>[snip]
>>A 6 wide issue processor for x86 is simply a pipe dream...
>>until there will be ways to considerably break the "strong
>>dependency model" of x86 it will be out of reach.
>
>Support for 6 wide execution is not so much physically
>impractical as economically impractical. x86 is not in a
>performance at any cost market, and most of the code run on
>x86 is not high-ILP code.
>
*I have to wonder if it's necessary at all*. Perhaps heavy computational tasks can start to use OpenCL and heterogeneous approaches... it seems to be a trend: even IBM already has cryptographic/compression engines on the die of its z196 chips, and HPC is starting to use GPGPU pervasively... and the results speak loudly.
An illustration:
https://amdfusion.activeevents.com/scheduler/catalog.do MM-4172 "Then we used OpenCL to optimize VLC’s scaling filter, which is used to enlarge or shrink the video on the fly during playback. This OpenCL optimization has achieved speedups of up to *10x* on Llano and *18x* on Trinity compared to a competing CPU."
One has to wonder about this... I bet some here use that piece of OSS (I do)... and when the arguing "ad nauseam" is about which is 20 to 30% better, what to think when the arguing could be about orders of magnitude... and about what the future can bring. What role will be left to the CPU?
>*It is not clear* what you mean by "strong dependency
>model". A quick google found a use of that by you in
>another forum where you seem to refer to memory
>dependency checking. This is not a particularly x86
>issue. (Yes, with only 16 GPRs x86 will have more
>memory activity, but this is not a huge barrier and having
>fewer GPRs helps in renaming and load-op instructions
>communicate single-use temporaries 'registers'.)
>
It is all over the work of Mikko Lipasti:
http://www.realworldtech.com/forums/index.cfm?action=detail&id=128868&threadid=128602&roomid=2
>OoO can expose some ILP and wide execution can be useful
>at times after a dependency on a long latency operation is
>resolved (wide execution could also be helpful in branch
>misprediction recovery with few checkpoints); but the
>cost-benefit ratio seems to favor more moderate width.
>
>>4 may be already too much (BD is a false 4 wide issue),
>
>The value of 4 wide depends on design budget and design
>goals. For Intel, high single-thread performance is more
>practical (e.g., higher volume allowing higher absolute
>design costs and more binning) and perhaps more important
>than for AMD, and Intel seems committed to SMT (which
>benefits from 'excessive' width).
>
Yes, OoO is here to stay, but one has to wonder what the future of the CPU is, what it could be like, besides being a control & access processing element for the heterogeneous crowd.
In that light I don't know if a "wide" u-arch is more beneficial for branches than a narrower one. Also in that light, SMT can be implemented just as efficiently in a narrower core, because it is not for heavy computations. And if memory access becomes more important, a "run-ahead scheme", that is, continuing to execute speculatively based on predicted and stored values and addresses after an L1 or L2 miss (cache miss mitigation), seems better than SMT or spMT, and can be done just as efficiently in a narrower core.
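To make the run-ahead idea a bit more concrete, here is a toy sketch of it, not a model of any real core. It assumes a 1-wide in-order core, a constant miss latency, unlimited outstanding misses, and that every later load address is known without depending on the missing load (so no value or address prediction is modelled). The function name `run_trace` and its parameters are made up for this illustration.

```python
def run_trace(trace, miss_latency, runahead):
    """Return total cycles for a toy 1-wide in-order core to run `trace`.

    trace: list of cache-line ids touched by consecutive loads.
    miss_latency: cycles a demand miss stalls the core (assumed constant).
    runahead: if True, pre-execute following loads during each miss stall.
    """
    cache = set()    # lines already resident (assumed never evicted)
    in_flight = {}   # line -> cycle at which a run-ahead prefetch arrives
    cycle = 0
    for i, line in enumerate(trace):
        if line in cache:
            cycle += 1  # hit: one cycle
            continue
        if line in in_flight:
            # Prefetched during an earlier stall: wait only for the remainder.
            cycle = max(cycle, in_flight.pop(line)) + 1
            cache.add(line)
            continue
        # Demand miss: the core stalls for the full latency.
        start = cycle
        cycle += miss_latency
        cache.add(line)
        if runahead:
            # While stalled, pre-execute one following load per cycle and
            # start fetching any line that would otherwise miss later.
            window = trace[i + 1 : i + 1 + miss_latency]
            for k, future in enumerate(window, start=1):
                if future not in cache and future not in in_flight:
                    in_flight[future] = start + k + miss_latency
    return cycle
```

With eight loads to distinct lines and a 100-cycle miss, `run_trace(list(range(8)), 100, False)` gives 800 cycles and `run_trace(list(range(8)), 100, True)` gives 108, because the later misses overlap with the first one. Those numbers come only from this toy; the point is just that the benefit is memory-level parallelism, which does not need a wide execution core.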
**Yes, Intel and AMD now seem to be definitely following different approaches:**
Intel will try to beef up their cores even more, perhaps with even more SMT ways (4x per core), complemented with good HTM support...
AMD seems more on a decoupled approach (decoupled access/execute) where the CPU will be more of a control & access processing element for their *co-processors*: FlexFPUs, GCN CUs, CCP (cryptographic), managed-code fabric engines (Java/JavaScript/HSAIL/possibly C#), DRAM I/O engines with IOMMU & DirectGMA (DMA from co-processor to co-processor)...
http://news.ncsu.edu/releases/wmszhougpucpu/
I'm not implying which is better; to me neither is better at everything, not even now... but I am very curious about AMD's approach, which is why I asked earlier about fusing two integer macro-ops on the fly into one XOP:
http://www.realworldtech.com/forums/index.cfm?action=detail&id=128834&threadid=128602&roomid=2
I'm really curious whether even integer processing can be done to a good extent on a co-processor (the FlexFPU *is* a co-processor).
>Width is also somewhat flexible in meaning given the
>potential to fuse operations, and cascaded ALUs could
>further confuse the matter.
>
>[snip]
>BD was clearly targeting higher frequency (like P4) and
>used a relatively small (especially for AMD) Dcache (like
>P4).
Small Dcache... I think someone already answered that; perhaps they were too cocky about the cleverness of their prefetch schemes.