Previous discussion of clustered MT

By: David Kanter (dkanter.delete@this.realworldtech.com), April 30, 2012 4:37 pm
Room: Moderated Discussions
>>Support for 6 wide execution is not so much physically
>>impractical as economically impractical. x86 is not in a
>>performance at any cost market, and most of the code run on
>>x86 is not high-ILP code.
>>
>
>*I have to wonder if it's necessary at all*. Perhaps heavy computational tasks can
>start to use OpenCL and heterogeneous approaches... it seems a trend, even IBM already
>has cryptographic/compression engines on die of its z196 chips, HPC is starting
>to use GPGPU pervasively... and results speak loud;

Most real HPC workloads are still using CPUs. GPUs are not nearly as efficient as vendors suggest for real algorithms that rely on substantial communication and complex data structures.

OpenCL is barely on the market, and doesn't have widespread adoption yet.

The reality is that CPU performance is still the most important factor for the vast majority of users.

Even if they don't need the performance, it still ends up as better battery life. The two are different sides of the same coin.

>MM-4172 "Then we used OpenCL to optimize VLC’s scaling filter, which is used
>to enlarge or shrink the video on the fly during playback. This OpenCL optimization
>has achieved speedups of up to *10x* on Llano and *18x* on Trinity compared to a competing CPU."

Is this normally done using software or part of the dedicated video decoder? It seems like it should be part of a video decoder...

>one got to wonder about this... i bet some here use that piece of OSS (i do)...
>and when the arguing "ab noxious" is about which is 20 to 30% better, what to think
>when the arguing could be orders of magnitude... what the future can bring. What role will be left to the CPU?

I don't understand what you are saying. Can you try and rephrase that?

>>*It is not clear* what you mean by "strong dependency
>>model". A quick google found a use of that by you in
>>another forum where you seem to refer to memory
>>dependency checking. This is not a particularly x86
>>issue. (Yes, with only 16 GPRs x86 will have more
>>memory activity, but this is not a huge barrier and having
>>fewer GPRs helps in renaming and load-op instructions
>>communicate single-use temporaries 'registers'.)
>>
>
>All over in the work of Mikko Lipasti
>http://www.realworldtech.com/forums/index.cfm?action=detail&id=128868&threadid=128602&roomid=2

>>OoO can expose some ILP and wide execution can be useful
>>at times after a dependency on a long latency operation is
>>resolved (wide execution could also be helpful in branch
>>misprediction recovery with few checkpoints); but the
>>cost-benefit ratio seems to favor more moderate width.
>>
>>>4 may be already too much (BD is a false 4 wide issue),
>>
>>The value of 4 wide depends on design budget and design
>>goals. For Intel, high single-thread performance is more
>>practical (e.g., higher volume allowing higher absolute
>>design costs and more binning) and perhaps more important
>>than for AMD, and Intel seems committed to SMT (which
>>benefits from 'excessive' width).
>>
>
>Yes OoO is here to stay, but one has to wonder what is the future of CPU, what
>it could be like, besides being a control & access processing element for the heterogeneous crowd.

Higher performance, lower power undoubtedly.

>In that light i don't know if a "wide" u-arch is beneficial for branch, more than
>a narrower one.

For really, really branchy code, the ILP is probably minimal and a 4-wide superscalar might be limited. But a lot of code is a mix of high ILP and low ILP. For example, analytic databases typically have IPC > 1. That's an important workload that is unlikely to ever migrate to GPUs.

The real point is that even if the average IPC is 1, you spend a lot of time waiting for memory and disk. So in reality, execution is a lot of cycles with IPC = 0 and quite a few cycles with IPC > 1. You need to be able to exploit those stretches where IPC = 3, 4 or even more to keep average IPC at a reasonable level.
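To put rough numbers on that, here is a back-of-the-envelope sketch. The instruction count, stall cycles and burst widths are made up for illustration and are not measurements of any real core; `average_ipc` is just a hypothetical helper.

```python
def average_ipc(burst_instructions, burst_ipc, stall_cycles):
    """Average IPC for a run made of stall cycles (IPC = 0) plus bursts.

    burst_instructions: instructions retired outside of stalls
    burst_ipc:          IPC the core sustains while it is not stalled
    stall_cycles:       cycles spent waiting on memory or disk
    """
    busy_cycles = burst_instructions / burst_ipc
    return burst_instructions / (busy_cycles + stall_cycles)

# Assumed workload: 1600 instructions and 600 stall cycles in both cases.
wide = average_ipc(1600, 4, 600)    # core that can sustain IPC = 4 in bursts
narrow = average_ipc(1600, 2, 600)  # core limited to IPC = 2 in bursts
print(f"wide: {wide:.2f}  narrow: {narrow:.2f}")
```

With the same stall time, the core that can burst at 4 ends up with the higher average, even though neither average comes close to the peak width.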

>Also in that light SMT can be as efficiently implemented in a narrower
>core, cause is not for heavy computations. Also if memory access will be more important
>"a run-ahead scheme", that is, continue executing speculatively based on predicted
>and stored values and addresses, after a L1 or L2 miss(cache miss mitigation), seems
>better than SMT or spMT and so efficiently done at a narrower core also.

Run-ahead is a much simpler version of out-of-order. Judging by the results of the POWER6, it's not a particularly attractive design choice. IBM went back to OOOE pretty quickly.

>**Yes, now Intel and AMD seem to be definitely trailing different approaches:**
>
>Intel will try to beef even more their cores, perhaps even more SMT ways ( 4x per
>core), complemented with good HTM support...

I'm not sure 4 threads makes sense for client systems.

>AMD seems more on a Decoupled approach (decoupled access execute) where the CPU
>will be more of a control & access processing element for their *co-processors*
>: FlexFPUs, GCN CU, CCP(cryptographic), Managed code Fabric engines(java/javascript/HSAIL/possible
>C#), DRAM I/O engines with IOMMU & DirectGMA(DMA from co-processor to co-processor)...
>http://news.ncsu.edu/releases/wmszhougpucpu/

Honestly I don't think there's any real evidence that AMD is going that way with products. Researchers might be interested, but researchers wanted to build trace processors a decade or two ago.

Those kinds of ideas sound neat, but don't work in practice because the latency is horrific.

AMD seems to be converging around an architecture where you have a CPU and a GPU. Not sure what they are planning to do for crypto stuff...

>I'm not implying about which is better, to me none is better at all things, not
>even now... but i'm very curious about AMD approach alright, that is why i inquired
>about fusing on-the-fly 2 Integer macro-ops into one XOP before
>really curious if even Integer processing can be done in good extent at a co-processor
>( FlexFPU *is* a co-processor).

1. What do you mean by a coprocessor? To me the FPU is just an FPU shared by two cores. It's no more a coprocessor than the FPUs in Sandy Bridge are coprocessors.

2. Speculatively fusing x86 instructions is challenging. What if they aren't adjacent in the code stream? Macro-fusion requires adjacency, which is common for CMP+JMP. But that's not necessarily true for integer adds.

3. How do you handle load/store alignment?

4. How do you handle exceptions that occur between the two instructions? You'd have to do a partial register rollback.

5. How many x86 integer instructions have been extended with XOP? It's a relatively small number (add, multiply-add, compare).
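To illustrate the adjacency point in item 2, here is a toy sketch of a fusion pass that only looks at the next instruction. The tuple encoding, the `COND_JUMPS` set and `fuse_adjacent` are all hypothetical and do not model any real decoder.

```python
# Hypothetical set of conditional jumps that may fuse with a preceding CMP.
COND_JUMPS = {"je", "jne", "jl", "jge"}

def fuse_adjacent(instrs):
    """Fuse a CMP with a conditional jump only if the jump comes right after it."""
    fused = []
    i = 0
    while i < len(instrs):
        op = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if op[0] == "cmp" and nxt is not None and nxt[0] in COND_JUMPS:
            # Adjacent pair: emit a single macro-op carrying both operand lists.
            fused.append(("cmp+" + nxt[0],) + op[1:] + nxt[1:])
            i += 2
        else:
            fused.append(op)
            i += 1
    return fused

adjacent = [("cmp", "eax", "ebx"), ("jne", "loop")]
separated = [("cmp", "eax", "ebx"), ("mov", "ecx", "edx"), ("jne", "loop")]
print(fuse_adjacent(adjacent))   # one fused macro-op
print(fuse_adjacent(separated))  # three ops, nothing fused
```

A pass like this is cheap because it never has to search; anything that wants to fuse non-adjacent instructions has to track what happens in between.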

Honestly, I think it would be more productive to try and speculatively fuse FP MUL and ADD, if you had an FMA unit with intermediate rounding.
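The intermediate rounding matters because a single-rounding FMA can give a different answer than a separate MUL followed by an ADD, so fusing two existing instructions would change program results. A small sketch, using exact rational arithmetic to stand in for a single-rounding FMA (the function names and inputs are just illustrative choices):

```python
from fractions import Fraction

def mul_then_add(a, b, c):
    """Separate MUL and ADD: the product is rounded to double before the add."""
    return a * b + c

def fused_single_rounding(a, b, c):
    """Model of a single-rounding FMA: compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

# Inputs chosen so the exact product, 1 - 2**-54, is not representable as a double.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

separate = mul_then_add(a, b, c)
fused = fused_single_rounding(a, b, c)
print(separate, fused)
```

Here the separate sequence rounds the product up to 1.0 and returns 0.0, while the single-rounding version keeps the small negative residue.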

>>Width is also somewhat flexible in meaning given the
>>potential to fuse operations, and cascaded ALUs could
>>further confuse the matter.
>>
>>[snip]
>>BD was clearly targeting higher frequency (like P4) and
>>used a relatively small (especially for AMD) Dcache (like
>>P4).
>
>Small Dcache... i think someone already answered that, perhaps they were too cocky
>about their pre-fetch schemes cleverness.

Well it definitely seems to have been a mistake given how slow the L2 is.

DK