Shared FPU wasn't BD's problem

By: Adrian (a.delete@this.acm.org), August 31, 2021 1:27 am
Room: Moderated Discussions
Anon (no.delete@this.spam.com) on August 31, 2021 12:28 am wrote:
> Chester (lamchester.delete@this.gmail.com) on August 30, 2021 1:03 pm wrote:
> > In BD's case, that's probably to hit high clock speeds on a pretty bad node. Integer SIMD
> > ops are 1c latency on newer AMD CPUs, but probably 2c in Bulldozer because the units are
> > half width. Piledriver could do a couple FPU ops (extrq, insertq) with 1c latency.
>
> They were 2 cycles because of the FPU design: from Athlon to Bulldozer, all AMD
> FPUs (by FPU I mean the integer SIMD units) had a minimum latency of 2 cycles.
>
> > I think Bulldozer's biggest problems were:
> >
> > - The 16 KB L1D was too small and write-through
> > - Slow L2 has to handle a lot of L1D misses
> > - The branch predictor was better than K10's, but not quite as good as Intel's at the time
> > - Each module half (thread) just wasn't as beefy as a whole Intel core, which could
> > bring a lot more OOO resources into play when one SMT thread is in halt.
> > - FP execution units were 128 bits wide (256-bit AVX ops decoded into two micro-ops),
> > putting it at a disadvantage vs Sandy Bridge's 256-bit wide units
> >
> > Then to wrap it up, every single bit of ST performance matters for the desktop
> > market. Sharing the FPU is pretty far down on the list of BD's problems, IMO.
>
> Let's go into this discussion again. BD was a bad CPU, I think everybody agrees on that. The problem starts
> when people try to find "why" and then label everything that was different in BD as a "bad choice".
> But hey, BD wasn't bad in every aspect: its module's multi-threaded performance was on par with an Intel
> HT core; only the single-threaded performance (which was very important, especially in the consumer market)
> was terrible. So please, guys, stop blaming the shared resources (write-through L2, FPU, decoder, L1I);
> they were fine at delivering about the same performance as each Intel HT thread. The problem was the
> non-shared resources, which were far too limited for good single-thread performance.
>



What I disliked most about BD was not its performance, which was good enough for certain purposes, but its marketing.

AMD marketing used the name "module" for 2 integer cores + the shared FPU, and "core" for 1 integer core + its share of the FPU.


Even if it may be argued that calling the BD integer core a "core" is technically correct, the use of this word was completely misleading in the context of the competitive marketing of 2011, when 4-module / 8-"core" AMD CPUs were compared with 4-core / 8-thread Intel CPUs and with the 6-core AMD CPUs of 2010.


For most things, the BD "module" had exactly the same amount of execution resources as an Intel core, and for some things, e.g. integer multipliers, it had even fewer.

A BD "core" had far fewer execution resources than a previous-generation AMD core, even though a BD "module" had more execution resources than a previous-generation "core", except for a few things such as integer multipliers, where BD regressed.


If AMD marketing had launched BD as a 4-core / 8-thread CPU instead of a 4-"module" / 8-"core" CPU, it would have set buyers' expectations correctly.

The 8-thread BD matched the MT performance of the 8-thread Sandy Bridge, but it was far slower in ST, which was to be expected, since a single BD thread could use far fewer execution resources.

Because it was marketed as an 8-core CPU, buyers were immediately dismayed to discover that their "8-core" CPU was much slower in single-thread performance not only than Intel's CPUs but also than the previous AMD CPUs, while in multi-thread performance the so-called "8-core" CPU barely matched a "4-core" Intel CPU or a "6-core" AMD CPU of the previous generation.

Because of the stupid marketing, BD was a huge disappointment.


If it had been marketed realistically, it might have been much more successful, because there were enough cases where it made sense to buy a BD for decent performance at a lower price.
Topic | Posted By | Date
AVX512 as co-processor | Michael S | 2021/08/29 03:13 AM
  AVX512 as co-processor | -.- | 2021/08/29 04:05 AM
    Shared FPU wasn't BD's problem | Chester | 2021/08/30 01:03 PM
      Excellent post (NT) | Heikki Kultala | 2021/08/30 01:34 PM
      Shared FPU wasn't BD's problem | P Snip | 2021/08/30 01:53 PM
      Shared FPU wasn't BD's problem | -.- | 2021/08/30 05:47 PM
      Shared FPU wasn't BD's problem | David Kanter | 2021/08/30 10:29 PM
        Shared FPU wasn't BD's problem | Chester | 2021/08/31 02:58 AM
          Shared FPU wasn't BD's problem | David Kanter | 2021/08/31 09:28 AM
            Shared FPU wasn't BD's problem | Chester | 2021/08/31 12:29 PM
            Shared FPU wasn't BD's problem | Rayla | 2021/08/31 02:34 PM
      Shared FPU wasn't BD's problem | Anon | 2021/08/31 12:28 AM
        Shared FPU wasn't BD's problem | Adrian | 2021/08/31 01:27 AM
          Shared FPU wasn't BD's problem | Anon | 2021/08/31 02:06 AM
            Shared FPU wasn't BD's problem | anonymou5 | 2021/08/31 02:09 PM
              Shared FPU wasn't BD's problem | Chester | 2021/09/01 11:05 AM
      Shared FPU wasn't BD's problem | Kevin G | 2021/08/31 09:39 AM
        Shared FPU wasn't BD's problem | Chester | 2021/09/01 10:03 AM