Shared FPU wasn't BD's problem

By: Rayla (rayla.delete@this.example.com), August 31, 2021 2:34 pm
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on August 31, 2021 9:28 am wrote:
> Chester (lamchester.delete@this.gmail.com) on August 31, 2021 2:58 am wrote:
> > David Kanter (dkanter.delete@this.realworldtech.com) on August 30, 2021 10:29 pm wrote:
> > > Chester (lamchester.delete@this.gmail.com) on August 30, 2021 1:03 pm wrote:
> > > > -.- (blarg.delete@this.mailinator.com) on August 29, 2021 4:05 am wrote:
> > > > > ARM's upcoming Cortex A510 uses a shared FPU between two cores, so there's
> > > > > at least a second mainstream player trying out shared FPUs:
> > > > >
> > > > >
> > > > >
> > > > > I recall Bulldozer had minimum 2 cycle latency FPU ops, and current
> > > > > ARM chips generally also have minimum 2 cycle latency FPU ops.
> > > >
> > > > In BD's case, that's probably to hit high clock speeds on a pretty bad node. Integer SIMD
> > > > ops are 1c latency on newer AMD CPUs, but probably 2c in Bulldozer because the units are
> > > > half width. Piledriver could do a couple FPU ops (extrq, insertq) with 1c latency.
> > > >
> > > > Sharing an AVX512 unit between 4 little cores may work, in a way similar to Apple AMX,
> > > > A510 SVE, and IBM Telum's AI accelerator. I think Bulldozer's biggest problems were:
> > > >
> > > > - The 16 KB L1D was too small and write-through
> > > > - Slow L2 has to handle a lot of L1D misses
> > >
> > > Also, it could only do a single L2 access/clock IIRC. It probably
> > > needed to be able to do 3 given the write-through.
> > >
> > > David
> >
> > I think it mostly needed to be lower latency. From some testing, PD's L2 can do 16B per cycle, per thread.
>
> Yes, but I'm saying it needed to be 2x16B/clock for write-through.
>
> Also the L2 needs to service two instruction caches that were tiny.
>

A BD module has only one i-cache, and it's 64K 2-way - lower associativity than would be ideal, certainly, but not that tiny.