AVX-512 possibly in its own clock domain?

By: Maynard Handley (name99.delete@this.name99.org), August 26, 2018 4:28 pm
Room: Moderated Discussions
Brett (ggtgp.delete@this.yahoo.com) on August 26, 2018 9:48 am wrote:
> Travis (travis.downs.delete@this.gmail.com) on August 25, 2018 8:40 pm wrote:
> > Brett (ggtgp.delete@this.yahoo.com) on August 25, 2018 1:58 pm wrote:
> >
> > > That means the high FPU units are the same design as the low FPU units, which further limits performance.
> > > The high FPU units could have been designed for low clocks, using different transistors
> > > and different gate delays. But that means switching rates clocks fast.
> > > I could see a design where you have an extra FPU just for non-vector math
> > > at 4+GHz, and a full set of vector units designed for low clocks.
> >
> > On Intel designs this is difficult because the 512-bit vector FP ALU is pretty much completely
> > overlapping with the 2x 256-bit ALUs, and those guys run at a higher frequency license.
> > So they need to design these to run at the speeds needed by the 256-bit ops.
>
> Sorry I was not thinking x86 but post RISC, though the speculation was about Intel.
>
> There is the question of where would you make the cut and have a potentially separate register file between short
> floats and huge vectors, so that you could more simply put the wide vector unit in its own clock domain.
>
> I would say short 128 bit vectors of four 32 bit floats is the biggest size that
> you want running at full clock rates. This gives high performance for simple 3D game
> transforms. With larger vectors you are doing different types of bulk math.
>
> The wide vector unit becomes a separate coprocessor like the old days, but even more separated in its own
> clock domain. This gives Cell like performance but with a real instruction set with a real integer CPU.
>
> This design is optimal for ray tracing and other tasks where the
> CPU is doing a lot of searching intermixed with heavy math.
> The CPU gets to run at 4GHz and the wide vector unit gets
> to run its queue at its optimal heat load clock rate.
>

What is the official ARM stance regarding NEON vs SVE? In particular, are they independent register files, or is there some sort of rule that NEON always forms the lowest bits of the SVE registers?
(That would be a strange rule, but it would fit how NEON overlaps scalar FP, and is how ARM has done things in the past.)
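To make the second possibility concrete, here is a minimal Python sketch of what "NEON forms the lowest bits of the SVE register" would mean. The class and method names are hypothetical, and zeroing the upper bits on a NEON write is an assumption made for illustration (by analogy with how scalar FP writes behave relative to NEON), not a statement of what the architecture specifies.

```python
NEON_BITS = 128  # width of one NEON register


class VectorRegister:
    """Hypothetical model: one register whose low 128 bits are the NEON view."""

    def __init__(self, sve_bits):
        # Assumption: the SVE width is a multiple of the NEON width.
        if sve_bits <= 0 or sve_bits % NEON_BITS != 0:
            raise ValueError("width assumed to be a positive multiple of 128 bits")
        self.sve_bits = sve_bits
        self.value = 0  # full register contents held as one integer

    def write_sve(self, value):
        # Keep only the bits that fit in the full-width register.
        self.value = value & ((1 << self.sve_bits) - 1)

    def read_neon(self):
        # The NEON view is simply the low 128 bits of the same storage.
        return self.value & ((1 << NEON_BITS) - 1)

    def write_neon(self, value):
        # Assumption: a NEON write clears everything above bit 127.
        self.value = value & ((1 << NEON_BITS) - 1)


reg = VectorRegister(256)
reg.write_sve((0xAB << 128) | 0xCD)
low = reg.read_neon()  # sees only the low 128 bits written above
```

With independent register files, by contrast, the NEON read would not observe the SVE write at all.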

The registers seem to be the part that's architecturally visible. Apart from them, it seems plausible that a company with, say, 3 NEON units could provide 384-bit SVE sharing that HW, and likewise 256-bit SVE with 2 NEON units.
(Yeah, yeah, I know people keep insisting that SVE is not for mobile and suchlike. Well, maybe not this year, but Moore's law is still a thing for some companies, and at some point it's going to be a "why the heck not".)
For those sorts of designs, you have the same issue of using the full vector width vs partial usage, so a distinct frequency island is impractical. But of course you might run all two or three NEON units simultaneously anyway, so that's unlikely to be a useful design consideration.
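The arithmetic behind that guess can be sketched in a few lines of Python. This assumes each NEON unit contributes exactly one 128-bit lane group to the ganged SVE datapath; the function names are made up for illustration and do not describe any real core.

```python
NEON_UNIT_BITS = 128  # assumed datapath width of one NEON unit


def sve_width_bits(neon_units):
    """SVE width available if all NEON units are ganged together."""
    if neon_units < 1:
        raise ValueError("need at least one NEON unit")
    return neon_units * NEON_UNIT_BITS


def units_busy(active_bits, neon_units):
    """Number of NEON units an operation touching active_bits keeps busy."""
    total = sve_width_bits(neon_units)
    if not 0 <= active_bits <= total:
        raise ValueError("active_bits must fit in the ganged width")
    # Ceiling division: a partly used 128-bit slice still occupies its unit.
    return -(-active_bits // NEON_UNIT_BITS)


full = units_busy(sve_width_bits(3), 3)   # full-width SVE op: all 3 units
partial = units_busy(128, 3)              # NEON-sized op: 1 unit
```

The point is that the same hardware sees anywhere from one unit to all of them active, whether the work arrives as one wide SVE op or as several NEON ops issued together.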

But this may be essentially a problem of striving for maximum frequency, and you know my thoughts on that. To the extent that anyone else adopts a more balanced design policy (higher IPC at lower frequency), there's less of a concern about, and less need for, a separate lower frequency for their wide vectors?
The current vector guys like NEC and Fujitsu are targeting about 2GHz? IBM goes for high frequencies, but their vectors are still capped at 128 bits wide? And they are willing to put down much more aggressive infrastructure for power delivery, caps, cooling etc ...

Who's left? Oracle in theory, but they're not really in this space. And all the ARM vendors I guess are more at the 3GHz level which (to judge by Intel's AVX curves) is still mostly fine. If we start to see ThunderX with 48 cores and SVE-512, this might become an interesting issue...