GPNPU?

By: --- (---.delete@this.redheron.com), October 8, 2022 11:38 am
Room: Moderated Discussions
Jörn Engel (joern.delete@this.purestorage.com) on October 8, 2022 10:02 am wrote:
> --- (---.delete@this.redheron.com) on October 7, 2022 2:40 pm wrote:
> >
> > So, can we do anything interesting with this that's not actually NPU related?
> > The most obvious possibility that struck me was random number related stuff; you can hook up these things
> > to act as LFSRs and (perhaps) generate lots of (few bit, adequate quality?) random numbers per cycle,
> > then either concatenate them to generate streams of uniformly distributed multi-bit integers or
> > add 6 or 12 or so of them to generate (adequate?) gaussian values. I don't care about adversarial
> > security stuff, I'm more interested in "good-enough" for various types of physics work.
>
> You can generate one "good-enough" 64bit random number per cycle on a regular CPU. In 4 cycles
> you can get something borderline good enough for crypto. Not sure how many applications exist
> that consume random numbers at a high enough rate to care about what you propose.
>
> And speaking of good enough, you want to avoid short sequences in your PRNG. It's easy to have a 2^64 sequence
> with a 64bit state and 64bit multiplications, etc. 32bit requires two registers for state and some logic
> to transfer state between the two halves. 8bit may result in you wasting more time moving state from register
> to register (or vector lane to vector lane, whatever) than you could potentially gain in speedup.

That's just one (64b) per cycle! I'm assuming that with an NPU I can get something like 256 random 8-bit values per cycle per "engine" (8 or so engines on an M1).
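To make the "256 random 8-bit values per cycle" idea concrete, here's a rough software sketch of what each lane would be doing. It assumes every lane is an 8-bit Galois LFSR on the polynomial x^8 + x^6 + x^5 + x^4 + 1; the names (`lfsr8_step`, `step_lanes`, `N_LANES`) are made up for illustration and say nothing about how a real NPU would be programmed.

```python
# Sketch only: models 256 independent 8-bit LFSR lanes in plain Python.
# The lane count and the polynomial are assumptions, not NPU facts.

TAPS = 0xB8     # Galois mask for x^8 + x^6 + x^5 + x^4 + 1 (maximal length)
N_LANES = 256   # assumed, from the "256 per cycle" guess in the post


def lfsr8_step(state):
    """Advance one 8-bit Galois LFSR by one step; state must be nonzero."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state


def lfsr8_period(seed=1):
    """Count steps until the LFSR returns to its seed."""
    state = lfsr8_step(seed)
    steps = 1
    while state != seed:
        state = lfsr8_step(state)
        steps += 1
    return steps


def step_lanes(lanes):
    """One 'cycle': every lane advances once, yielding one byte per lane."""
    return [lfsr8_step(s) for s in lanes]


# Seed lanes with nonzero values 1..255 (wrapping, since 0 is a dead state).
lanes = [(i % 255) + 1 for i in range(N_LANES)]
lanes = step_lanes(lanes)   # 256 fresh bytes
```

Note the catch: with one shared polynomial every lane walks the same 255-state cycle, so the lanes are just phase-shifted copies of each other. That's the short-sequence problem in miniature; different polynomials per lane or wider state would be the obvious next step.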
Obviously what I have in mind would run as a throughput model, not a latency model! So you'd have the NPU filling one 64K buffer, ping-pong style, while the CPU walks through the other buffer, loading random bits as required.
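For what it's worth, the ping-pong arrangement looks something like the sketch below. Everything here is a stand-in: `PingPong` and `software_fill` are hypothetical names, and `os.urandom` plays the part of the NPU. In this single-threaded version the refill happens synchronously at the swap; the whole point of the real thing would be that the refill overlaps with the CPU consuming the other buffer.

```python
import os

BUF_SIZE = 64 * 1024  # the 64K buffer size mentioned above


def software_fill(buf):
    """Stand-in for the NPU: overwrite buf in place with random bytes."""
    buf[:] = os.urandom(len(buf))


class PingPong:
    """Two buffers: the consumer drains one while the other gets refilled."""

    def __init__(self, fill, size=BUF_SIZE):
        self.fill = fill
        self.bufs = [bytearray(size), bytearray(size)]
        self.active = 0   # index of the buffer the consumer is reading
        self.pos = 0      # read offset within the active buffer
        for buf in self.bufs:
            fill(buf)

    def take(self, n):
        """Return n random bytes, swapping buffers whenever one runs dry."""
        out = bytearray()
        while n > 0:
            buf = self.bufs[self.active]
            chunk = buf[self.pos:self.pos + n]
            out += chunk
            self.pos += len(chunk)
            n -= len(chunk)
            if self.pos == len(buf):
                # Drained: hand this buffer back for refill (asynchronous
                # on real hardware) and switch to the other one.
                self.fill(buf)
                self.active ^= 1
                self.pos = 0
        return bytes(out)


pp = PingPong(software_fill)
word = int.from_bytes(pp.take(8), "little")   # one 64-bit random value
```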
And of course, yes, you want long sequences. But if what you actually care about is processes, not simple aggregates, then you can still get a fair bit of value from shorter sequences simply by doing things like interleaving several sequences as you generate the successive elements of your process.
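One way to read "interleaving" is plain round-robin: element k of the process draws from stream k mod m, so consecutive elements never come from the same short generator. A minimal sketch, assuming four 8-bit LCGs as the short-period sources (the constants are arbitrary choices that satisfy the Hull-Dobell full-period conditions, and `lcg8`, `interleave`, `make_streams` are illustrative names only):

```python
from itertools import islice


def lcg8(seed, a, c):
    """8-bit LCG: full period 256 when c is odd and a % 4 == 1."""
    x = seed & 0xFF
    while True:
        x = (a * x + c) & 0xFF
        yield x


def interleave(streams):
    """Round-robin: successive outputs come from successive streams."""
    while True:
        for s in streams:
            yield next(s)


def make_streams():
    """Four short-period generators with different multipliers/increments."""
    return [lcg8(1, 5, 1), lcg8(2, 13, 3), lcg8(3, 29, 7), lcg8(4, 37, 11)]


# First eight elements of the "process", two from each stream.
first = list(islice(interleave(make_streams()), 8))
```

This only shows the mechanism. Whether the result is statistically adequate for a given physics problem would still need testing.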

Maybe it's not worth the hassle in production code? OTOH it's always interesting to think of what's possible with this additional HW and how to get SOME value from it.
Where you could get real value is if, in the NPU, you can shape the values to the properties of your process. I'm sure you could do that in a GPU right now without much pain (and that's clearly the more practical solution!) but, heck, maybe you can get both – generate raw randoms on the NPU, condition them on the GPU, and use them on the CPU :-)
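The simplest shaping step is the one from my earlier post: add up 12 uniform values to get something roughly gaussian. A quick sketch, assuming 8-bit raw values and using Python's `random` module as a stand-in for whatever the NPU would deliver (`approx_gauss` and `raw_bytes` are made-up names):

```python
import random


def approx_gauss(bytes12):
    """Sum 12 bytes mapped to (0, 1) and recentre: mean 0, variance ~1.

    Each byte b becomes (b + 0.5) / 256, which has mean 0.5 and variance
    close to 1/12, so the sum of 12 has mean 6 and variance close to 1.
    The tails are cut off at about +/-6.
    """
    return sum((b + 0.5) / 256.0 for b in bytes12) - 6.0


def raw_bytes(rng, n):
    """Stand-in raw source: n uniform 8-bit values."""
    return [rng.getrandbits(8) for _ in range(n)]


rng = random.Random(2022)
samples = [approx_gauss(raw_bytes(rng, 12)) for _ in range(1000)]
```

It costs 12 raw bytes per output, which is exactly why a source that produces hundreds of bytes per cycle looks attractive here.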
Well, one day maybe I'll give it a shot!