By: ---, October 9, 2022 6:08 pm
Room: Moderated Discussions
David Kanter on October 8, 2022 10:16 pm wrote:
> --- on October 7, 2022 2:40 pm wrote:
> > Is there yet any sort of movement for GP-NPU (analogous to GP-GPU, general purpose compute on an NPU)?
> > I can't speak for other designs (and there seem to be no good overviews or even consensus yet as to what
> > these things should look like) but the Apple one appears to be essentially/primarily a convolution engine.
> > Define an (essentially fixed) set of weights, run a stream of data against those weights, and accumulate
> > sums. What we get in HW is a whole lot (low-precision) MACs linked to some (wider precision) accumulators,
> > some specialized storage (weights plus input stream buffer) and some HW-assisted addressing.
> >
> > Point is, you get much less than even what GPUs offered when GP-GPU began.
> >
> > So, can we do anything interesting with this that's not actually NPU related?
> > The most obvious possibility that struck me was random number related stuff; you can hook up these things
> > to act as LFSRs and (perhaps) generate lots of (few-bit, adequate quality?) random numbers per cycle,
> > then either concatenate them to generate streams of uniformly distributed multi-bit integers or
> > add 6 or 12 or so of them to generate (adequate?) Gaussian values. I don't care about adversarial
> > security stuff, I'm more interested in "good-enough" for various types of physics work.
> >
> > Presumably (as was done in the early days of BrookGPU) you would have to fake this by creating
> > a neural net in TensorFlow or equivalent that used the convolution options available to perform
> > an LFSR (or more appropriate RNG) on each element of an array of input data, pooled the results
> > (if summing for Gaussian), and dumped out a similar array of "random"s.
> > This may not seem like much, but if you could have it running
> > in parallel with, say, a large Monte Carlo integration,
> > as the very first stage of generating a constant stream of uniform or Gaussian randoms before we condition
> > them to fit a process, maybe there is some value there: the ability to double the speed or more?
> >
> >
> > Anyway, point is, has anyone heard any sort of mutterings
> > along these lines? Or is everyone, even academics,
> > still so excited by what new things can be done on GPUs that no-one yet even started thinking about NPUs?
> There are few things that are *new* that you can do on a GPU (except some of the new instructions
> for Smith-Waterman and some of the shared memory and barrier stuff). You can just do things
> at higher throughput than what is typically feasible on a CPU. The whole GP idea is about
> taking the 'core' functions of a GPU and making them more easily accessible and extending
> the functions that are supported to include more CPU-like attributes.
> The only thing I could potentially see really applying is enabling lower overhead interactions between an NPU
> and a CPU. E.g., instead of having to write out the memory via DMA, having very low latency and tight integration.
> Having the NPU operate on the address space with paging (sounds expensive to have paging in an NPU) or having
> an NPU operate on some low latency scratchpad memory close to the core (sounds like SPR's TMUL stuff).
> If you could move data from a CPU program to the NPU in ~10
> cycles that would potentially open some interesting stuff.
> David

There is value to the human community in, for example, porting a PDE solver or QCD simulation to a GPU and getting 10x the speed, even if it's not especially new. I'm happy people are doing this and would not criticize or mock that work.
But that's very different from what I am asking, precisely because the instruction set for an inference NPU is so much more limited and the granularity so much larger (which is, of course, precisely why it exists in addition to the GPU, rather than doing the work on the GPU).
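To make concrete just how small the kernel in question is, here's a minimal CPU-side sketch (plain Python; the function names are mine, nothing real): a 16-bit Galois LFSR step is nothing but a shift and a conditional XOR, i.e. fixed GF(2) arithmetic, which is exactly the sort of rigid, data-parallel operation one might hope to coax out of a convolution engine's low-precision MACs.

```python
# Hypothetical sketch: a 16-bit Galois LFSR step is just a shift and a
# conditional XOR against a fixed tap mask. Tap mask 0xB400 is the
# standard maximal-length choice, giving period 2^16 - 1.

def lfsr16_step(state: int) -> int:
    """One Galois LFSR step: shift right, XOR in the taps if the LSB was set."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state

def lfsr16_stream(seed: int, n: int) -> list[int]:
    """Generate n successive 16-bit states (seed must be nonzero)."""
    out, s = [], seed
    for _ in range(n):
        s = lfsr16_step(s)
        out.append(s)
    return out
```

Since the per-element work is this trivial, the interesting question is purely whether the NPU's addressing/weight machinery can be abused to run thousands of such lanes per cycle.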

If you want "NPU-like" functionality close to the CPU, that's what AVX-512 or AMX are for. And good for them, but again not relevant to the question I'm considering.
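And to pin down the "add 6 or 12 or so of them" Gaussian trick from my original post: this is the classic central-limit (Irwin-Hall) approximation. A sketch, with the Python stdlib standing in for the NPU's uniform stream (`approx_gaussian` is a made-up name for illustration):

```python
# Sum of 12 independent U(0,1) draws has mean 6 and variance 12 * (1/12) = 1,
# so subtracting 6 gives an approximate standard normal. Good enough for much
# physics work, though the tails are hard-clipped at +/-6.
import random

def approx_gaussian(rng: random.Random) -> float:
    """Approximate N(0,1) sample from 12 uniform draws (Irwin-Hall)."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(42)
samples = [approx_gaussian(rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# mean should come out near 0, var near 1
```

On an NPU the summation step is just the pooling/accumulate stage it already has; the question is only whether the upstream uniform stream can be generated in place.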