By: Jörn Engel (joern.delete@this.purestorage.com), October 8, 2022 5:16 pm
Room: Moderated Discussions
--- (---.delete@this.redheron.com) on October 8, 2022 11:38 am wrote:
>
> That's one (64b) per cycle! I'm assuming using an NPU I can get something like
> 256 random 8-bits per cycle per "engine" (8 or so engines on an M1).
> Obviously what I have in mind would run as a throughput model, not a latency model!
> So you'd have the NPU doing something like filling up a 64K buffer in a ping-pong model,
> while the CPU walks through the other buffer loading random bits as required.
> And of course, yes, you want long sequences. But if what you actually care about is processes,
> not simple aggregates, then you can still get a fair bit of value from shorter sequences simply
> by things like interleaving sequences as you generate the successive elements of your process.
Short sequences will quickly cost you weeks of debugging, followed by a replacement of the PRNG. The moment you call something "random", people will make quality assumptions. If you ship a fast but weak PRNG, you will violate those assumptions, and those weeks of debugging will happen. No amount of documentation will stop people from making those assumptions.
As usual, it seems more important to be "good enough" in many dimensions (speed, quality) than amazing in one and so-so in another. And the bar for "good enough" quality in a PRNG is surprisingly high. The bar for performance seems fairly low, if you look at what people still use in production. ;)
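To make the "short sequences" point concrete, here is a minimal sketch (my own illustration, not from the post): a tiny Lehmer-style multiplicative generator, x → 75·x mod 65537, stands in for a "fast but weak" PRNG. Its period is at most 65536, so any consumer drawing a modest number of values sees the whole sequence wrap around, which is exactly the kind of hidden quality assumption that gets violated.

```python
def weak_period(mult=75, mod=65537, seed=1):
    """Count the full period of a toy multiplicative congruential
    generator x -> (mult * x) % mod, starting from `seed`.

    With mult=75 and mod=65537 (a classic home-computer-era choice),
    75 is a primitive root mod 65537, so the period is 65536 --
    the entire sequence repeats after only ~64K draws.
    """
    state = (seed * mult) % mod
    period = 1
    while state != seed:
        state = (state * mult) % mod
        period += 1
    return period

print(weak_period())  # 65536 -- far too short to call "random" in production
```

A simulation consuming even a few million random bytes would cycle through this generator's output dozens of times, silently correlating results that the caller assumed were independent.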
Topic | Posted By | Date |
---|---|---|
GPNPU? | --- | 2022/10/07 02:40 PM |
GPNPU? | --- | 2022/10/07 08:23 PM |
GPNPU? | Jörn Engel | 2022/10/08 10:02 AM |
GPNPU? | --- | 2022/10/08 11:38 AM |
GPNPU? | Jörn Engel | 2022/10/08 05:16 PM |
GPNPU? | dmcq | 2022/10/09 03:58 AM |
GPNPU? | David Kanter | 2022/10/08 10:16 PM |
GPNPU? | --- | 2022/10/09 06:08 PM |
What is NPU ? (NT) | Michael S | 2022/10/09 02:50 AM |
"Neural processing unit", AFAIU (NT) | Foo_ | 2022/10/09 03:22 AM |
Training, inference or both ? (NT) | Michael S | 2022/10/09 04:03 AM |
Network Processing Unit (NT) | anonymou5 | 2022/10/09 10:37 AM |
What is NPU ? | Will W | 2022/10/09 10:25 AM |
XTLA :-) (NT) | dmcq | 2022/10/09 12:51 PM |
GPNPU? | Etienne | 2022/10/11 01:12 AM |