Pseudo-random number generator would provide reproducible stochastic rounding

By: blaine (myname.delete@this.acm.org), March 26, 2022 1:09 pm
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on March 25, 2022 9:37 am wrote:
> Marcus (m.delete@this.bitsnbites.eu) on March 25, 2022 5:48 am wrote:
> > Adrian (a.delete@this.acm.org) on March 25, 2022 4:16 am wrote:
> > > Marcus (m.delete@this.bitsnbites.eu) on March 25, 2022 12:57 am wrote:
> > > > Adrian (a.delete@this.acm.org) on March 24, 2022 2:34 pm wrote:
> > > > > rwessel (rwessel.delete@this.yahoo.com) on March 24, 2022 1:28 pm wrote:
> > > > > > Paul A. Clayton (paaronclayton.delete@this.gmail.com) on March 24, 2022 11:43 am wrote:
> > > > > > > Pseudo-random number generator would provide reproducible stochastic rounding
> > > > > >
> > > > > > Easier said than done if the code is multi-threaded. Which is something of a given if we're assuming GPUs.
> > > > >
> > > > >
> > > > > There are many good PRNG's that allow the deterministic splitting of the
> > > > > generated sequence of numbers in separate subsequences for each thread.
> > > > >
> > > > > For example the multiplicative Fibonacci generators or those
> > > > > which use a pseudo-random function in counter mode.
> > > > >
> > > > >
> > > > > So the problem of having a multi-threaded program with pseudo-random but reproducible behavior
> > > > > has been solved since about 1984 for the cases when very fast PRNGs are needed and since
> > > > > about 1970 for the cases when the speed of the PRNGs does not matter so much.
> > > > >
> > > >
> > > > What if you change your program so that you do some extra FP computations before the main
> > > > work that you want to be reproducible? Or if you move from one compute node to another node
> > > > with a different number of HW threads so that you partition the work differently?
> > > >
> > > > I can see lots of situations where it would be hard to guarantee the same PRN sequence for your
> > > > operations. Not saying that it can't be done, but it feels like an inherently fragile solution.
> > >
> > >
> > >
> > >
> > > If you change in any way your program, then certainly the results
> > > will not be reproducible, even if you do not use any PRNG.
> > >
> > > If you change in any way the input data for your program, then certainly
> > > the results will not be reproducible, even if you do not use any PRNG.
> > >
> > > If you want to insert some computations in a program that uses PRNGs without changing the
> > > results of later computations, that is trivial. For the new inserted program section you must
> > > initialize a PRNG with a different seed than any used before. When you reach the reproducible
> > > section, you initialize the PRNG or PRNGs with the same seed or seeds used before.
> > >
> > > This is an absolutely trivial rule for using PRNGs and it has absolutely nothing to
> > > do with multi-threaded programs. You must do the same in a single-threaded program.
> > >
> > > If you use some weird PRNG that lacks an initialization function, then obviously that PRNG cannot be
> > > used in a program with reproducible results. If it has an
> > > initialization function, then you must reinitialize
> > > it with the same seed, whenever you want to obtain again the same pseudorandom sequence.
> > >
> > >
> > >
> > > The discussion was whether the use of a PRNG in a program prevents it to produce
> > > reproducible results, when you change neither the program nor its input data.
> > >
> > > For a single-threaded program, satisfying the reproducibility constraint requires a means to provide a seed
> > > for the PRNG, which ensures the generation of the same sequence as long as you do not change the seed.
> > >
> > > For a multi-threaded program that is not enough, because arbitrary
> > > interleaving of the thread executions is possible.
> > >
> > > The initialization of the PRNGs for multithreaded programs must use not only a seed, but also a thread
> > > identifier. The PRNGs thus initialized must generate distinct and uncorrelated pseudorandom sequences.
> > >
> > > Like I have said in my previous post, there are plenty of such PRNGs and some of them are simple
> > > enough (e.g. a double-length integer multiplication and shift per generated number) to be easily
> > > implemented in GPU shader programs, to be computed in parallel for each GPU thread.
> > >
> >
> > Not arguing that it can't be done. I'm just saying, from my experience with SW development,
> > that this kind of feature is very likely to cause headaches. It's global state, similar to FPU
> > configuration flags (rounding modes, FTZ/DAZ etc), which inevitably leaks across functions,
> > modules, libraries, API boundaries etc, and typically very few developers are aware of it.
> >
> > /Marcus
> >
>
>
> I agree that it is easy to make mistakes with something like
> this, so you are right about the possible headaches.
>
> However well-designed PRNGs do not introduce any global state.
>
>
> It was an unfortunate tradition in libraries like the standard C library to provide PRNGs that relied on global
> state for the dubious purpose of reducing the number of parameters in the functions related to PRNGs.
>
> Today, any correct PRNG implementation should have one extra parameter in all the PRNG functions,
> e.g. initialization and generation of next number, with the state of the PRNG.
>
> In this case there is no global state associated with PRNGs
> and no problems caused by it in multi-threaded programs.
>
y
> While I strongly dislike the misuse of object-oriented programming in domains for which it is unsuitable,
> like it happens frequently in the languages which force the use of OOP for anything, e.g. Java, PRNGs
> are perfectly suited for having an API in OOP style, with a constructor having an optional seed and
> an optional thread identifier and with a member function for getting the next number.
>
> In this case a PRNG object can be allocated in the local stack
> or heap or even in thread-local static memory, as desired.
>
> You may have any number of independent PRNG objects you
> like, if you want to use them for different purposes.
>
>
>
>
>
>
>
>
>
>
>
>
>
>

In the situation where you have multiple asynchronous processes (threads) where the global order of operations can change, using multiple seeds may reduce variance but in the end you would have to resort to statistics from multiple runs for insight.
< Previous Post in Thread 
TopicPosted ByDate
Nvidia H100 Tensor Core GPUHopper2022/03/22 08:48 AM
  Nvidia H100 Tensor Core GPUMarcus2022/03/23 11:23 PM
    Nvidia H100 Tensor Core GPUdmcq2022/03/24 01:40 AM
      Nvidia H100 Tensor Core GPUMarcus2022/03/24 03:03 AM
        Pseudo-random number generator would provide reproducible stochastic rounding (NT)Paul A. Clayton2022/03/24 11:43 AM
          Pseudo-random number generator would provide reproducible stochastic roundingrwessel2022/03/24 01:28 PM
            Pseudo-random number generator would provide reproducible stochastic roundingAdrian2022/03/24 02:34 PM
              Pseudo-random number generator would provide reproducible stochastic roundingMarcus2022/03/25 12:57 AM
                Pseudo-random number generator would provide reproducible stochastic roundingAdrian2022/03/25 04:16 AM
                  Pseudo-random number generator would provide reproducible stochastic roundingMarcus2022/03/25 05:48 AM
                    Pseudo-random number generator would provide reproducible stochastic roundingAdrian2022/03/25 09:37 AM
                      stateless PRNGshobold2022/03/25 02:34 PM
                        stateless PRNGsJörn Engel2022/03/25 09:30 PM
                          stateless PRNGshobold2022/03/26 10:32 AM
                            stateless PRNGsJörn Engel2022/03/26 02:14 PM
                              stateless PRNGshobold2022/03/27 02:11 AM
                      Pseudo-random number generator would provide reproducible stochastic roundingblaine2022/03/26 01:09 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell tangerine? 🍊