By: Marcus (m.delete@this.bitsnbites.eu), March 24, 2022 3:03 am
Room: Moderated Discussions
dmcq (dmcq.delete@this.fano.co.uk) on March 24, 2022 1:40 am wrote:
> Marcus (m.delete@this.bitsnbites.eu) on March 23, 2022 11:23 pm wrote:
> > Hopper (hopper.delete@this.hopper.com) on March 22, 2022 8:48 am wrote:
> > > https://www.nvidia.com/en-us/data-center/h100/
> >
> > I see that they included an FP8 format, and the E4M3 variant uses the exact same binary format as
> > the float8 format in the MRISC32 ISA. At the time it was introduced in MRISC32 I was unsure about
> > what exponent/mantissa configuration to use, but E4M3 seemed like the most sane configuration.
> >
> > ...though they seem to support a nice higher-precision accumulator
> > for matrix multiplication, which makes perfect sense.
>
> I'd be interested if anyone had started using stochastic rounding
> yet, I'd have thought it should be the preferred mode in learning.
I think the main problem with that is reproducibility. In many situations you want to be able to guarantee that two consecutive training passes produce the same result. Otherwise, I agree.
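
To make the trade-off concrete, here is a minimal sketch of stochastic rounding on a coarse fixed grid. The grid step stands in for a low-precision format and is not an FP8 implementation. The names `stochastic_round`, `accumulate_stochastic` and `accumulate_nearest` are made up for illustration. It assumes the random source can be seeded, which is one way to get identical results from two runs.

```python
import math
import random

def stochastic_round(x, step, rng):
    """Round x to a multiple of step; round up with probability
    equal to the fractional distance to the next grid point."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step

def accumulate_stochastic(values, step, seed):
    """Sum values, stochastically rounding the running total each step.
    A fixed seed makes the sequence of rounding decisions repeatable."""
    rng = random.Random(seed)
    total = 0.0
    for v in values:
        total = stochastic_round(total + v, step, rng)
    return total

def accumulate_nearest(values, step):
    """Same accumulation, but with plain round-to-nearest."""
    total = 0.0
    for v in values:
        total = round((total + v) / step) * step
    return total

if __name__ == "__main__":
    updates = [0.01] * 1000          # many small updates, true sum is 10
    step = 0.125                     # hypothetical grid spacing
    print(accumulate_nearest(updates, step))             # small updates are lost
    print(accumulate_stochastic(updates, step, seed=1))  # close to 10 on average
    print(accumulate_stochastic(updates, step, seed=1))  # same seed, same result
```

With round-to-nearest every update smaller than half a step is dropped, while stochastic rounding keeps them in expectation. The same seed reproduces the same result here, but in a parallel setting the order in which random numbers are consumed would also have to be fixed, which is where it gets harder.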