By: Travis (travis.downs.delete@this.gmail.com), August 25, 2018 7:35 pm
Room: Moderated Discussions
David Hess (davidwhess.delete@this.gmail.com) on August 25, 2018 10:44 am wrote:
> Travis (travis.downs.delete@this.gmail.com) on August 24, 2018 10:54 pm wrote:
> > David Hess (davidwhess.delete@this.gmail.com) on August 24, 2018 10:08 pm wrote:
> > > Ricardo B (ricardo.b.delete@this.xxxxx.xx) on August 24, 2018 5:22 am wrote:
> > > >
> > > > Naively, I would say that the best approach for dealing with power hungry instructions
> > > > would be to run at normal clock speed and restrict the instruction issue rate until
> > > > some criteria are met and then down clock the core and un-restrict the issue rate.
> > >
> > > Running at a slower clock would allow the extra complex instructions
> > > to take advantage of more gate delays per stage.
> >
> > Yes, but only if they only ran at those slower speeds, right? Here it seems you
> > can run the most complex instructions (full width FMAs) at a higher speed ("middle
> > tier"), but just not at a high rate (but they have the expected latency).
> >
> > So the ALU must be designed to accommodate the gate delay associated with that higher
> > frequency, unless it can somehow reconfigure itself when running at a lower freq?
>
> No, I agree. I thought the most complex mode always operated at a reduced clock
> rate. If it executes some instructions at the full clock rate and then lowers
> the clock then it has to meet the worst case timing at the highest frequency.
Yeah, and it seems like it actually has to operate not only at the "light" AVX-512 frequency (aka AVX2 turbo), but even at the fastest frequency (non-AVX turbo), since 256-bit FMAs can run at that speed and we can be pretty sure the units are shared.
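
To put rough numbers on the timing argument, here is a minimal sketch. The three frequencies are made-up placeholders, not measured values for any real chip, and the helper names are mine. It only shows that a unit shared across tiers has to fit in the shortest clock period among the tiers it runs at, so dropping the clock later buys it no extra gate delay.

```python
def cycle_time_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def timing_budget_ps(freqs_ghz) -> float:
    """Per-stage time budget for a unit that must work at every listed frequency.

    The budget is set by the fastest clock, i.e. the shortest period.
    """
    return min(cycle_time_ps(f) for f in freqs_ghz)


# Hypothetical frequency tiers in GHz (illustrative assumptions only).
tiers = {
    "non-AVX turbo": 3.5,
    "AVX2 turbo": 3.0,
    "AVX-512 turbo": 2.5,
}

# If the FMA unit only ever ran at the lowest tier, it would get the longest period.
slow_only = timing_budget_ps([tiers["AVX-512 turbo"]])

# If the same unit also serves 256-bit FMAs at non-AVX turbo, the fastest tier sets the budget.
shared = timing_budget_ps(tiers.values())

print(f"budget if slow tier only: {slow_only:.1f} ps")
print(f"budget if shared across all tiers: {shared:.1f} ps")
```

With these placeholder numbers the shared unit loses the extra slack it would have had at the lowest tier, which is the point above: the slower clock can't be used to fit more gate delays per stage.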