By: David Hess (davidwhess.delete@this.gmail.com), August 27, 2018 6:03 pm
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on August 26, 2018 5:35 pm wrote:
> David Hess (davidwhess.delete@this.gmail.com) on August 26, 2018 10:20 am wrote:
> > Brett (ggtgp.delete@this.yahoo.com) on August 25, 2018 1:58 pm wrote:
> > >
> > > There is also no law that the float/vector unit has to run at the same clock rate as the rest of the CPU.
> > > You just have to buffer the cache interface and the incoming decoded float/vector instruction
> > > stream, and buffer the incoming instruction completion signals for write commits.
> > >
> > > This would completely hide single slow vector instructions, you could detect this
> > > by having say every 20th instruction a vector op, the CPU should stay at 4+GHz.
> >
> > I thought they might operate it synchronously at a small integer fraction of the clock
> > speed like 2/3 or 4/5 instead of changing the clock speed of the entire core.
>
> And that certainly has happened in the past!
> Some of the PPCs (around say the PPC 750) had some sort of weird stutter, I can't remember the details, that
> meant that every sixth or fifth FP instruction was delayed by one cycle. This was "architected" in the sense
> that it was meant to happen, every n'th instruction (or perhaps every n'th clock) was devoted to whatever it
> was that took some cleanup, it wasn't an artifact of a particular sequence of FP instructions or data.
>
> With a brief search the best I can find is this which references the issue:
> https://cr.yp.to/hardware/ppc.html
> At the time I did know the details, but it's possible I learned them
> from reading internal Apple docs and they were never public? Oh well.
The way AMD described the fast core clock throttling in the Phenom made me think that they used a fractional integer divider to maintain synchronous operation between cores.
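Here is a minimal sketch of what a synchronous fractional divider at the 2/3 or 4/5 ratios mentioned above could look like. It assumes the slower domain is not given its own clock at all. Instead the fast clock is gated so that the slow domain sees M edges out of every N fast cycles, and every enabled edge stays aligned with the core clock. The accumulator (pulse-swallowing) scheme and the `clock_enables` function are hypothetical illustrations. They are not how AMD, IBM/Motorola, or anyone else is known to have implemented it.

```python
from fractions import Fraction


def clock_enables(ratio: Fraction, cycles: int) -> list[bool]:
    """For each fast-clock cycle, report whether the slower domain is clocked.

    Hypothetical accumulator-based divider: over every ratio.denominator
    fast cycles, exactly ratio.numerator of them are enabled, spread as
    evenly as possible, so both domains stay synchronous.
    """
    if not (0 < ratio <= 1):
        raise ValueError("ratio must be in (0, 1]")
    m, n = ratio.numerator, ratio.denominator
    acc = 0
    enables = []
    for _ in range(cycles):
        acc += m
        if acc >= n:
            # Accumulator overflowed: let this fast-clock edge through.
            acc -= n
            enables.append(True)
        else:
            # Swallow this edge: the slow domain holds its state.
            enables.append(False)
    return enables


def pattern(ratio: Fraction, cycles: int) -> str:
    """Render the enable sequence as a string of 1s (clocked) and 0s."""
    return "".join("1" if e else "0" for e in clock_enables(ratio, cycles))


if __name__ == "__main__":
    print(pattern(Fraction(2, 3), 6))   # 011011
    print(pattern(Fraction(4, 5), 10))  # 0111101111
```

At a 4 GHz core clock, a 2/3 ratio would give the gated unit an effective 2.67 GHz. A unit clocked this way also stutters in a fixed, "architected" pattern rather than a data-dependent one, which is at least the flavor of the PPC behavior described above.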