By: RichardC (tich.delete@this.pobox.com), April 25, 2017 6:50 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on April 24, 2017 11:52 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on April 22, 2017 3:41 pm wrote:
> [snip]
> > Mill phasing is not similar (and I mean *at all*) to what is traditionally called
> > "skewed pipeline", where designers sacrifice ALU-to-load-address latency and
> > branch mispredict penalty in order to improve load-data-to-ALU latency.
>
> While phasing in the Mill is not identical to the use of pipeline skewing to hide load latency, I see it as
> the same basic technique of delaying the start of one operation to allow results to be available "earlier"
> to a dependent operation. (I also consider cascaded ALUs and "counterflow pipelines" to be related techniques.)
It seems to me that either you're *really* doing it within one clock cycle, in which case
you need a peculiarly slow clock speed to allow a critical path through multiple
execution units; or else you're doing different ops in a bundle in different cycles, at
which point the concept of a "bundle" (and the claims of hardware simplicity) gets very weird,
and the compiler has to go through unorthodox contortions.
Either way, it becomes less likely that the Mill can deliver on its extravagant claims.
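To put rough numbers on the two alternatives, here is a toy timing model in Python. It is only a sketch under stated assumptions: the three phase delays are hypothetical values picked for illustration (not Mill figures), and it ignores latch overhead, clock skew and everything else a real design would have to pay for.

```python
# Hypothetical per-phase logic delays in picoseconds (illustrative only,
# NOT real Mill numbers): three dependent ops chained within one bundle.
PHASE_DELAYS_PS = [120, 300, 150]


def single_cycle_period(delays):
    """Alternative 1: all phases chained combinationally inside one clock
    cycle, so the clock period must cover the whole critical path (the sum)."""
    return sum(delays)


def staggered_schedule(delays):
    """Alternative 2: each phase gets its own cycle. The clock only has to
    cover the slowest phase, but ops from the same bundle now complete in
    different cycles. Returns (clock period, completion cycle per phase)."""
    period = max(delays)
    completion_cycles = [i + 1 for i in range(len(delays))]
    return period, completion_cycles


if __name__ == "__main__":
    print(single_cycle_period(PHASE_DELAYS_PS))  # 570
    print(staggered_schedule(PHASE_DELAYS_PS))   # (300, [1, 2, 3])
```

With these made-up delays, chaining everything in one cycle forces a 570 ps clock, while staggering allows a 300 ps clock but spreads one bundle's results over three cycles (900 ps end to end), which is exactly the trade-off between a slow clock and bundles that no longer execute as a unit.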