By: Brett (ggtgp.delete@this.yahoo.com), August 3, 2022 11:37 am
Room: Moderated Discussions
hobold (hobold.delete@this.vectorizer.org) on August 3, 2022 3:17 am wrote:
> Brett (ggtgp.delete@this.yahoo.com) on August 2, 2022 10:41 pm wrote:
>
> [...]
> > A full set of integer instructions with three sources is all that’s left to do on the high end. Reduces
> > your critical path length and saves a write port and tracking compared to two instructions.
>
> A dependent pair of two-input instructions has a total of three inputs and one output ...
Yes, two instructions with four inputs and two outputs can be combined into three inputs and one output, but ONLY if the middle output is consumed immediately by the final result, or very soon after, and is never needed again.
Write ports are a critical resource, so dropping the intermediate write helps. Saving a source read and shortening the critical path help as well.
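To make the "ONLY if the middle output is burned" condition concrete, here is a minimal sketch of a peephole pass that fuses an adjacent dependent pair into one three-source op when the intermediate register is dead afterwards. Everything in it is hypothetical and for illustration only: the `Instr` tuple, the register names, and the `add+add` style fused opcode do not correspond to any real ISA or compiler. It also ignores which operand slot the intermediate fed, which a real encoding would have to record for non-commutative ops.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    op: str       # opcode name (hypothetical)
    dst: str      # destination register
    srcs: tuple   # source register names

def is_dead_after(code, start, reg, live_out):
    """True if reg is overwritten, or never read, from index start onward."""
    for ins in code[start:]:
        if reg in ins.srcs:
            return False      # still needed later: the middle write must stay
        if ins.dst == reg:
            return True       # overwritten before any later read
    return reg not in live_out

def fuse_dependent_pairs(code, live_out=frozenset()):
    """Fuse adjacent two-source pairs (a, b) where b consumes a's result once
    and that result is dead after b. Yields one 3-source, 1-output op."""
    out, i = [], 0
    while i < len(code):
        a = code[i]
        if i + 1 < len(code):
            b = code[i + 1]
            fusable = (
                len(a.srcs) == 2 and len(b.srcs) == 2
                and b.srcs.count(a.dst) == 1
                and (b.dst == a.dst
                     or is_dead_after(code, i + 2, a.dst, live_out))
            )
            if fusable:
                other = tuple(s for s in b.srcs if s != a.dst)
                out.append(Instr(f"{a.op}+{b.op}", b.dst, a.srcs + other))
                i += 2
                continue
        out.append(a)
        i += 1
    return out

code = [
    Instr("add", "t0", ("a", "b")),
    Instr("add", "r0", ("t0", "c")),   # t0 dies here: fusable
    Instr("shl", "t1", ("d", "e")),
    Instr("sub", "r1", ("t1", "f")),   # t1 is read again below: not fusable
    Instr("xor", "r2", ("t1", "r0")),
]
fused = fuse_dependent_pairs(code)
for ins in fused:
    print(ins.op, ins.dst, ins.srcs)
```

The first pair collapses to a single op with three sources and one write. The second pair stays as two instructions because `t1` is reused, which is exactly the case where the middle write cannot be saved.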
> > Things like two writes in an instruction do not help much due to instruction combining
> > or just being wide giving the same effect. Mul Hi/Lo and double register shifts.
>
> ... or it has two outputs if the earlier output is to be re-used later. Yes, that is instruction
> combining. Currently we are combining a small set of specific pairs. How about a mechanism
> to generally exploit data dependencies? Like combining any dependent pair?
>
> ILP is limited, data dependencies are unlimited. Purely playing around, let's try regarding data dependencies
> as a resource ... because if we found a way to exploit that ... the gains might be substantial.
Yes, that is what we are talking about, for any new architecture to replace or extend ARM64.
> > You could pack an in-then-else in one instruction saving a short jump, but that just
> > saves code space and makes downstream handling a hassle. Bottom of the barrel stuff.
>
> Isn't that akin to original ARM's general mechanism for predication
> of most instructions? Or do you mean something else?
Yes, basically, but covering a ~4X longer distance.
Personally I would prefer two short instructions instead, but fixed-width instructions are it for the high end, for the next decade at least.
This just saves code size, which is not considered important. That ignores that code size is a low single-digit effect, and as such bigger than almost everything else being looked at. ;)
And probably with fewer headaches and less cost to implement as well. ;)
Sticks get stuck in mud, and religious views on architecture are hard to change.