By: Doug S (foo.delete@this.bar.bar), August 17, 2022 9:59 am
Room: Moderated Discussions
anon2 (anon.delete@this.anon.com) on August 17, 2022 6:00 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on August 17, 2022 5:11 am wrote:
> > anon2 (anon.delete@this.anon.com) on August 17, 2022 4:43 am wrote:
> > > none (none.delete@this.none.com) on August 17, 2022 2:18 am wrote:
> > > > anon2 (anon.delete@this.anon.com) on August 16, 2022 10:08 pm wrote:
> > > > [...]
> > > > > > what I am saying
> > > > > > is that it is cheaper to load from instruction stream than from a random place in memory.
> > > > > >
> > > > >
> > > > > According to the single line of HDL you all combined never wrote?
> > > >
> > > > Ha the argument from authority. Do you really think you need to have written HDL to have
> > > > some sane and interesting point of view on that matter?
> > >
> > > If you have no idea how front end data could possibly cost anything, then you can not have
> > > an interesting opinion about it. Looking at even quite a simple CPU implementation will show
> > > you stages and wires and structures that you will need to hold and move such immediate data
> > > through the pipeline. It doesn't just magically appear for free when you need it.
> > >
> >
> > May be, that's part of the problem? Those wires and structures are more visible in simple
> > CPU implementation than they are in the huge complex CPUs all of us use daily, in which
> > those things are here anyway, regardless of having or omitting long instructions?
> >
> > I actually wrote plenty of HDL code, but since only about 0.1% of it is related to
> > tiny hobby CPU I don't think that my HDL experience helps my judgement on the issue
> > more or even as much as my software experience and as my general (or call it "hand
> > waving") understanding of relationships between spatial locality of code and data.
> >
>
> But you can at least appreciate transistors and wires have a cost, at least if you have tried to synthesize
> any of these. Then in that case you know instructions are stored not only in instruction cache but
> buffers and caches throughout the pipeline, instructions move through possibly a dozen stages and may
> be in flight for hundreds or thousands of cycles while that data needs to be stored. And the immediate
> data needs to be scheduled to arrive at the execution unit as a source when ready.
>
> The point is this immediate data does not just magically arrive at the execution unit when the instruction
> does, just because it happened to be encoded in one ISA instruction. You want 64-bit immediates, you have
> to store them in various places and you have to move them from place to place and into execution units.
>
> None of us know exact tradeoffs to make a real judgement call here, so it's all just
> handwaving really, but if one can not fathom the idea that putting more immediates
> in code could have any implementation costs, then that's a whole different league.
Absolutely, and if few CPUs offered a non-trivial load-immediate instruction (i.e., if most only supported a limited number of immediate bits, with no shifting or masking), then we could take that as a sign that adding proper support for load-immediate was difficult, whether for the reasons you suggest or for others, and that that's why CPUs don't support it.
But the truth is that most CPUs do have good support for load-immediate. The cost you are worried about has already been paid. So the only question is whether you use what has already been paid for, or whether it is still better to load from a constant table.
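
To make the contrast concrete, here is a minimal C sketch of the two approaches being argued about. The function names and the example constant are mine, and the instruction sequences mentioned in the comments (movabs, movz/movk, a load from a literal pool or constant table) are only the typical ones; exactly what gets emitted depends on the ISA and the compiler.

/* Minimal sketch: the two ways a 64-bit constant can reach a register -
 * materialized from the instruction stream as an immediate, or loaded
 * from a constant table in data memory. */
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* Constant table in read-only data: reaching it is an ordinary load,
 * which goes through the data cache like any other memory access. */
static const uint64_t k_table[] = { 0x0123456789ABCDEFull };

uint64_t from_immediate(void)
{
    /* Typically built from the code stream itself, e.g. a single movabs
     * on x86-64 or a movz/movk sequence on AArch64 (compiler-dependent). */
    return 0x0123456789ABCDEFull;
}

uint64_t from_table(unsigned i)
{
    /* Same value, but fetched with a load from the table; a compiler may
     * still fold this particular case if it can prove i == 0. */
    return k_table[i];
}

int main(void)
{
    printf("%" PRIx64 " %" PRIx64 "\n", from_immediate(), from_table(0));
    return 0;
}

Either way the 64 bits have to come from somewhere; the disagreement above is about whether carrying them through the front end is cheaper than taking a data-side load that can miss in the cache.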