By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), April 9, 2017 7:10 am
Room: Moderated Discussions
matthew (nobody.delete@this.example.com) on April 9, 2017 1:14 am wrote:
> Brett (ggtgp.delete@this.yahoo.com) on April 8, 2017 11:30 pm wrote:
> > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on April 8, 2017 5:01 pm wrote:
> > > Michael S (already5chosen.delete@this.yahoo.com) on April 8, 2017 12:46 pm wrote:
> > > > BTW, it's fascinating to see how you and Linus are talking through one another, completely
> > > > ignoring each others arguments without even minimal attempt of understanding.
> > >
> > > It's hard to argue with stubborn ignorance. I addressed all his points with hard facts and real numbers.
> > > If someone doesn't want to understand that a softfloat ABI on an FPU gives 95+% of the performance
> > > of a hardfloat ABI or the high cost of adding a FP register file then they can't be helped.
> > >
> > > Wilco
> >
> > You need to show pictures, worth a thousand words and all that.
> > Here is a ARM1 chip Adding a FPU register file is crazy cost wise.
>
> This is the only one where it actually shows the register file as distinct from the logic. I
> know nothing about chip design, so I can't say why that is. I have vague memories from twenty
> years ago of being told that non-architected registers are cheap, it's only architected registers
> that are expensive, but I have no idea if my CS professor was even right at the time.
It's mostly multi-ported register files that are expensive; a status register is fairly cheap in hardware. Architected registers are also expensive in terms of the overheads they impose on function calls, process switches, exception handling etc.
A good example of removing a large register file and adding more useful stuff instead is Cortex-M3: compared with ARM7TDMI it added a fast multiplier and divider as well as unaligned access. Besides replacing the ARM/Thumb-1 decoder with Thumb-2 for much higher performance, most cycle timings were reduced too. Now that was a great tradeoff.
> But you're missing Linus' point which was simply to have enough implemented to be able to
> have the same ABI for functions with FP arguments between soft and hard FP. Look back at
> the FPA procedure call standard -- only fp0-fp3 were used as argument registers. So all
> we needed at the time were four 64-bit registers -- even ARM1 had twenty-seven 32-bit registers,
> so this would have grown the register file by slightly less than a third.
VFP uses eight 64-bit argument registers, so that is still a pretty large register file at half the initial size. It also means you have introduced yet another variant that needs to be supported by software (using a full-size register file means at least your context switch, exception handling, debugger etc. don't need any changes).
So you end up paying for a lot of hardware and software overhead to get lower performance (since software floating-point routines now need to move values back and forth between FP and integer registers). Do you think ARM's customers were asking "please significantly increase the area of your cores as if you added an FPU, but make sure floating point is a lot slower"?
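To put rough numbers on the register file sizes being argued about, here is a back-of-the-envelope sketch. It counts storage bits only and ignores read/write ports, which is an assumption that understates the real cost, since it is the multi-porting that makes a register file expensive. The register counts (27 for ARM1, 31 for ARM6) are the figures quoted in this thread, not independently checked, and `growth` is just a hypothetical helper name.

```python
def growth(extra_regs, extra_bits, base_regs, base_bits=32):
    """Fraction by which the register file's storage grows when
    extra_regs registers of extra_bits each are added to a base file
    of base_regs registers of base_bits each (storage bits only)."""
    return (extra_regs * extra_bits) / (base_regs * base_bits)

# Four 64-bit FP argument registers (the FPA fp0-fp3 case) on top of
# the 27 x 32-bit file quoted for ARM1: a bit under a third.
fpa_on_arm1 = growth(4, 64, 27)

# The same four registers on top of the 31 x 32-bit file quoted for ARM6:
# roughly a quarter.
fpa_on_arm6 = growth(4, 64, 31)

# Eight 64-bit VFP argument registers on top of the 31-register file:
# roughly half again as much storage.
vfp_args_on_arm6 = growth(8, 64, 31)

for name, value in [("FPA args on ARM1", fpa_on_arm1),
                    ("FPA args on ARM6", fpa_on_arm6),
                    ("VFP args on ARM6", vfp_args_on_arm6)]:
    print(f"{name}: +{value:.1%} storage")
```

Even on this optimistic storage-only count, a VFP-style argument set adds about half of the integer register file again, before any ports or the software support for the extra variant are counted.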
> That's somewhat expensive, but it's not *crazy*. By the time you look at ARM6 I think the
> register file had grown slightly to 31, so now it's expanding the register file by a quarter.
> And now you have a cache and so on, so it's starting to look way less expensive.
>
> (I'm not taking a stand on whether Linus is right or not, but you're missing his point)
No, I'm not missing his point. Adding an FP register file to get a hardfloat ABI does not have any advantage over a softfloat ABI. That's a fact.
Wilco