By: dmcq (dmcq.delete@this.fano.co.uk), May 21, 2022 8:57 am
Room: Moderated Discussions
dmcq (dmcq.delete@this.fano.co.uk) on May 21, 2022 8:35 am wrote:
> Doug S (foo.delete@this.bar.bar) on May 20, 2022 10:07 am wrote:
> > dmcq (dmcq.delete@this.fano.co.uk) on May 20, 2022 5:43 am wrote:
> > > I certainly think it's a bit of a pity that SVE wasn't included in M1. It would have allowed a lot
> > > of code like strlen to be inline. Routines using the extra facilities of SVE2 though would be more
> > > like the matrix routines where using them out of line is expected and one might want to change them.
> >
> >
> > Would it even make sense to use SVE2 to inline strlen()? The average length of
> > strings is fairly short, so firing up the SVE units and loading up the registers
> > if your average string length is 10 would seem to be a massive waste.
> >
> > That doesn't even get into what happens if your string isn't aligned with the SVE2 vector size.
>
> It makes perfectly good sense. You only worry about the cache width and SVE registers should be a
> match for that. I presume you've not actually looked at the strlen code for instance in a library?
That does raise an important point though. As far as I know, the conditions under which the first-fault register is set are not specified: it could be on a cache line boundary, on a page boundary, or only because the next page access would cause a fault. So there's room for yet another predictor in the processor, for short vectors versus long ones, to avoid preloading the cache when that's not needed. I suppose the same problem arises with any algorithm, but it looks like it belongs to SVE when SVE is used.
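For reference, a minimal sketch of the kind of first-faulting strlen loop being discussed, assuming the ACLE intrinsics from arm_sve.h. The name ff_strlen is made up for illustration, and the scalar fallback is only there so the sketch also builds on targets without SVE; it is not taken from any particular library.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

/* Hypothetical name. Sketch of strlen using SVE first-faulting loads. */
size_t ff_strlen(const char *s)
{
    const uint8_t *p = (const uint8_t *)s;
    const svbool_t all = svptrue_b8();
    size_t len = 0;

    for (;;) {
        svsetffr();                                 /* mark every lane as loadable */
        svuint8_t data = svldff1_u8(all, p + len);  /* only the first lane may fault */
        svbool_t valid = svrdffr();                 /* lanes the hardware actually loaded */
        svbool_t zeros = svcmpeq_n_u8(valid, data, 0);

        if (svptest_any(valid, zeros)) {
            /* Count the valid lanes that come before the first NUL. */
            svbool_t before = svbrkb_z(valid, zeros);
            return len + svcntp_b8(valid, before);
        }
        /* No NUL seen: advance by however many lanes were loaded. */
        len += svcntp_b8(all, valid);
    }
}
#else
/* Fallback for targets without SVE: plain byte loop. */
size_t ff_strlen(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')
        len++;
    return len;
}
#endif
```

Note that the loop never checks alignment: it just advances by whatever number of lanes the FFR reports as loaded, so how often a partial result comes back is entirely down to the implementation.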