By: Travis (travis.downs.delete@this.gmail.com), April 25, 2017 8:26 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on April 23, 2017 6:59 am wrote:
> anon (spam.delete.delete@this.this.spam.com) on April 23, 2017 6:51 am wrote:
> >
> > I don't know about SKL-X yet, and I'd be lying if I said I know for sure, but I think the Intel
> > FP/vector PRF uses 128bit, not 256bit so 168x128bit means 168 XMM, but "only" 84 YMM registers.
> >
>
> Time to beg Travis to construct a microbenchmark?
Luckily, someone already did:
http://blog.stuffedcow.net/2013/05/measuring-rob-capacity/
Scroll down to the "Physical Register File Size" section (actually, don't, because the whole entry is worth reading from the top), and he mentions that he tested SSE and AVX registers and found approximately the expected number of speculative registers (the rest being non-speculative, with the two together adding up to about 168).
Now he isn't totally explicit that he ran the same test with both SSE and AVX registers, but he does mention AVX in several places as if he did (it's possible he could just be assuming AVX behaves in the same way...).
In any case, that's entirely unsurprising to me - it would seem to be a pretty big issue for a lot of kernels if there were only ~84 ymm regs in the PRF, since the instruction window would be halved for AVX code and that would show up in a lot of places. Furthermore, I've never seen this claim anywhere else, and there are lots of resources (including DK's own writeup) that claim otherwise...
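To make the arithmetic behind the two hypotheses concrete, here is a rough sketch. The helper name is made up, and the idea that a 256-bit register would occupy two 128-bit entries is just my reading of the claim above, not anything documented:

```python
def regs_that_fit(prf_entries: int, entry_bits: int, reg_bits: int) -> int:
    """How many registers of width reg_bits fit in a PRF of prf_entries
    entries, each entry_bits wide (hypothetical model: a wide register
    consumes as many whole entries as it needs)."""
    entries_per_reg = -(-reg_bits // entry_bits)  # ceiling division
    return prf_entries // entries_per_reg

PRF_ENTRIES = 168  # the figure quoted in this thread

# Hypothesis A (anon): entries are 128 bits wide
xmm_if_128 = regs_that_fit(PRF_ENTRIES, 128, 128)
ymm_if_128 = regs_that_fit(PRF_ENTRIES, 128, 256)

# Hypothesis B: entries are 256 bits wide
ymm_if_256 = regs_that_fit(PRF_ENTRIES, 256, 256)

print(xmm_if_128, ymm_if_128, ymm_if_256)  # 168 84 168
```

Under hypothesis A, AVX-heavy code would run out of rename registers after roughly half as many in-flight vector results as SSE code, which is the effect a PRF-size microbenchmark should make visible.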
BTW, that blog has several other micro-architectural reverse-engineering posts which are great...