By: Andrei F (andrei.delete@this.anandtech.com),
Room: Moderated Discussions
Dummond D. Slow (mental.delete@this.protozoa.us) on November 11, 2020 5:56 am wrote:
> Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on November 11, 2020 1:23 am wrote:
> > In an otherwise excellent article this made me cringe:
> >
>
> I found the claims that "x86 can't do more than 4-wide decode" a bit unfair.
>
> >x86 CPUs today still only feature a 4-wide decoder designs that is seemingly limited
> from going wider at this point in time due to the ISA’s inherent variable instruction
> length nature, making designing decoders that are able to deal with aspect of the architecture
> more difficult compared to the ARM ISA’s fixed-length instructions.
>
> I have a feeling the article originally had something in the vein that it's impossible
> to go over 4-wide for x86, might have been (rightfully) edited or I recall wrong.
>
> 1) correct me if I'm wrong but I think Skylake actually has 5-wide decode.
> 2) article failed to mention that uOP cache are used in on x86 which change matters in this
> regard a lot. Could be said that x86 kinda relies on it instead on the raw decode width.
>
> Also, article said:
> >The four 128-bit NEON pipelines thus on paper match the current throughput capabilities
> of desktop cores from AMD and Intel, albeit with smaller vectors.
>
> How can you say A "matches throughput" of X when A only does half the calculations?
> The vector being half as wide is a huge difference, not some small detail.
>
I didn't say it's impossible to go over 4-wide. I specifically asked AMD's Mike Clark about it, and he said that going wider than 4-wide decode would incur additional pipeline stages, which would actually be detrimental to performance, and that's why they're not doing it.
So while they're not technically limited, they're practically limited, at least at this moment in time.
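
As a rough illustration of why variable-length decode gets harder as it gets wider, here is a toy Python sketch. The function names and example lengths are made up for illustration, and real decoders use parallel hardware with length pre-decode and speculation rather than a loop, so treat this as an analogy only, not a description of any actual AMD or Intel design.

```python
def fixed_length_starts(n_instructions, width=4):
    # Fixed-length ISA: every start offset is known up front,
    # without inspecting any instruction bytes.
    return [i * width for i in range(n_instructions)]


def variable_length_starts(lengths):
    # Variable-length ISA: each start offset depends on the lengths
    # of all earlier instructions, which forms a serial dependency chain.
    starts = []
    offset = 0
    for length in lengths:
        starts.append(offset)
        offset += length
    return starts


print(fixed_length_starts(4))
print(variable_length_starts([1, 3, 7, 2]))
```

In the second function, the Nth instruction's start can't be known until the first N-1 lengths are resolved, so under these assumptions a wider decoder has a longer chain to resolve (or speculate past) per cycle.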


