By: Heikki Kultala (hkultala.delete@this.iki.fi), October 3, 2015 8:01 am
Room: Moderated Discussions
juanrga (nospam.delete@this.juanrga.com) on October 3, 2015 4:48 am wrote:
> It seems some 'reverse-engineering' of Zen patches provides info that Zen is finally
> a 4 ALU + 2 AGU + 2 SIMD with 32KB L1 and 256KB L2. And each SIMD unit is a 128-bit FMA.
>
> If that is accurate, my prediction [1] of the SIMD width and FMA units
> was right, which implies Zen is a 16 FLOP arch, as I expected.
Though the 128-bit FPUs are one thing Dresdenboy guessed without any better knowledge or any source. They may also be 256-bit.
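Whether the units are 128-bit or 256-bit matters a lot for the "16 FLOP" figure. The following is a quick sketch of the peak-throughput arithmetic. It assumes the rumoured 2 FMA units, each issuing one full-width FMA per cycle, and counts an FMA as 2 FLOPs. The function name is just for illustration.

```python
# Peak FP throughput per core per cycle, assuming (hypothetically) that every
# FMA unit issues one full-width FMA each cycle with no other bottlenecks.
def peak_flops_per_cycle(num_fma_units: int, vector_bits: int, element_bits: int) -> int:
    lanes = vector_bits // element_bits  # elements per vector
    flops_per_fma = 2                    # one multiply + one add
    return num_fma_units * lanes * flops_per_fma

# Compare the 128-bit guess with the 256-bit alternative.
for width in (128, 256):
    sp = peak_flops_per_cycle(2, width, 32)  # single precision
    dp = peak_flops_per_cycle(2, width, 64)  # double precision
    print(f"2 x {width}-bit FMA: {sp} SP FLOP/cycle, {dp} DP FLOP/cycle")
```

Under these assumptions, 2 x 128-bit FMA gives 16 SP (8 DP) FLOP/cycle, while 2 x 256-bit FMA would give 32 SP (16 DP) FLOP/cycle. "16 FLOP" therefore only pins down the width once you also say which precision is meant.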
> I also got the total number of integer/mem pipes right, but
> I had predicted 3ALU + 3AGU [1] instead of 4ALU + 2AGU.
>
> Although I finally proposed a 3ALU+3AGU configuration for Zen, I asked David Kanter in this forum
> about the possibility of Zen using a 4ALU+2AGU configuration, back when the Internet was full of
> the slides (those slides that I later showed to be fake [2]). The discussion was:
>
>
> > I have a question, I predicted 3ALU+3AGUs and the leaked diagram shows six integer
> > pipes. Do you believe a 4ALU+2AGU would be a better combination or not?
>
> 3 AGU + 3 ALU is a much better mix. Remember that x86 is load+op, so generally you want to sustain nearly
> a 1:1 ratio of memory to ALU operations. Haswell and Broadwell have extra ALUs to handle branches, etc.
>
> 2 AGUs + 4 ALUs would be rather disappointing and also at a severe disadvantage to Intel for HPC.
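The 1:1 load+op argument can be turned into a simple bottleneck calculation. The sketch below is a deliberately crude steady-state model. A fraction `mem_fraction` of uops needs an AGU and the rest needs an integer ALU. It ignores store-data uops, micro-op fusion, decode/retire width, and branch handling. The function name is hypothetical.

```python
# Crude steady-state model: throughput T is limited by
#   T * mem_fraction       <= agus   (address generation)
#   T * (1 - mem_fraction) <= alus   (integer ALU work)
def sustained_ops_per_cycle(alus: int, agus: int, mem_fraction: float) -> float:
    if not 0.0 <= mem_fraction <= 1.0:
        raise ValueError("mem_fraction must be in [0, 1]")
    limits = []
    if mem_fraction > 0:
        limits.append(agus / mem_fraction)        # AGU-bound throughput
    if mem_fraction < 1:
        limits.append(alus / (1 - mem_fraction))  # ALU-bound throughput
    return min(limits)

for alus, agus in ((3, 3), (4, 2)):
    print(f"{alus} ALU + {agus} AGU, 1:1 mix:",
          sustained_ops_per_cycle(alus, agus, 0.5))
```

With a 1:1 mix, the 3+3 layout sustains 6 ops/cycle and 4+2 only 4, which is Kanter's point. With a 1:2 memory-to-ALU mix (`mem_fraction=1/3`), the 4+2 layout also reaches about 6, while 3+3 drops to 4.5. So which layout wins depends on the instruction mix of the code.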
4 ALUs + 2 AGUs + a separate FPU cluster would mean that:
Compared to Haswell: Zen has integer execution bandwidth equal to Haswell's (but more when running mixed integer/FP code, because three of Haswell's four ALU ports are shared with its vector/FP units), but less address generation bandwidth (2 AGUs vs. 3).
Compared to Sandy Bridge: Zen has more integer execution bandwidth (4 ALUs vs. 3) with equal address generation bandwidth (2 AGUs each).
But we do not yet know the widths of the FPUs and the LSUs, so it is hard to say anything about the FPU side.
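The integer-side comparison above can be sketched as a small table plus a crude model. FP/vector uops that land on shared ALU ports take away integer issue slots. The Zen numbers are the rumoured ones from the patches. Haswell (ALU ports 0, 1, 5, 6; AGU ports 2, 3, 7) and Sandy Bridge (ALU ports 0, 1, 5; AGU ports 2, 3) follow Intel's published port layouts. The slot model, and the simplification that any FP uop can occupy any shared port, are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Core:
    name: str
    int_alus: int     # integer ALU issue ports
    shared_alus: int  # of those, ports also used by vector/FP uops
    agus: int         # address-generation units

CORES = [
    Core("Zen (rumoured)", int_alus=4, shared_alus=0, agus=2),  # separate FPU cluster
    Core("Haswell", int_alus=4, shared_alus=3, agus=3),         # port 7 AGU: simple stores only
    Core("Sandy Bridge", int_alus=3, shared_alus=3, agus=2),
]

def int_slots_per_cycle(core: Core, fp_uops_per_cycle: int) -> int:
    """Integer ALU slots left after FP/vector uops occupy shared ports (crude model)."""
    stolen = min(fp_uops_per_cycle, core.shared_alus)
    return core.int_alus - stolen

for core in CORES:
    print(f"{core.name:15} int-only: {int_slots_per_cycle(core, 0)}  "
          f"with 2 FP uops/cycle: {int_slots_per_cycle(core, 2)}  AGUs: {core.agus}")
```

In this model, Zen and Haswell tie at 4 integer slots on pure integer code. With 2 FP uops per cycle, Haswell drops to 2 and Sandy Bridge to 1, while Zen keeps 4.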