By: Poindexter (cherullo.delete@this.gmail.com), October 31, 2015 5:37 pm
Room: Moderated Discussions
lurker (lurker9000.delete@this.realemail.mail) on October 31, 2015 4:06 pm wrote:
> Poindexter (cherullo.delete@this.gmail.com) on October 31, 2015 2:47 pm wrote:
> > I find it funny that you like to tout pipe numbers, but you never discuss
> > other architectural features that have direct impact in this discussion:
> > - MOV elimination
> > - Store-to-load forwarding
> > - Memory reordering and memory disambiguation
> > - Instruction fusing
>
> I think the point is that we don't really know the details on the rest of the architecture?
> So I think it's not a problem if he focuses on the details we do know.
Yeah, but asserting that the architecture is unbalanced just by looking at the number of ports and doing some back-of-the-envelope math, without access to a single simulation, is going just a *bit* too far, don't you think? It makes it look like every AMD engineer is plain incompetent.
> > - Never provided any connection between Haswell's increased IPC over Ivy Bridge to the third AGU.
>
> I have no idea if this is the reason for it, but in Haswell SMT works a lot better than in Ivy.
> Say if you were doing multi-core compilation in Ivy, the whole system would just freeze up until
> the work is completed. It seems to be a bit better in Haswell so perhaps that extra AGU helps?
Sure, it helps. Then again, Haswell also got a new issue port for ALU operations, including a branch unit, which might also help during SMT compilation. Which one matters most? I don't know!
> > Regarding the FPU, you never mention that Zen's FPU doesn't share ports with the integer ALUs like
> > Haswell does. You never mention that Zen's FPU has more ports and units than Haswell's. You only seem
> > to care about maximum throughput (in the e-penis sense), which frankly, is not that interesting.
>
> Is it really that much of an advantage that FPU doesn't share ports with integer ALUs? In SMT perhaps?
> I guess the separate ports for the ADD and MUL units are an advantage in some workloads.
> And to be fair maximum throughput is important in HPC, right? I
> don't think it's that important in general workloads though.
Having separate ports is certainly an advantage, just like Haswell's third AGU is. The tricky part is quantifying that advantage and comparing those features against each other. With the information we have, it's impossible.
Throughput is important for HPC, that's true, but juan condemned Zen in all markets.
In HPC's case, the lack of AVX-512 has vastly more influence on Zen's ability to get into those HPC supercomputers than a missing AGU. (Then again, it wasn't long ago that he was saying that ARMs were better than Haswells when driving HPC GPUs, same perf at lower power, but I guess he changed his mind about that.) We can't really conclude anything about any market.
I just did a static instruction count on LAPACK (the default package on Ubuntu): it doesn't use packed AVX instructions and it's full of LEAs (I think that's due to Fortran's calling convention). It looks like just the kind of code where Zen's FPU can issue 4 instructions per cycle while the integer side is also running at full tilt. Zen may be much better than Haswell for scientific computing.
It may also be great for office work, laptops, games, all kinds of web servers, and so on.
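For anyone who wants to repeat a static count like the LAPACK one mentioned above, something along these lines would do. This is only a rough sketch, not necessarily how the numbers in this post were obtained: it assumes the default AT&T-syntax text that `objdump -d` prints, the names `count_mnemonics` and `summarize` are made up for illustration, and the "packed AVX" test (mnemonic starts with `v` and ends in `ps`/`pd`) is a crude heuristic that will miss or misclassify some instructions.

```python
import re
import sys
from collections import Counter

# Assumed objdump -d line shape: "  401000:<TAB>48 8d 04 0a   <TAB>lea (%rdx,%rcx,1),%rax"
# Continuation lines of long encodings have no third field and are skipped.
_INSN_LINE = re.compile(r"^\s*[0-9a-f]+:\t[0-9a-f ]+\t(.+)$")

# Instruction prefixes that objdump prints in front of the real mnemonic.
_PREFIXES = {"lock", "rep", "repz", "repnz", "repe", "repne", "data16", "bnd", "notrack"}

# LEA spellings, listed explicitly so "leave" is not counted by accident.
_LEA = {"lea", "leaw", "leal", "leaq"}


def count_mnemonics(disassembly: str) -> Counter:
    """Count how often each mnemonic appears in objdump -d text (static count)."""
    counts = Counter()
    for line in disassembly.splitlines():
        match = _INSN_LINE.match(line)
        if not match:
            continue  # section headers, symbol labels, blank lines
        tokens = match.group(1).split()
        while len(tokens) > 1 and tokens[0] in _PREFIXES:
            tokens.pop(0)
        if tokens:
            counts[tokens[0]] += 1
    return counts


def summarize(counts: Counter) -> dict:
    """Totals for LEAs and (heuristically) packed AVX floating-point instructions."""
    packed_avx = sum(
        n for mnemonic, n in counts.items()
        if mnemonic.startswith("v") and mnemonic.endswith(("ps", "pd"))
    )
    lea = sum(n for mnemonic, n in counts.items() if mnemonic in _LEA)
    return {"total": sum(counts.values()), "lea": lea, "packed_avx": packed_avx}


if __name__ == "__main__":
    # Reads disassembly from stdin and prints the summary plus the top mnemonics.
    mnemonic_counts = count_mnemonics(sys.stdin.read())
    print(summarize(mnemonic_counts))
    for mnemonic, n in mnemonic_counts.most_common(20):
        print(f"{n:8d}  {mnemonic}")
```

Pipe the output of `objdump -d` on the library into the script's stdin. Keep in mind this is a static count: it says nothing about which instructions sit in hot loops, so it only hints at the dynamic mix.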