By: Brett (ggtgp.delete@this.yahoo.com), May 19, 2022 11:23 am
Room: Moderated Discussions
Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 18, 2022 10:40 pm wrote:
> Brett (ggtgp.delete@this.yahoo.com) on May 18, 2022 11:03 am wrote:
>
> > High core count CPUs are already memory starved, so adding AVX512 is pointless.
> > With 5nm CPUs you can scratch off the HPC market needing AVX512, as you can’t
> > feed that many CPUs, much less the doubled bandwidth needs of AVX512 units.
> The current #1 in HPC has 1 TB/s per socket. AFAIK the bandwidth of SPR-HBM is not
> yet known but could be similar. How do we feed those without 512-bit vectors?
Wait 2-3 years for the next shrink, which will bring twice as many cores and make AVX512 pointless again for lack of bandwidth.
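To put rough numbers on the bandwidth-per-core argument, here is a back-of-envelope sketch. Only the 1 TB/s per socket figure comes from the thread; the core counts and the 2 GHz clock are assumptions picked for illustration, and the function names are made up for this example.

```python
# Back-of-envelope: how much DRAM bandwidth each core gets per clock cycle.
# Only the 1 TB/s socket figure is from the thread; the core counts and
# the 2 GHz clock below are illustrative assumptions, not measured values.

VECTOR_BYTES = 64  # one 512-bit vector register is 64 bytes


def bytes_per_core_cycle(socket_bw_bytes_per_s, cores, clock_hz):
    """Average memory bytes available to each core on every clock cycle."""
    return socket_bw_bytes_per_s / (cores * clock_hz)


def cycles_per_vector_from_memory(socket_bw_bytes_per_s, cores, clock_hz):
    """Cycles a core must wait, on average, per 512-bit vector streamed from memory."""
    return VECTOR_BYTES / bytes_per_core_cycle(socket_bw_bytes_per_s, cores, clock_hz)


if __name__ == "__main__":
    socket_bw = 1e12  # 1 TB/s per socket, as quoted above
    clock = 2e9       # assumed 2 GHz
    for cores in (64, 128):  # assumed core counts, before and after a shrink
        bpc = bytes_per_core_cycle(socket_bw, cores, clock)
        cpv = cycles_per_vector_from_memory(socket_bw, cores, clock)
        print(f"{cores} cores: {bpc:.2f} B/core/cycle, {cpv:.1f} cycles per 512-bit vector")
```

Under these assumptions, doubling the core count at the same socket bandwidth halves the bytes each core can pull per cycle, which is the point above. Code that mostly works out of cache is not bound by this figure.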
> > Now all 8 cores are down clocked in response and your net performance uplift of AVX512 is negative.
> You might find these results surprising: https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html#summary
Not surprised. For tests that fit in L1 cache, register operations cost picojoules, whereas memory bandwidth costs watts.
The real issue is that big OoOE cores are large, hot, and expensive; the same vector work can be done by hard-coded math blocks at 10% of the die size and 10% of the power.
Rather than going from 8 cores to 16, you can stay at 8 and add 80 hard-coded blocks to do otherwise impossible tasks like real video compression encoding. This is the path Apple is taking, and it is the better path. ;)
Most software cannot make use of more than 2 cores; games can use 4, and at most 8.
16 cores is just stupid for the average user; Microsoft Word will not run faster.