By: juanrga (nospam.delete@this.juanrga.com), August 25, 2015 6:01 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on August 25, 2015 2:58 pm wrote:
> It means that it was very possible, and may still be possible, for Intel to base their GPGPU competitor on
> a Sandy Bridge derivative. Probably not exactly Broadwell and not exactly Skylake, but a slightly different
> 14nm core. Tuned for 2.5-3 GHz operation instead of 4 GHz, probably the same number of execution ports as Sandy Bridge,
> less aggressive bypasses, less aggressive divide units, etc. In short, slightly compromised absolute performance
> relative to Broadwell and Skylake, but almost the same performance per watt at low frequency and 20-30%
> smaller area. Now let's put 33 such cores running at 1.4/2.6 GHz (Base/Turbo) on a single huge die. Or,
> maybe, if it fits, 39 cores with Base=1.2 GHz. Or something in the middle, you get the idea.
> Just like in the case of Core-M vs Cherry Trail, we will get a LOT better (than KNL) single-threaded
> performance, a LOT better scalar multithreaded FP, and about the same multithreaded integer at about
> the same power envelope. Now, you are asking: "Who cares? These things are important for smart customers,
> but the whole point of KNL is pleasing stupid customers by showing them that we can run LINPACK as
> fast as the biggest, baddest Maxwell and then slightly faster yet! Your variant is not even close!".
> And here you understand why AVX512 is a huge mistake. A Core-based
> GPGPU killer absolutely needs AVX-1024. Or wider.
>
Nvidia's talk at ISC 2015 was much more interesting. They compared a system with two KNL CPUs against a Power+CUDA system using Amdahl's law. At 98% parallel work the KNL system was competitive. At 90% parallel work it was about two times slower than the Power+CUDA system: 4.5 min vs ~2 min. At 70% parallel work, the KNL system was more than 3x slower.
Wider vector units and fewer cores would have worked better.
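
To make the Amdahl's law argument concrete, here is a minimal Python sketch. The speed figures in it are made-up placeholders chosen only to show the trend; they are not the numbers from Nvidia's slides, and the names (`runtime`, `slowdown`, `MANYCORE`, `HOST_ACCEL`) are my own.

```python
# Amdahl's law sketch: normalized runtime = serial part + parallel part.
# All speed figures below are HYPOTHETICAL, relative to one fast "big" core.

def runtime(p, serial_speed, parallel_speed):
    """Normalized runtime for a job whose fraction p is parallel.

    serial_speed   -- how fast the machine runs the serial fraction
    parallel_speed -- how fast the machine runs the parallel fraction
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return (1.0 - p) / serial_speed + p / parallel_speed

# Assumed machine models (serial_speed, parallel_speed), for illustration only.
MANYCORE = (0.35, 15.0)    # many small cores: slow serial code, high throughput
HOST_ACCEL = (1.0, 10.0)   # fast host core plus an accelerator

def slowdown(p):
    """How many times slower the many-core model is than the host+accelerator model."""
    return runtime(p, *MANYCORE) / runtime(p, *HOST_ACCEL)

if __name__ == "__main__":
    for p in (0.98, 0.90, 0.70):
        print(f"parallel fraction {p:.2f}: many-core is {slowdown(p):.2f}x slower")
```

With these assumed figures the many-core model is roughly on par at 98% parallel work and falls further behind as the serial fraction grows, because the serial part runs on the slow cores. That is the same qualitative trend as in the talk, nothing more.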