Article: Knights Landing CPU Speculation
By: Eric (eric.kjellen.delete@this.gmail.com), November 22, 2013 2:10 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on November 21, 2013 6:48 am wrote:
> Eric (eric.kjellen.delete@this.gmail.com) on November 21, 2013 2:28 am wrote:
> > Daniel (x.delete@this.y.z) on November 20, 2013 2:04 pm wrote:
> > > Could they have opted for 2 vector units per core? This would make KNL more similar
> > > to SKL from an EU point of view and might explain the decision to go for OoO FP.
> >
> > That thought struck me too: 2x 512-bit vector units with FMA would give 32 DP FLOP/clock
> > and 72 cores would need 1.3 GHz to give 3 TFLOPS (3 TFLOP/s / (32 FLOP * 72) = 1.302 GHz)
> > which would be a more incremental clock frequency boost over previous-generation MIC.
>
> 2 FPUs on 2-issue core? That's silly. 2-issue is barely enough to keep one FPU reasonably busy. I say,
> barely, because, judging by Top500 list, KNC (and Kepler) delivers much lower sustained-to-peak ratio
> than just about everybody else, and that on the benchmark that is commonly considered easy.
>
Yeah, that explanation might not make sense. However, it still has to be explained how 72 cores (presumably Atom/Silvermont-based), each with one 512-bit vector unit (16 DP FLOP/cycle with FMA), can reach 3 DP TFLOPS. My first thought was that the core count is wrong, but Intel has previously stated (in 2011, which should cover Knights Landing) that future Xeon Phi designs will be based on Atom cores. A comparably high core count (>100), such as you would get from scaling up the P54C-derived many-core design in KNC (57-61 cores), would probably not be possible with Atom cores, as one Atom core is much larger and more power hungry.
What's also puzzling is that, while 2.6 GHz is way too high for a highly parallel chip, 72 cores at 16 DP FLOP per core per cycle at that clock almost exactly match the 3 DP TFLOPS peak figure provided by Intel (72 x 16 x 2.6 GHz = 2.995 TFLOPS). That is not significant in itself, since the 2.6 GHz was something we grabbed out of thin air for the very purpose of solving this equation. On the other hand, assuming 32 DP FLOP per core per cycle and halving the clock accordingly gives 1.3 GHz, which is only slightly higher than Knights Corner (production Xeon Phi parts of this generation run at 1.053 GHz to 1.238 GHz) and a very realistic increase for the shrink from 22nm to 14nm. This would suggest that the 72-core figure is correct but that Intel has somehow managed to make each core work as if it had two 512-bit vector units or a single 1024-bit vector unit (here I am reminded of Nicolas Capens' predictions for future AVX/LRBni convergence), so that it can process 32 DP FLOP/cycle.
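For anyone who wants to replay the arithmetic, here is a minimal Python sketch. The function names are made up for illustration, and it assumes peak DP FLOP/cycle is simply (vector width / 64 bits) x number of vector units x 2 for FMA, which is the same back-of-envelope model used above and not anything Intel has confirmed about the design.

```python
# Back-of-envelope peak-FLOPS model for the configurations discussed above.
# All function names are hypothetical; the model ignores issue width,
# sustained-vs-peak ratio, and everything else that matters in practice.

def dp_flop_per_cycle(vector_bits, units, fma=True):
    """Peak double-precision FLOP per core per cycle."""
    lanes = vector_bits // 64           # 64-bit DP lanes per vector unit
    return lanes * units * (2 if fma else 1)  # FMA counts as 2 FLOP per lane

def required_clock_ghz(target_tflops, cores, flop_per_cycle):
    """Clock (GHz) needed to hit a peak target with the given core count."""
    return target_tflops * 1e12 / (cores * flop_per_cycle) / 1e9

def peak_tflops(cores, flop_per_cycle, clock_ghz):
    """Peak DP TFLOPS for a given core count, per-core rate, and clock."""
    return cores * flop_per_cycle * clock_ghz * 1e9 / 1e12

if __name__ == "__main__":
    one_vpu = dp_flop_per_cycle(512, units=1)   # 16 DP FLOP/cycle
    two_vpu = dp_flop_per_cycle(512, units=2)   # 32 DP FLOP/cycle
    for label, rate in (("1 x 512-bit", one_vpu), ("2 x 512-bit", two_vpu)):
        clock = required_clock_ghz(3.0, cores=72, flop_per_cycle=rate)
        print(f"{label}: {rate} DP FLOP/cycle, needs {clock:.3f} GHz for 3 TFLOPS")
```

A single 1024-bit unit comes out the same as two 512-bit units in this model (`dp_flop_per_cycle(1024, units=1)` is also 32), which is why the two cases cannot be told apart from the peak figure alone.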