Article: Knights Landing CPU Speculation
By: Michael S (already5chosen.delete@this.yahoo.com), November 22, 2013 5:41 am
Room: Moderated Discussions
Eric (eric.kjellen.delete@this.gmail.com) on November 22, 2013 3:10 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on November 21, 2013 6:48 am wrote:
> > Eric (eric.kjellen.delete@this.gmail.com) on November 21, 2013 2:28 am wrote:
> > > Daniel (x.delete@this.y.z) on November 20, 2013 2:04 pm wrote:
> > > > Could they have opted for 2 vector units per core? This would make KNL more similar
> > > > to SKL from an EU point of view and might explain the decision to go for OoO FP.
> > >
> > > That thought struck me too: 2x 512-bit vector units with FMA would give 32 DP FLOP/clock
> > > and 72 cores would need 1.3 GHz to give 3 TFLOPS (3 TFLOP/s / (32 FLOP * 72) = 1.302 GHz)
> > > which would be a more incremental clock frequency boost over previous-generation MIC.
> >
> > 2 FPUs on a 2-issue core? That's silly. 2-issue is barely enough to keep one FPU reasonably busy. I say
> > barely because, judging by the Top500 list, KNC (and Kepler) deliver a much lower sustained-to-peak ratio
> > than just about everybody else, and that on a benchmark that is commonly considered easy.
> >
>
> Yeah, that explanation might not make sense. However, it has to be explained how 72 cores (presumably
> Atom/Silvermont) with a 512-bit vector unit (for 16 DP FLOP/cycle with FMA) can achieve 3 DP TFLOPS.
> My first thought would be that the number of cores is incorrect, but Intel has previously stated
> that future Xeon Phi designs (and this was said in 2011 and should include Knights Landing) will
> be based on Atom cores.
I'd like to see a quote.
> And getting a comparably high number of cores (>100), as they would from
> a scale-up of the P54C-derived many-core design in KNC (57-61 cores), would probably not be possible
> if Atom cores are used, as one Atom core is much larger and more power hungry.
>
> What's puzzling is also that while 2.6 GHz is way too high for a highly-parallel chip, 72 cores that compute
> 16 DP FLOP per core and cycle at this clock frequency exactly matches the 3 DP TFLOPS peak performance figure
> provided by Intel, though this so far is not exactly significant as the 2.6 GHz was something we grabbed
> out of thin air for the very purpose of solving this equation. On the other hand, assuming 32 DP FLOP per
> core and cycle and a matching halving of the clock speed then results in a 1.3 GHz clock frequency which
> is only slightly higher than Knights Corner (production Xeon Phi of this generation offers clock frequencies
> in a range from 1.053 GHz to 1.238 GHz) and a very realistic increase for the die shrink from 22nm to 14nm.
> This would suggest that the 72 core figure is correct but Intel has somehow managed to make each core work
> as if it had double 512-bit vector units or a single 1024-bit vector unit (here I am reminded of Nicholas
> Capen's predictions for future AVX/LRBni convergence) so that it can process 32 DP FLOP/cycle.
1024-bit makes a lot of sense from a technical perspective. Except that Intel already said, unequivocally, that Knights Landing implements AVX-512 :(
http://software.intel.com/en-us/blogs/2013/avx-512-instructions
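
For anyone who wants to re-run the peak-throughput arithmetic from the quoted posts, here is a minimal Python sketch. The function names are made up for illustration, and it assumes the figures discussed above (72 cores, a 3 DP TFLOPS peak, FMA counted as 2 FLOP per 64-bit lane per cycle); none of these are confirmed specifications.

```python
def dp_flop_per_cycle(vector_bits, units, fma=True):
    """Peak double-precision FLOP per core per cycle (hypothetical helper).

    Assumes 64-bit lanes and that an FMA counts as 2 FLOP per lane.
    """
    lanes = vector_bits // 64
    return lanes * units * (2 if fma else 1)


def required_clock_ghz(peak_tflops, cores, flop_per_cycle):
    """Clock in GHz needed to reach a given peak, assuming perfect issue every cycle."""
    return peak_tflops * 1e12 / (cores * flop_per_cycle) / 1e9


if __name__ == "__main__":
    # One 512-bit FMA unit per core: 16 DP FLOP/cycle
    one_unit = dp_flop_per_cycle(512, units=1)
    # Two 512-bit FMA units (or one 1024-bit unit): 32 DP FLOP/cycle
    two_units = dp_flop_per_cycle(512, units=2)

    print(f"{required_clock_ghz(3, 72, one_unit):.3f} GHz")   # 2.604 GHz
    print(f"{required_clock_ghz(3, 72, two_units):.3f} GHz")  # 1.302 GHz
```

This only reproduces the peak numbers; it says nothing about whether a 2-issue core could actually sustain them.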