Article: Knights Landing CPU Speculation
By: Michael S (already5chosen.delete@this.yahoo.com), November 24, 2013 6:06 am
Room: Moderated Discussions
Sylvain Collange (firstname.lastname.delete@this.gmail.com) on November 24, 2013 3:37 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on November 21, 2013 6:48 am wrote:
> > 2 FPUs on 2-issue core? That's silly. 2-issue is barely enough to keep one FPU reasonably busy.
>
> That is certainly true for most scalar workloads, but vector-intensive code can easily saturate a single
> vector unit. An FMA pipeline typically runs SIMD integer instructions in addition to FP instructions.
Integer instructions are the smaller part of the problem. The bigger part is memory instructions.
In my experience, for a typical linear algebra algorithm with 32 software-visible registers, it's pretty hard to reduce the number of memory accesses per FMA below 0.7-0.8.
And in that regard linear algebra is easier than most.
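To put a rough number on that, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simplified model that does not come from any Intel documentation: every memory access and every FMA occupies one issue slot, and nothing else (integer ops, branches, loop overhead) competes for slots. The function name `fma_per_cycle` is made up for illustration.

```python
def fma_per_cycle(issue_width, mem_per_fma):
    """Upper bound on FMAs issued per cycle when memory ops are separate instructions.

    Assumed model: each FMA costs one issue slot and brings along
    mem_per_fma additional slots for its memory accesses.
    """
    return issue_width / (1.0 + mem_per_fma)


# The 0.7-0.8 range quoted above, on a 2-issue core.
for ratio in (0.7, 0.8):
    print(f"{ratio:.1f} mem/FMA -> {fma_per_cycle(2, ratio):.2f} FMA/cycle")
```

Under these assumptions a 2-issue core sustains only a little over one FMA per cycle, so a second FPU would sit idle most of the time. Real code also spends slots on address arithmetic and loop control, which pushes the number lower still.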
>
> In SPMD-style code such as OpenCL, every variable is a vector unless the compiler can prove it holds the same
> value for all threads of a warp. Even assuming an omniscient compiler, scalar instructions only represent
> about 30% of the instruction mix, and less with aggressive unrolling. Thus vector performance matters.
> Fermi and Kepler already have 2 FMAs for each scheduler, and can
> sustain the peak issue rate on a 100% FMA instruction mix.
I am not sure that the Fermi/Kepler reference is relevant in a discussion of KNL. I am sorry that I brought it up myself in a previous post.
>
> A 2-issue core with dual-FMA is the most sensible option in my opinion.
Certainly not for a KNC-style core, where load and OP are separate pipeline operations.
For a Bonnell-style core, with its CISC (or, if you want, TI TMS320C30/C40-style) load+op pipeline, maybe.
But the resulting core wouldn't resemble Bonnell/Saltwell, and even less so Silvermont.
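A small Python sketch of why the pipeline style matters here. It takes the description of the two styles above at face value and assumes a toy model: one issue slot per instruction, a load+op core can fold at most one memory operand into each FMA at no extra slot cost, and all memory accesses are foldable loads (stores would not be). The names `slots_per_fma` and `fpus_kept_busy` are hypothetical.

```python
def slots_per_fma(mem_per_fma, fused_load_op):
    """Issue slots consumed per FMA under the assumed one-slot-per-instruction model."""
    if fused_load_op:
        # Load+op style: one memory operand rides along with the FMA itself,
        # so only accesses beyond one per FMA need their own instruction.
        return 1.0 + max(0.0, mem_per_fma - 1.0)
    # Separate-load style: every memory access is its own instruction.
    return 1.0 + mem_per_fma


def fpus_kept_busy(issue_width, mem_per_fma, fused_load_op):
    """Upper bound on FMAs per cycle, i.e. how many FMA units the front end can feed."""
    return issue_width / slots_per_fma(mem_per_fma, fused_load_op)


separate = fpus_kept_busy(2, 0.75, fused_load_op=False)
fused = fpus_kept_busy(2, 0.75, fused_load_op=True)
print(f"separate loads: {separate:.2f} FMA/cycle, load+op: {fused:.2f} FMA/cycle")
```

With 0.75 memory accesses per FMA, the separate-load model feeds barely more than one FPU, while the load+op model can in principle feed two. That still requires the memory pipeline to deliver those operands every cycle, which this sketch does not model.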
> I am much more skeptical about out-of-order execution of a fully mask-predicated instruction set...
You mean too many register inputs per uOP?
I haven't looked at AVX-512 in sufficient detail. How many register inputs will be needed per FMA?