By: anon (anon.delete@this.anon.com), November 16, 2012 3:07 am
Room: Moderated Discussions
Felid (Felid.delete@this.mailinator.com) on November 15, 2012 3:19 pm wrote:
> Try replacing MOVAPS #2 with «xmm1, xmm2» (1-way dependence), and then with «xmm2,
> xmm3» (no dependence). To rule out a possible port bottleneck, it is also worth testing
> with GPRs, but not the 8- or 16-bit ones :) This will give more info on how the logic works.
Some measurements:
loop:
    movaps xmm1, xmm0
    movaps xmm2, xmm1
    dec ecx
    jnz loop ; 2 clk/loop

loop:
    movaps xmm1, xmm0
    movaps xmm3, xmm2
    dec ecx
    jnz loop ; 2.1-2.2 clk/loop - unstable

loop:
    movaps xmm1, xmm0
    movaps xmm2, xmm1
    movaps xmm0, xmm2
    dec ecx
    jnz loop ; 3 clk/loop

loop:
    movaps xmm1, xmm0
    movaps xmm2, xmm1
    movaps xmm1, xmm0
    movaps xmm2, xmm1
    dec ecx
    jnz loop ; 3 clk/loop

loop:
    movaps xmm1, xmm0
    movaps xmm2, xmm1
    movaps xmm3, xmm2
    movaps xmm0, xmm3
    dec ecx
    jnz loop ; 3.33 clk/loop

loop:
    movaps xmm1, xmm0
    movaps xmm2, xmm1
    movaps xmm3, xmm2
    movaps xmm4, xmm3
    dec ecx
    jnz loop ; 3.5 clk/loop

loop:
    movaps xmm1, xmm0
    movaps xmm3, xmm2
    movaps xmm5, xmm4
    movaps xmm7, xmm6
    dec ecx
    jnz loop ; 3.67 clk/loop

loop:
    mov edx, eax
    mov eax, edx
    dec ecx
    jnz loop ; 1 clk/loop

loop:
    mov edx, eax
    mov edi, edx
    mov eax, edi
    dec ecx
    jnz loop ; 2 clk/loop

loop:
    mov edx, eax
    mov edi, edx
    mov esi, edi
    mov eax, esi
    dec ecx
    jnz loop ; 2.33 clk/loop
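
To make the numbers above easier to compare, here is a rough Python sketch that divides each clk/loop figure by the number of MOVs in that loop. The short labels and the function names (`clocks_per_mov`, `summarize`) are made up for illustration, and the per-MOV figure is a simplification: it ignores the dec/jnz pair entirely, so treat it as a relative indicator only.

```python
# Figures copied from the listings above:
# (label, MOVs per iteration, clk/loop low, clk/loop high).
# The labels are informal shorthand, not terms from the original post.
MEASUREMENTS = [
    ("movaps x2, xmm0->xmm1->xmm2",          2, 2.0,  2.0),
    ("movaps x2, independent",               2, 2.1,  2.2),   # reported as unstable
    ("movaps x3, loop-carried ring",         3, 3.0,  3.0),
    ("movaps x4, first pair repeated",       4, 3.0,  3.0),
    ("movaps x4, loop-carried ring",         4, 3.33, 3.33),
    ("movaps x4, chain xmm0->...->xmm4",     4, 3.5,  3.5),
    ("movaps x4, independent",               4, 3.67, 3.67),
    ("mov x2, loop-carried ring",            2, 1.0,  1.0),
    ("mov x3, loop-carried ring",            3, 2.0,  2.0),
    ("mov x4, loop-carried ring",            4, 2.33, 2.33),
]


def clocks_per_mov(n_movs, clk_low, clk_high):
    """Return (low, high) clocks per MOV, ignoring the dec/jnz overhead."""
    return clk_low / n_movs, clk_high / n_movs


def summarize(rows):
    """Attach rounded per-MOV clock figures to each measurement row."""
    result = []
    for label, n_movs, clk_low, clk_high in rows:
        per_low, per_high = clocks_per_mov(n_movs, clk_low, clk_high)
        result.append((label, n_movs, round(per_low, 3), round(per_high, 3)))
    return result


if __name__ == "__main__":
    for label, n_movs, per_low, per_high in summarize(MEASUREMENTS):
        if per_low == per_high:
            print(f"{label:36s} {n_movs} MOVs  {per_low} clk/MOV")
        else:
            print(f"{label:36s} {n_movs} MOVs  {per_low}-{per_high} clk/MOV")
```

Seen this way, only the two-MOV GPR loop comes in below one clock per MOV; every other loop costs between roughly 0.58 and about 1.1 clocks per MOV.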