AMD Ryzen inst lat and tput

By: Per Hesselgren (perhesselgren.delete@this.yahoo.se), March 18, 2017 7:50 am
Room: Moderated Discussions
Per Hesselgren (perhesselgren.delete@this.yahoo.se) on March 7, 2017 3:19 pm wrote:
> Per Hesselgren (perhesselgren.delete@this.yahoo.se) on March 7, 2017 1:48 pm wrote:
> > Travis (travis.downs.delete@this.gmail.com) on March 7, 2017 1:01 pm wrote:
> > > Per Hesselgren (perhesselgren.delete@this.yahoo.se) on March 6, 2017 11:00 pm wrote:
> > > > Ryzen lat and tput
> > > >
> > >
> > > > I wanted to add that here we have our first evidence of a more than
> > > > 4-wide x86 chip: certain zeroing idioms, register-register
> > > > moves, some nops and LFENCE all show up as having an inverse throughput of 0.20c, i.e., 5/cycle:
> > >
> > >
> > > 3 X86 : 3x 0x66 NOP L: [no true dep.] T: 0.06ns= 0.20c
> > > 22 X86 :MOV r32, r32 L: 0.06ns= 0.2c T: 0.06ns= 0.20c
> > > 23 AMD64 :MOV r64, r64 L: 0.06ns= 0.2c T: 0.06ns= 0.20c
> > > 772 SSE2 :LFENCE L: 0.06ns= 0.2c T: 0.06ns= 0.20c
> > > 2250 X86 :MOV r1_32, r2_32 L: 0.06ns= 0.2c T: 0.06ns= 0.20c
> > > 2251 AMD64 :MOV r1_64, r2_64 L: 0.06ns= 0.2c T: 0.06ns= 0.20c
> > >
> > >
> > > > Of course, we didn't expect this in this test for any instructions that actually execute, since this
> > > > test uses one instruction type at a time and we know there are only 4 scalar ALUs, 4 vector ALUs,
> > > > 2 load units, etc., so you need a mix of instruction types to get more than 4. At least it shows that
> > > > the earlier parts of the pipeline don't have a hard limit at 4 like current Intel does (despite not
> > > > executing, nops, zeroing idioms and reg-reg moves are still all capped at 4 on Intel).
> > >
> > > LFENCE is interesting. Perhaps Zen's memory pipeline is
> > > such that this is just a no-op on their architecture.
> > > For example, maybe they apply the stronger consistency model to operations that are documented not to
> > > need it, such as the non-temporal ops: that's the only place you need an LFENCE anyway since for normal
> > > memory access loads are already documented not to pass loads. On Intel this is definitely not a no-op:
> > > it has a throughput of 1 per 4-5 cycles. So an AMD LFENCE is fully 20 times faster, heh.
> >
> > I have no idea if people are interested in RDRAND and RDSEED.
> > It is slow on AMD Ryzen: 1224 clocks for 16- and 32-bit and 2401 clocks for 64-bit.
> > It was very fast on Ivy Bridge but is slower on later Intel CPUs.
>
> Slow is a relative property: Excavator takes 2936 clocks for 16/32-bit RDRAND and 7193 clocks for 64-bit.
> Kaby Lake takes 473 clocks for all sizes.
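
To make the arithmetic behind the quoted figures explicit, here is a small Python sketch. It only re-derives ratios from numbers already posted in this thread (0.20c inverse throughput, 4 to 5 cycles per LFENCE on Intel, and the RDRAND clock counts). The function names are made up for illustration, and nothing here measures real hardware.

```python
# Figures copied from the posts quoted above; nothing is measured here.
RYZEN_INVERSE_THROUGHPUT_CYCLES = 0.20   # MOV r,r / NOP / LFENCE on Ryzen
INTEL_LFENCE_CYCLES = (4.0, 5.0)         # "1 per 4-5 cycles" on Intel
RDRAND_32BIT_CLOCKS = {"Ryzen": 1224, "Excavator": 2936, "Kaby Lake": 473}


def per_cycle(inverse_throughput_cycles):
    """Convert inverse throughput (cycles per instruction) to instructions per cycle."""
    return 1.0 / inverse_throughput_cycles


def ratio(slower, faster):
    """How many times larger the first cost is than the second."""
    return slower / faster


ryzen_ipc = per_cycle(RYZEN_INVERSE_THROUGHPUT_CYCLES)
lfence_low = ratio(INTEL_LFENCE_CYCLES[0], RYZEN_INVERSE_THROUGHPUT_CYCLES)
lfence_high = ratio(INTEL_LFENCE_CYCLES[1], RYZEN_INVERSE_THROUGHPUT_CYCLES)
ryzen_vs_kaby = ratio(RDRAND_32BIT_CLOCKS["Ryzen"], RDRAND_32BIT_CLOCKS["Kaby Lake"])
excavator_vs_ryzen = ratio(RDRAND_32BIT_CLOCKS["Excavator"], RDRAND_32BIT_CLOCKS["Ryzen"])

print(f"Ryzen reg-reg MOV: {ryzen_ipc:.1f} per cycle")
print(f"LFENCE, Intel vs Ryzen: {lfence_low:.0f}x to {lfence_high:.0f}x")
print(f"32-bit RDRAND, Ryzen vs Kaby Lake: {ryzen_vs_kaby:.2f}x the clocks")
print(f"32-bit RDRAND, Excavator vs Ryzen: {excavator_vs_ryzen:.2f}x the clocks")
```

So the "20 times faster" remark corresponds to the 4-cycle end of the Intel range, and Ryzen's RDRAND sits between Kaby Lake and Excavator in clock count.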

32-bit RDRAND Monte Carlo test for Ryzen 1700:
Generating 200 Million random numbers with RDRAND.
That'll take a little while ...

Area = 0.249963490000000
Absolute Error = 0.000036510000000
Elapsed Time = 145.19 Seconds

Generating 200 Million random numbers with C.
That'll take a little while ...

Area = 0.250044445000000
Absolute Error = 0.000044445000000
Elapsed Time = 5.58 Seconds
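
For readers who have not seen this kind of test, here is a rough Python sketch of a Monte Carlo area estimate with the same output layout. Big assumptions: the post does not say which region the test program integrates, so the curve y = x^3 on [0, 1] (exact area 0.25) is only a guess that matches the reported target value. Python's software `random` module also stands in for RDRAND, and the sample count is cut down so it finishes quickly. The last lines just divide the two elapsed times posted above.

```python
import random
import time

EXACT_AREA = 0.25  # integral of x**3 over [0, 1]; the integrand is an assumption


def estimate_area(n_samples, rng):
    """Hit-or-miss Monte Carlo: fraction of random points in the unit square under y = x**3."""
    hits = 0
    for _ in range(n_samples):
        x = rng.random()
        y = rng.random()
        if y < x ** 3:
            hits += 1
    return hits / n_samples


rng = random.Random(12345)          # software PRNG, not RDRAND
n_samples = 200_000                 # the posted runs generated 200 million numbers

start = time.perf_counter()
area = estimate_area(n_samples, rng)
elapsed = time.perf_counter() - start

print(f"Area = {area:.15f}")
print(f"Absolute Error = {abs(area - EXACT_AREA):.15f}")
print(f"Elapsed Time = {elapsed:.2f} Seconds")

# Ratio of the two elapsed times posted above (Ryzen 1700, 32-bit RDRAND vs C).
rdrand_seconds = 145.19
c_seconds = 5.58
slowdown = rdrand_seconds / c_seconds
print(f"RDRAND run took {slowdown:.1f}x as long as the C run")
```

Going by the posted times, the RDRAND run is about 26 times slower than the C run on this machine, while both land within about 0.00005 of 0.25.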