AMD Ryzen inst lat and tput

By: Per Hesselgren (perhesselgren.delete@this.yahoo.se), March 18, 2017 7:50 am
Room: Moderated Discussions
Per Hesselgren (perhesselgren.delete@this.yahoo.se) on March 7, 2017 3:19 pm wrote:
> Per Hesselgren (perhesselgren.delete@this.yahoo.se) on March 7, 2017 1:48 pm wrote:
> > Travis (travis.downs.delete@this.gmail.com) on March 7, 2017 1:01 pm wrote:
> > > Per Hesselgren (perhesselgren.delete@this.yahoo.se) on March 6, 2017 11:00 pm wrote:
> > > > Ryzen lat and tput
> > > >
> > >
> > > > I wanted to add that here we have our first evidence of a more-than-4-wide
> > > > x86 chip: certain zeroing idioms, register-register moves, some nops and
> > > > LFENCE all show an inverse throughput of 0.20c, i.e., 5/cycle:
> > >
> > >
> > >    3 X86   :3x 0x66 NOP       L: [no true dep.]   T: 0.06ns= 0.20c
> > >   22 X86   :MOV r32, r32      L: 0.06ns= 0.2c     T: 0.06ns= 0.20c
> > >   23 AMD64 :MOV r64, r64      L: 0.06ns= 0.2c     T: 0.06ns= 0.20c
> > >  772 SSE2  :LFENCE            L: 0.06ns= 0.2c     T: 0.06ns= 0.20c
> > > 2250 X86   :MOV r1_32, r2_32  L: 0.06ns= 0.2c     T: 0.06ns= 0.20c
> > > 2251 AMD64 :MOV r1_64, r2_64  L: 0.06ns= 0.2c     T: 0.06ns= 0.20c
> > >
> > >
> > > > Of course, we didn't expect this in this test for any instructions that actually execute, since this
> > > > test is using one instruction at a time and we know there are only 4 scalar ALUs, 4 vector ALUs,
> > > > 2 load units, etc - so you need a mix of instruction types to get 4. At least it shows though that
> > > > the earlier parts of the pipeline don't have a hard limit at 4 like current Intel does (despite not
> > > > executing, nops, zeroing idioms and reg-reg moves are still all capped at 4 on Intel).
> > >
> > > LFENCE is interesting. Perhaps Zen's memory pipeline is
> > > such that this is just a no-op on their architecture.
> > > For example, maybe they apply the stronger consistency model to operations that are documented not to
> > > need it, such as the non-temporal ops: that's the only place you need an LFENCE anyway since for normal
> > > memory access loads are already documented not to pass loads. On Intel this is definitely not a no-op:
> > > it has a throughput of 1 per 4-5 cycles. So an AMD LFENCE is fully 20 times faster, heh.
> >
> > I have no idea if people are interested in RDRAND and RDSEED.
> > It is slow on AMD Ryzen: 1224 clocks for 16- and 32-bit and 2401 clocks for 64-bit.
> > It was very fast on Ivy Bridge but slower on later Intel CPUs.
>
> Slow is relative: Excavator takes 2936 clocks for 16/32-bit RDRAND and 7193 for 64-bit.
> Kaby Lake takes 473 clocks for all sizes.

32-bit RDRAND Monte Carlo test for Ryzen 1700:
Generating 200 Million random numbers with RDRAND.
That'll take a little while ...

Area = 0.249963490000000
Absolute Error = 0.000036510000000
Elapsed Time = 145.19 Seconds

Generating 200 Million random numbers with C.
That'll take a little while ...

Area = 0.250044445000000
Absolute Error = 0.000044445000000
Elapsed Time = 5.58 Seconds
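
To put the two elapsed times above on a per-number basis, here is a small back-of-the-envelope sketch. The counts and times are copied from the output above. The 3.0 GHz figure is an assumption (the Ryzen 7 1700 base clock); the actual clock during the run was not reported, so the clock count is only a rough cross-check against the per-instruction numbers quoted earlier. The names used are illustrative only.

```python
# Figures copied from the two Monte Carlo runs above.
N = 200_000_000           # random numbers generated per run
RDRAND_SECONDS = 145.19   # elapsed time using RDRAND
C_RAND_SECONDS = 5.58     # elapsed time using the C generator

# ASSUMPTION: 3.0 GHz is the Ryzen 7 1700 base clock; the clock speed
# during the test was not reported, so this is only a rough guide.
ASSUMED_GHZ = 3.0


def ns_per_number(seconds, count=N):
    """Average wall-clock time per generated number, in nanoseconds."""
    return seconds / count * 1e9


rdrand_ns = ns_per_number(RDRAND_SECONDS)
c_ns = ns_per_number(C_RAND_SECONDS)
slowdown = RDRAND_SECONDS / C_RAND_SECONDS
# ns multiplied by GHz (cycles per ns) gives cycles.
rdrand_clocks = rdrand_ns * ASSUMED_GHZ

print(f"RDRAND: {rdrand_ns:.1f} ns per number")
print(f"C:      {c_ns:.1f} ns per number")
print(f"RDRAND run is {slowdown:.1f}x slower")
print(f"RDRAND: about {rdrand_clocks:.0f} clocks per number at {ASSUMED_GHZ} GHz (assumed)")
```

These are whole-loop averages (RDRAND plus whatever the test does with each number), so they are not directly comparable to an isolated instruction latency measurement.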