1800x system available

By: Per Hesselgren (perhesselgren.delete@this.yahoo.se), March 16, 2017 3:14 am
Room: Moderated Discussions
muziqaz (m.delete@this.gmail.com) on March 13, 2017 9:22 am wrote:
> Travis (travis.downs.delete@this.gmail.com) on March 5, 2017 7:23 pm wrote:
> > David Kanter (dkanter.delete@this.realworldtech.com) on March 5, 2017 5:31 pm wrote:
> > > Travis (travis.downs.delete@this.gmail.com) on March 5, 2017 11:55 am wrote:
> > > > anon (spam.delete.delete@this.this.spam.com) on March 4, 2017 4:16 pm wrote:
> > > >
> > > > > Fusing cmp + branch like they did on BD might also be possible.
> > > >
> > > > Yeah. I tried to avoid even bringing that in since it is already confusing enough with all the
> > > > different ways of measuring things, and invariably someone will try to add macro-fused branch stuff
> > > > to the calculation. So for now I'm just assuming no branch fusion is occurring, or, equivalently,
> > > > that it is occurring and we just count the pair as one instruction and one (fused) uop.
> > > >
> > > > This reduces the complexity - but of course if Ryzen doesn't do that fusion it has to be noted separately
> > > > too, since it would be an advantage for Intel, separate from the "more generic" width discussion.
> > > >
> > > > > Either way decode is probably slightly weaker than SKL in terms of the raw number of instructions.
> > > > > Bandwidth to the decode queue might be higher though, so possible benefits on more complex instructions.
> > > > >
> > > > > If the uop cache actually uses mops, which should be as powerful as fused
> > > > > uops, if not more, then there's at least parity with SKL here.
> > > > > 4 mops/cycle -> 6 uops/cycle seems not enough to explain
> > > > > the performance with SMT, so I'm leaning towards 6 mops.
> > > > >
> > > > > I don't believe 6 mops / fused uops rename on the integer side is happening. So 6 mops dispatch
> > > > > towards int seems unlikely. 4 mops towards int, with combined load + alu mops splitting into
> > > > > 2 uops to sustain 6 uop/cycle schedule & execute seems much more balanced and realistic. Ideally
> > > > > 4 mops to fp as well, although limited by 6 mops total dispatch. 256bit ops get split into 2
> > > > > uops after rename ideally. Same with FMA. Really not sure about these two though.
> > > > >
> > > > > No idea about how retire slots map to uops/mops/instructions either.
> > > > >
> > > > > So my take on it is that int rename is about equivalent to 4 fused uops, as is fp rename and
> > > > > retire. Combine with the higher latencies (instructions and mov to fp) and lower bandwidth
> > > > > (cache/load/store) the IPC between HSW and SKL in ST makes sense. The rename bottleneck being
> > > > > alleviated as soon as the fp side gets involved would explain why SMT works so well.
> > > > >
> > > > >
> > > > > Sure, some things could be changed to make it beat SKL but that's what Zen2 is for. All trade
> > > > > offs that cost performance seem to be in favour of lower power consumption. Given the efficiency
> > > > > we've seen with a slight process disadvantage you can't really argue with that.
> > > > > It really seems like a reverse Bulldozer. Instead of starting with a good concept
> > > > > > and then making all the wrong decisions so nothing works well together, they started
> > > > > with a good concept and everything actually fits together well.
> > > >
> > > > Yeah...
> > > >
> > > > Why aren't there any review sites that do this kind of microbenchmarking/micro-architectural investigation?
> > > > It's probably a couple hours to throw together the asm, and run it while looking at the timing
> > > > and performance counters. I'd do it in a heartbeat if I had access to a Ryzen box.
> > > >
> > > > No, instead you have 500 sites just pumping out the same basic suite of benchmarks, filled with
> > > > wild speculation about why the numbers are as they are. If they sorted out the microarchitecture
> > > > details first, they could be way more informed when running the primary benchmarks...
> > > >
> > > > I guess Agner is the guy who has done it in the past (publicly at least),
> > > > but it could be months (if ever) before we see a new guide.
> > >
> > > I may have a system soon, and if you have stuff written, I'd be happy to run it.
> > >
> > > David
> >
> > I will put something together. Do you prefer a Windows or Linux binary?
> > Or I can just provide a small project and you can compile it.
>
> I have a Ryzen sitting in BIOS right now; it's just waiting for me to get Windows installed.
> I could delay the Windows installation and drop in some sort of Linux distro today.
> Though I have to admit I'm quite rusty with Linux; it's been a long time. So yeah,
> if interested, drop me a line on muziqazatgmaildotcom. Will be happy to help.

If you have a Ryzen, I recommend testing this:
http://home.vianetworks.nl/users/mhx/mm.c
It is a single-threaded matrix multiplication, and the alternative algorithms are interesting.
-n is the normal version. I have never seen the same speedup on AMD and Intel with, for example, -r.
DN=500 is perhaps too small; 800 could be more relevant.
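To illustrate what "alternatives" to the normal loop can look like, here is a minimal sketch in C. It is not the code from mm.c: the function names are hypothetical, and it assumes that DN=500 refers to a compile-time definition of the matrix size (as in `-DN=500`). It compares the textbook i-j-k loop order with an interchanged i-k-j order that walks memory with unit stride.

```c
#include <stddef.h>

/* Matrix dimension (assumed to be a compile-time macro, e.g. -DN=800).
   64 keeps this sketch quick; use 500 or 800 for real measurements. */
#ifndef N
#define N 64
#endif

/* Fill a and b with small integer values so every sum is exact in double. */
static void mm_init(double a[N][N], double b[N][N])
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            a[i][j] = (double)((i + j) % 7);
            b[i][j] = (double)((3 * i + j) % 5);
        }
    }
}

/* Textbook i-j-k order: the inner loop reads b down a column (stride N). */
static void mm_ijk(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }
}

/* Interchanged i-k-j order: the inner loop reads b and writes c along rows. */
static void mm_ikj(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++)
            c[i][j] = 0.0;
        for (size_t k = 0; k < N; k++) {
            double aik = a[i][k];
            for (size_t j = 0; j < N; j++)
                c[i][j] += aik * b[k][j];
        }
    }
}

/* Return 1 if the two matrices are element-wise identical, 0 otherwise. */
static int mm_equal(double x[N][N], double y[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            if (x[i][j] != y[i][j])
                return 0;
    return 1;
}
```

To compare the two variants, call each one from a small main() on statically allocated matrices and time it with clock() from <time.h>. How much the interchanged order helps depends on the cache and prefetch behaviour of the CPU, which is one reason such speedups can differ between AMD and Intel.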
If you prefer to use all 16 threads, test some of the OpenMP examples here:
http://people.sc.fsu.edu/~jburkardt/c_src/openmp/openmp.html
The floating-point examples will give you some speedup, but most of the integer tests are too small.
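As a rough idea of the kind of floating-point OpenMP loop that can scale across threads, here is a minimal sketch. It is not taken from the page above; the function name and the iteration count are assumptions for illustration. It estimates pi with the midpoint rule and uses a reduction so each thread keeps its own partial sum.

```c
/* Midpoint-rule estimate of pi as the integral of 4/(1+x^2) over [0,1].
   With OpenMP enabled the iterations are split across threads and the
   per-thread partial sums are combined by the reduction clause.
   Without OpenMP the pragma is ignored and the loop runs serially. */
static double pi_estimate(long n)
{
    double sum = 0.0;
    double h = 1.0 / (double)n;

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        double x = ((double)i + 0.5) * h;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * h;
}
```

With GCC or Clang, build with -fopenmp and set OMP_NUM_THREADS=16 to use all 16 threads. The iteration count has to be large enough that the work per thread outweighs the cost of starting the threads, which is the same problem that makes very small tests show little speedup.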