1800x system available-16T

By: Per Hesselgren (perhesselgren.delete@this.yahoo.se), March 17, 2017 1:10 pm
Room: Moderated Discussions
Per Hesselgren (perhesselgren.delete@this.yahoo.se) on March 17, 2017 8:49 am wrote:
> Per Hesselgren (perhesselgren.delete@this.yahoo.se) on March 16, 2017 3:14 am wrote:
> > muziqaz (m.delete@this.gmail.com) on March 13, 2017 9:22 am wrote:
> > > Travis (travis.downs.delete@this.gmail.com) on March 5, 2017 7:23 pm wrote:
> > > > David Kanter (dkanter.delete@this.realworldtech.com) on March 5, 2017 5:31 pm wrote:
> > > > > Travis (travis.downs.delete@this.gmail.com) on March 5, 2017 11:55 am wrote:
> > > > > > anon (spam.delete.delete@this.this.spam.com) on March 4, 2017 4:16 pm wrote:
> > > > > >
> > > > > > > Fusing cmp + branch like they did on BD might also be possible.
> > > > > >
> > > > > > Yeah. I tried to avoid even bringing that in since it is already confusing enough with all the
> > > > > > different ways of measuring things, and invariably someone will try to add macro-fused branch stuff
> > > > > > to the calculation. So for now I'm just assuming no branch fusion is occurring, or, equivalently,
> > > > > > that it is occurring and we just count the pair as one instruction and one (fused) uop.
> > > > > >
> > > > > > This reduces the complexity - but of course if Ryzen doesn't do that fusion it has to be noted separately
> > > > > > too, since it would be an advantage for Intel, separate from the "more generic" width discussion.
> > > > > >
> > > > > > > Either way decode is probably slightly weaker than SKL in terms of the raw number of instructions.
> > > > > > > Bandwidth to the decode queue might be higher though, so possible benefits on more complex instructions.
> > > > > > >
> > > > > > > If the uop cache actually uses mops, which should be as powerful as fused
> > > > > > > uops, if not more, then there's at least parity with SKL here.
> > > > > > > 4 mops/cycle -> 6 uops/cycle seems not enough to explain
> > > > > > > the performance with SMT, so I'm leaning towards 6 mops.
> > > > > > >
> > > > > > > I don't believe 6 mops / fused uops rename on the integer side is happening. So 6 mops dispatch
> > > > > > > towards int seems unlikely. 4 mops towards int, with combined load + alu mops splitting into
> > > > > > > 2 uops to sustain 6 uop/cycle schedule & execute seems much more balanced and realistic. Ideally
> > > > > > > 4 mops to fp as well, although limited by 6 mops total dispatch. 256bit ops get split into 2
> > > > > > > uops after rename ideally. Same with FMA. Really not sure about these two though.
> > > > > > >
> > > > > > > No idea about how retire slots map to uops/mops/instructions either.
> > > > > > >
> > > > > > > > So my take on it is that int rename is about equivalent to 4 fused uops, as is fp rename and
> > > > > > > > retire. Combined with the higher latencies (instructions and mov to fp) and lower bandwidth
> > > > > > > > (cache/load/store), an ST IPC between HSW and SKL makes sense. The rename bottleneck being
> > > > > > > > alleviated as soon as the fp side gets involved would explain why SMT works so well.
> > > > > > >
> > > > > > >
> > > > > > > > Sure, some things could be changed to make it beat SKL but that's what Zen2 is for. All trade-offs
> > > > > > > > that cost performance seem to be in favour of lower power consumption. Given the efficiency
> > > > > > > > we've seen with a slight process disadvantage you can't really argue with that.
> > > > > > > > It really seems like a reverse Bulldozer. Instead of starting with a good concept
> > > > > > > > and then making all the wrong decisions, so that nothing works well together, they started
> > > > > > > > with a good concept and everything actually fits together well.
> > > > > >
> > > > > > Yeah...
> > > > > >
> > > > > > > Why aren't there any review sites that do this kind of microbenchmarking and microarchitectural
> > > > > > > investigation? It's probably a couple of hours to throw together the asm, and run it while looking
> > > > > > > at the timing and performance counters. I'd do it in a heartbeat if I had access to a Ryzen box.
> > > > > >
> > > > > > > No, instead you have 500 sites just pumping out the same basic suite of benchmarks, filled with
> > > > > > > wild speculation about why the numbers are as they are. If they sorted out the microarchitecture
> > > > > > > details first, they could be way more informed when running the primary benchmarks...
> > > > > >
> > > > > > I guess Agner is the guy who has done it in the past (publicly at least),
> > > > > > but it could be months (if ever) before we see a new guide.
> > > > >
> > > > > I may have a system soon, and if you have stuff written, I'd be happy to run it.
> > > > >
> > > > > David
> > > >
> > > > I will put something together. Do you prefer a Windows or Linux binary?
> > > > Or I can just provide a small project and you can compile it.
> > >
> > > I have a Ryzen sitting in the BIOS right now; it's just waiting for me to get Windows installed.
> > > I could delay the Windows installation and drop in some sort of Linux distro today.
> > > Though I have to admit I'm quite rusty with Linux, it has been a long time. So yeah,
> > > if interested, drop me a line on muziqazatgmaildotcom. Will be happy to help.
> >
> > If you have a Ryzen I would recommend testing this:
> > http://home.vianetworks.nl/users/mhx/mm.c
> > This is a single-threaded matrix multiplication benchmark, and the alternative algorithms are interesting.
> > -n is the normal algorithm. I have never seen the same speedup on AMD and Intel with, for example, -r.
> > DN=500 is perhaps too small; 800 could be more relevant.
> > If you prefer 16 threads, test some of the OpenMP examples here:
> > http://people.sc.fsu.edu/~jburkardt/c_src/openmp/openmp.html
> > The floating-point examples will give you some speedup, but most of the integer tests are too small.
>
> I now have a Ryzen 1700 myself, so I have some results.
> These are the single-threaded matrix multiply times, in seconds:
>
> Algorithm  Ivy Bridge  Excavator  Ryzen
> -n               8.09       8.25   5.22
> -v               7.91       7.59   5.58
> -u               7.79       4.56   2.5
> -p               8.06       7.74   5.27
> -t               3.08       6.35   4.94
> -i               1.58       2.5    1.31
> -b               4.19       6.26   3.87
> -m               1.39       3.08   1.15
> -w               2.22       3.66   1.97
> -r               3.09       6.2    4.94
>
> The absolute times in seconds are not very informative, since the clock speeds all differ.
> But if we take each CPU's -n time as the index value 100, we get:
> Algorithm  Ivy Bridge  Excavator  Ryzen
> -n                100        100    100
> -v                 98         92    107
> -u                 96         55     48
> -p                100         94    101
> -t                 38         77     95
> -i                 20         30     25
> -b                 52         76     74
> -m                 17         37     22
> -w                 27         44     38
> -r                 38         75     95
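
For anyone who wants to check the index numbers against their own timings, here is a minimal sketch of the calculation. The dictionary layout and the function name index_table are my own choices for illustration and are not part of mm.c; the times are the ones from the first table, written with decimal points.

```python
# Times in seconds per algorithm flag, in the order (Ivy Bridge, Excavator, Ryzen).
# Values are copied from the first table above; the layout is only illustrative.
CPUS = ("Ivy Bridge", "Excavator", "Ryzen")
TIMES = {
    "n": (8.09, 8.25, 5.22),
    "v": (7.91, 7.59, 5.58),
    "u": (7.79, 4.56, 2.5),
    "p": (8.06, 7.74, 5.27),
    "t": (3.08, 6.35, 4.94),
    "i": (1.58, 2.5, 1.31),
    "b": (4.19, 6.26, 3.87),
    "m": (1.39, 3.08, 1.15),
    "w": (2.22, 3.66, 1.97),
    "r": (3.09, 6.2, 4.94),
}


def index_table(times, baseline="n"):
    """Express every time as a percentage of the baseline time on the same CPU."""
    base = times[baseline]
    return {
        alg: tuple(round(100 * t / b) for t, b in zip(row, base))
        for alg, row in times.items()
    }


INDEX = index_table(TIMES)
for alg, row in INDEX.items():
    print(f"-{alg}", *row)
```

A lower index means that algorithm gains more over -n on that CPU, which makes the differences between the three machines easier to see than the raw seconds do.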

One result with 16 threads:

COMPUTE_PI
C/OpenMP version

Estimate the value of PI by summing a series.

Number of processors available = 16
Number of threads = 16

R8_TEST:
Estimate the value of PI,
using double arithmetic.

N = number of terms computed and added;

MODE = SEQ for sequential code;
MODE = OMP for Open MP enabled code;
(performance depends on whether Open MP is used,
and how many processes are available)

ESTIMATE = the computed estimate of PI;

ERROR = ( the computed estimate - PI );

TIME = elapsed wall clock time;

Note that you can''t increase N forever, because:
A) ROUNDOFF starts to be a problem, and
B) maximum integer size is a problem.

         N  Mode  Estimate          Error     Time        Ratio

         1  SEQ   3.20000000000000  5.84e-02  0.00000020
         1  OMP   3.20000000000000  5.84e-02  0.00151341   0.0001

        10  SEQ   3.14242598500110  8.33e-04  0.00000012
        10  OMP   3.14242598500110  8.33e-04  0.00114450   0.0001

       100  SEQ   3.14160098692313  8.33e-06  0.00000058
       100  OMP   3.14160098692312  8.33e-06  0.00114989   0.0005

      1000  SEQ   3.14159273692312  8.33e-08  0.00000709
      1000  OMP   3.14159273692313  8.33e-08  0.00112568   0.0063

     10000  SEQ   3.14159265442313  8.33e-10  0.00005276
     10000  OMP   3.14159265442313  8.33e-10  0.00112217   0.0470

    100000  SEQ   3.14159265359816  8.37e-12  0.00052703
    100000  OMP   3.14159265359812  8.33e-12  0.00071872   0.7333

   1000000  SEQ   3.14159265358976  2.84e-14  0.00524157
   1000000  OMP   3.14159265358987  7.95e-14  0.00109948   4.7673

  10000000  SEQ   3.14159265358973  6.22e-14  0.04834265
  10000000  OMP   3.14159265358980  1.02e-14  0.00349946  13.8143

 100000000  SEQ   3.14159265359043  6.33e-13  0.44884477
 100000000  OMP   3.14159265358988  8.93e-14  0.02533827  17.7141

1000000000  SEQ   3.14159265358997  1.78e-13  4.45002986
1000000000  OMP   3.14159265358983  3.95e-14  0.19478657  22.8457


COMPUTE_PI
Normal end of execution.
[person@localhost Dokument]$
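
For readers who have not looked at the source, here is a rough sequential sketch of what I assume the test computes. The assumption is that the "series" is the midpoint-rule sum for the integral of 4/(1+x^2) over [0,1], which is consistent with the N=1 estimate of 3.2 and the N=10 estimate in the output above. The function name estimate_pi is mine, and this is Python rather than the C/OpenMP original, so it only illustrates the arithmetic and not the threading.

```python
import math


def estimate_pi(n):
    """Midpoint-rule estimate of pi with n terms (assumed form of the series)."""
    total = 0.0
    for i in range(1, n + 1):
        x = (2 * i - 1) / (2 * n)       # midpoint of the i-th subinterval of [0, 1]
        total += 1.0 / (1.0 + x * x)
    return 4.0 * total / n


for n in (1, 10, 100, 1000):
    est = estimate_pi(n)
    print(n, f"{est:.14f}", f"{abs(est - math.pi):.2e}")
```

In an OpenMP version the loop iterations are independent apart from the running sum, so the loop can be split across threads with a sum reduction. That would also explain why the SEQ and OMP estimates differ in the last digits: the additions happen in a different order.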

The last ratio varies between 14 and 24 from run to run, but for N=10⁷ it is almost stable.
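
The Ratio column is simply the SEQ time divided by the OMP time for the same N. A small sketch that recomputes it from the times printed above (the names RUNS, ratios and break_even are my own; the numbers are from this single run only):

```python
# (N, SEQ time in s, OMP time in s), copied from the COMPUTE_PI output above.
RUNS = [
    (10**0, 0.00000020, 0.00151341),
    (10**1, 0.00000012, 0.00114450),
    (10**2, 0.00000058, 0.00114989),
    (10**3, 0.00000709, 0.00112568),
    (10**4, 0.00005276, 0.00112217),
    (10**5, 0.00052703, 0.00071872),
    (10**6, 0.00524157, 0.00109948),
    (10**7, 0.04834265, 0.00349946),
    (10**8, 0.44884477, 0.02533827),
    (10**9, 4.45002986, 0.19478657),
]

# Speedup of the 16-thread run over the sequential run for each N.
ratios = {n: seq / omp for n, seq, omp in RUNS}

# Smallest N at which the OpenMP version is actually faster than the sequential one.
break_even = min(n for n, r in ratios.items() if r > 1.0)

for n, r in ratios.items():
    print(f"{n:>10d}  {r:8.4f}")
print("break-even N:", break_even)
```

Below about N=10⁵ the OMP time stays near 1 ms whatever N is, so in this run the thread start-up overhead dominates and the ratio says little about the CPU until N is large.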