Benchmarks

By: blaine (myname.delete@this.acm.org), November 26, 2020 12:04 pm
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on November 26, 2020 9:05 am wrote:
> Rayla (rayla.delete@this.example.com) on November 26, 2020 6:46 am wrote:
> > Groo (charlie.delete@this.semiaccurate.com) on November 25, 2020 12:48 pm wrote:
> > > Chester (lamchester.delete@this.gmail.com) on November 24, 2020 11:32 pm wrote:
> > >
> > > >
> > > > My opinion is requiring a license reduces its value as a benchmark. It means you won't
> > > > have a lot of runs from the general public and less data to compare against.
> > > >
> > >
> > > This is quite true but the point of the license is to make
> > > sure people follow the rules. If you have a benchmark
> > > that is used for serious evaluations and large purchases,
> > > integrity is a lot more important than a, "Gee look
> > > at the pretty colors spinning around"-mark 2020. If you require a license, you can also dictate terms, and
> > > that keeps the idiocy down, or at least makes companies work creatively for it. *COUGH* Sun *COUGH*
> > >
> > > Sure the terms are a major pain in the ass but they also keep people from gaming the system
> > > (somewhat) and doing things that are expressly forbidden. You can do these things, or do anything
> > > you want, but you just can't call it an official Spec score when done. To me, what they are
> > > doing makes a lot of sense and removing the license would make the suite worthless.
> > >
> > > -Charlie
> >
> > As a SPEC2017 licensee, I typically agree - though I still think vendor results are largely crap
> > (we have our own DB of results run with configurations explicitly intended to be comparable to
> > each other.) I also think that allowing OpenMP multithreading in SPECspeed (for 657.xz_s) was
> > a really and truly baffling decision by SPEC that reduces the value of SPECspeed results.
> >
> > Still, it's the best of limited options, especially if you're looking at the individual subtest
> > results. I can't comprehend the "SPEC is just microbenchmarks, but Cinebench is a REAL WORKLOAD"
> > argument - SPEC is real application code that does matter in the real world (gcc, perlbench,
> > compression benches, XML parsing) distributed in a repeatable, portable form.
> >
> > And that counts for a lot.
>
> As someone who is intimately involved in producing industry standard benchmarks (MLPerf)
> and has also seen benchmarks in the same field...I have some strong opinions here.
>
> 1. The value of an open group of folks working together is huge. If you just
> ask one company what they care about you will get a limited view point.
>
> For example, FB is very vocal about recommendation being their #1 workload.
> Should we just benchmark machine learning using recommendation?
>
> Probably not. There are lots of other applications that someone like Amazon might be interested in.
>
> 2. It's extremely hard to get things right without exploring a huge amount of subtle details. Go look at the
> MLPerf rules. They look obvious because we wrote them down. But getting to them isn't obvious at all.
>
> 3. If you use an off-the-shelf benchmark like Cinebench, you are explicitly outsourcing all of this.
>
> That's fine if it's a widely used application, but many benchmarks
> have the simple virtue of availability rather than quality.
>
> It's also easy to test the wrong things.
>
> As a simple example, should an ML benchmark include pre-processing of images? Most
> networks want to use raw images, and not JPGs or PNGs. What do you do about that?
>
> The answer is not obvious at all.
>
> Building a benchmark requires a lot of deliberate thought, experimentation, and domain expertise.
> It also requires access to lots of code and data. Honestly, a lot of 'benchmarks' don't
> have that, and most reviewers don't have the time or expertise to pull it off.
>
> When I was doing reviews myself, I had a bit of expertise, but didn't really have
> the time, or the ability to program, or access to interesting data sets.
>
> Many of the RWT readers helped out and provided benchmarks from their own usage (e.g., Carlie Coats, etc.).
>
> But it's really quite hard.
>
> In the case of Phoronix, I think they are also trying to be cross platform...and
> that introduces a huge degree of difficulty. In part because typical usage on
> e.g., Linux and Windows is different, and in part because it's just hard!
>
> David

Having a standards body and a license is not proof against idiocy and avarice. For example, TPC-C's rules had the effect of making sure that it was always run with the CPU saturated, which is generally not how OLTP systems run or ran. Gaming TPC-C was quite the sport, but ultimately, many of those efforts did not help the typical OLTP user. I am suspicious of benchmark efforts with corporate sponsors or members.

I think that another issue is the "fixed test" problem. Given enough time and the ability to retake the test, students learn to pass a fixed test, even if they didn't learn the subject. Benchmark tests should be changed often enough to counter that kind of gaming.