Benchmarks

By: David Kanter (dkanter.delete@this.realworldtech.com), November 26, 2020 9:05 am
Room: Moderated Discussions
Rayla (rayla.delete@this.example.com) on November 26, 2020 6:46 am wrote:
> Groo (charlie.delete@this.semiaccurate.com) on November 25, 2020 12:48 pm wrote:
> > Chester (lamchester.delete@this.gmail.com) on November 24, 2020 11:32 pm wrote:
> >
> > >
> > > My opinion is requiring a license reduces its value as a benchmark. It means you won't
> > > have a lot of runs from the general public and less data to compare against.
> > >
> >
> > This is quite true but the point of the license is to make
> > sure people follow the rules. If you have a benchmark
> > that is used for serious evaluations and large purchases,
> > integrity is a lot more important than a, "Gee look
> > at the pretty colors spinning around"-mark 2020. If you require a license, you can also dictate terms, and
> > that keeps the idiocy down, or at least makes companies work creatively for it. *COUGH* Sun *COUGH*
> >
> > Sure the terms are a major pain in the ass but they also keep people from gaming the system
> > (somewhat) and doing things that are expressly forbidden. You can do these things, or do anything
> > you want, but you just can't call it an official Spec score when done. To me, what they are
> > doing makes a lot of sense and removing the license would make the suite worthless.
> >
> > -Charlie
>
> As a SPEC2017 licensee, I typically agree - though I still think vendor results are largely crap
> (we have our own DB of results run with configurations explicitly intended to be comparable to
> each other.) I also think that allowing OpenMP multithreading in SPECspeed (for 657.xz_s) was
> a really and truly baffling decision by SPEC that reduces the value of SPECspeed results.
>
> Still, it's the best of limited options, especially if you're looking at the individual subtest
> results. I can't comprehend the "SPEC is just microbenchmarks, but Cinebench is a REAL WORKLOAD"
> argument - SPEC is real application code that does matter in the real world (gcc, perlbench,
> compression benches, XML parsing) distributed in a repeatable, portable form.
>
> And that counts for a lot.

As someone who is intimately involved in producing industry-standard benchmarks (MLPerf) and has also seen benchmarks in the same field, I have some strong opinions here.

1. The value of an open group of folks working together is huge. If you just ask one company what they care about, you will get a limited viewpoint.

For example, FB is very vocal about recommendation being their #1 workload. Should we just benchmark machine learning using recommendation?

Probably not. There are lots of other applications that someone like Amazon might be interested in.

2. It's extremely hard to get things right without exploring a huge number of subtle details. Go look at the MLPerf rules. They look obvious because we wrote them down. But getting to them isn't obvious at all.

3. If you use an off-the-shelf benchmark like Cinebench, you are explicitly outsourcing all of this.

That's fine if it's a widely used application, but many benchmarks have the simple virtue of availability rather than quality.

It's also easy to test the wrong things.

As a simple example, should an ML benchmark include pre-processing of images? Most networks want to use raw images, not JPGs or PNGs. What do you do about that?

The answer is not obvious at all.

Building a benchmark requires a lot of deliberate thought, experimentation, and domain expertise. It also requires access to lots of code and data. Honestly, a lot of 'benchmarks' don't have that, and most reviewers don't have the time or expertise to pull it off.

When I was doing reviews myself, I had a bit of expertise, but didn't really have the time, or the ability to program, or access to interesting data sets.

Many of the RWT readers helped out and provided benchmarks from their own usage (e.g., Carlie Coats).

But it's really quite hard.

In the case of Phoronix, I think they are also trying to be cross-platform, and that introduces a huge degree of difficulty. In part because typical usage on, e.g., Linux and Windows is different, and in part because it's just hard!

David