By: David Kanter (dkanter.delete@this.realworldtech.com), November 26, 2020 9:05 am
Room: Moderated Discussions
Rayla (rayla.delete@this.example.com) on November 26, 2020 6:46 am wrote:
> Groo (charlie.delete@this.semiaccurate.com) on November 25, 2020 12:48 pm wrote:
> > Chester (lamchester.delete@this.gmail.com) on November 24, 2020 11:32 pm wrote:
> >
> > >
> > > My opinion is that requiring a license reduces its value as a benchmark. It means you won't
> > > have a lot of runs from the general public, and less data to compare against.
> > >
> >
> > This is quite true, but the point of the license is to make sure people follow the rules.
> > If you have a benchmark that is used for serious evaluations and large purchases,
> > integrity is a lot more important than a "Gee look at the pretty colors spinning
> > around"-mark 2020. If you require a license, you can also dictate terms, and that keeps
> > the idiocy down, or at least makes companies work creatively for it. *COUGH* Sun *COUGH*
> >
> > Sure, the terms are a major pain in the ass, but they also keep people from gaming the
> > system (somewhat) and doing things that are expressly forbidden. You can do these things,
> > or do anything you want, but you just can't call it an official SPEC score when done. To me,
> > what they are doing makes a lot of sense, and removing the license would make the suite worthless.
> >
> > -Charlie
>
> As a SPEC2017 licensee, I typically agree - though I still think vendor results are largely crap
> (we have our own DB of results run with configurations explicitly intended to be comparable to
> each other.) I also think that allowing OpenMP multithreading in SPECspeed (for 657.xz_s) was
> a really and truly baffling decision by SPEC that reduces the value of SPECspeed results.
>
> Still, it's the best of limited options, especially if you're looking at the individual subtest
> results. I can't comprehend the "SPEC is just microbenchmarks, but Cinebench is a REAL WORKLOAD"
> argument - SPEC is real application code that does matter in the real world (gcc, perlbench,
> compression benches, XML parsing) distributed in a repeatable, portable form.
>
> And that counts for a lot.

As someone who is intimately involved in producing industry-standard benchmarks (MLPerf), and who has also seen other benchmarks in the same field... I have some strong opinions here.
1. The value of an open group of folks working together is huge. If you just ask one company what they care about, you will get a limited viewpoint.
For example, FB is very vocal about recommendation being their #1 workload. Should we just benchmark machine learning using recommendation?
Probably not. There are lots of other applications that someone like Amazon might be interested in.
2. It's extremely hard to get things right without exploring a huge number of subtle details. Go look at the MLPerf rules. They look obvious because we wrote them down. But getting to them isn't obvious at all.
3. If you use an off-the-shelf benchmark like Cinebench, you are explicitly outsourcing all of this.
That's fine if it's a widely used application, but many benchmarks have the simple virtue of availability rather than quality.
It's also easy to test the wrong things.
As a simple example, should an ML benchmark include pre-processing of images? Most networks want raw, decoded pixel data as input, not JPGs or PNGs. What do you do about that?
The answer is not obvious at all.
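To make that concrete, here is a minimal Python sketch (my illustration, not MLPerf code) of the design choice: it times JPEG decode/resize separately from the network itself, so you can see what including or excluding pre-processing does to a score. The infer argument is a hypothetical stand-in for an actual model.

    import time
    import numpy as np
    from PIL import Image  # pip install pillow

    def decode_and_preprocess(path):
        # JPEG decode + resize + normalize: real work that some
        # benchmarks put inside the timed region and others don't.
        img = Image.open(path).convert("RGB").resize((224, 224))
        return np.asarray(img, dtype=np.float32) / 255.0

    def run(paths, infer):
        t_pre = t_inf = 0.0
        for p in paths:
            t0 = time.perf_counter()
            x = decode_and_preprocess(p)
            t1 = time.perf_counter()
            infer(x)  # hypothetical stand-in for the actual network
            t2 = time.perf_counter()
            t_pre += t1 - t0
            t_inf += t2 - t1
        print(f"preprocess: {t_pre:.3f}s  inference: {t_inf:.3f}s")

A rule that scores only the inference time rewards different hardware (and different software stacks) than one that scores preprocess plus inference, which is exactly why the answer isn't obvious.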
Building a benchmark requires a lot of deliberate thought, experimentation, and domain expertise. It also requires access to lots of code and data. Honestly, a lot of 'benchmarks' don't have that, and most reviewers don't have the time or expertise to pull it off.
When I was doing reviews myself, I had a bit of expertise, but I didn't really have the time, the programming ability, or access to interesting data sets.
Many RWT readers helped out and provided benchmarks from their own usage (e.g., Carlie Coats).
But it's really quite hard.
In the case of Phoronix, I think they are also trying to be cross-platform... and that introduces a huge degree of difficulty, in part because typical usage on, e.g., Linux and Windows is different, and in part because it's just hard!
David