By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 29, 2016 12:01 pm
Room: Moderated Discussions
John Poole (john.delete@this.primatelabs.com) on July 29, 2016 11:57 am wrote:
>
> Sure thing. Here's some preliminary documentation on the CPU workloads included
> in Geekbench 4: http://www.primatelabs.com/beta/v4-cpu-workloads.pdf
Looks much better.
The two sub-tests that worry me are:
- camera:
How much of that is "integer", and how much of that is crypto engine and GPU?
(The jpeg test could be in the same situation: you say you use "libjpeg", but there are several variants of it with different vectorized implementations, and I could also imagine people using GPU jpeg capabilities for it.)
And I don't think having "system" tests that use combinations of GPU and crypto engines etc. is wrong at all; I just think it's potentially very misleading to make it an "integer" test and mix it up with other tests that are clearly just CPU.
Part of the problem with GB3 is the confusion between "crypto" and real integer loads. GB4 split out crypto (good), but "camera" (and maybe "jpeg") seems to have the potential to introduce another form of confusion.
I think it would be lovely if you had a new category called "system" that uses more of the SoC and memory/flash. So this is really not saying that your camera test is necessarily bad in itself, just that it may not actually be an "integer" load.
- memory latency:
It's very easy to get this wrong and have the test invalidated by CPU prefetching effects. The fact that you say you've worked to lessen TLB misses makes me nervous. That tends to mean that the accesses are much more predictable, which in turn tends to mean that they are also much more likely to be affected by the prefetcher.
Lots and lots of people have gotten memory latency testing wrong over the years. Even something like "build up a simple linear list from independent allocations" ends up having really subtle patterns that depend on the allocation patterns of the system, so it may tell you less about the CPU memory subsystem than about random luck in what the linked-list access pattern happens to be due to allocator implementation details.
So doing a good job of measuring somewhat realistic memory latency is really quite hard. It's just so easy to be captured by caches and prefetching, and trying to avoid TLB misses makes it much more likely that that will happen.
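For illustration only, here is a minimal C sketch of one common way to sidestep both pitfalls described above; it is not Geekbench's method, which the post does not describe. All nodes live in one contiguous array, so the layout does not depend on the allocator, and Sattolo's algorithm links them into a single random cycle with no fixed stride for a prefetcher to learn. The timing loop then follows that chain with dependent loads. The names `build_random_cycle`, `chase_ns_per_load`, and the xorshift64 PRNG are hypothetical choices for this sketch, and `clock_gettime` assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Small xorshift64 PRNG (Marsaglia's 13/7/17 variant); any decent PRNG works. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Sattolo's algorithm: turn next[] into one random cycle through all n slots,
 * so following next[i] from any start visits every slot once before repeating.
 * The order depends only on the seed, not on malloc's layout, and has no fixed
 * stride. Using % introduces a negligible bias for realistic n. */
void build_random_cycle(size_t *next, size_t n, uint64_t seed) {
    assert(n == 0 || next != NULL);
    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15ull; /* xorshift must not start at 0 */
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    if (n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&state) % i); /* j in [0, i): never i itself */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow next[] for `steps` dependent loads starting at index 0 and return the
 * average nanoseconds per load. Each address comes from the previous load, so
 * the loads cannot overlap and the loop measures latency rather than bandwidth.
 * The final index is stored in *sink so the compiler cannot drop the loop. */
double chase_ns_per_load(const size_t *next, size_t steps, size_t *sink) {
    assert(next != NULL && steps > 0);
    struct timespec t0, t1;
    size_t i = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        i = next[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *sink = i;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

To target DRAM rather than cache, the array would need to be much larger than the last-level cache. Note the tradeoff raised above: with a random cycle over a large array, many steps are also likely to miss in the TLB, so the result includes page-walk cost unless something like huge pages is used. Clustering accesses within pages to avoid that is exactly what makes the pattern more predictable again.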
Linus