Geekbench 4

By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 29, 2016 12:01 pm
Room: Moderated Discussions
John Poole (john.delete@this.primatelabs.com) on July 29, 2016 11:57 am wrote:
>
> Sure thing. Here's some preliminary documentation on the CPU workloads included
> in Geekbench 4: http://www.primatelabs.com/beta/v4-cpu-workloads.pdf

Looks much better.

The two sub-tests that worry me are:

- camera:

How much of that is "integer", and how much of that is crypto engine and GPU?

(the jpeg test could be in the same situation: you say you use "libjpeg", but there are several variants of it with different vectorized implementations, and I could also imagine people using GPU jpeg capabilities for it)

And I don't think having "system" tests that use combinations of GPU and crypto engines etc is wrong at all, I just think it's potentially very misleading to make it an "integer" test and mix it up with other tests that are clearly just CPU.

Part of the problem with GB3 is the confusion between "crypto" and real integer loads. GB4 split out crypto (good), but "camera" (and maybe "jpeg") seems to have potential for introducing another form of confusion.

I think it would be lovely if you had a new category called "system" that uses more of the SoC and memory/flash. So this is really not saying that your camera test is necessarily bad in itself, just that it may not actually be an "integer" load.

- memory latency:

It's very easy to get this wrong, and have the test be invalidated by CPU prefetching effects. The fact that you say you've worked to lessen TLB misses makes me nervous. That tends to mean that the accesses are much more predictable, which in turn tends to mean that they are also much more likely to be affected by the prefetcher.

Lots and lots of people have gotten memory latency testing wrong over the years. Even something like "build up a simple linked list from independent allocations" ends up having really subtle patterns that depend on the allocation patterns of the system, so it may tell you less about the CPU memory subsystem than about random luck in what the linked list access pattern happens to be, due to allocator implementation details.

So doing a good job of getting somewhat real memory latency is really quite hard. It's just so easy to be captured by caches and prefetching. And trying to avoid TLB misses makes it much more likely that that will happen.
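To make the pitfalls above concrete, here is a minimal C sketch of a pointer-chasing latency test. It is an illustration under stated assumptions, not Geekbench's actual workload, and the helper names (build_random_cycle, build_linear_cycle, chase, ns_per_load) are hypothetical. It builds one random cycle over a single contiguous array using Sattolo's algorithm, so the access order does not depend on allocator behaviour and each next address is hard for a prefetcher to guess. It deliberately makes no attempt to avoid TLB misses. The sequential chain is included only as the prefetcher-friendly counter-example.

```c
/* Needed for clock_gettime() under strict -std=c99/c11. */
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Small xorshift64 PRNG; avoids rand(), whose RAND_MAX can be as low as 32767. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Hypothetical helper: turn next[] into ONE random cycle through all n slots
 * (Sattolo's algorithm). The layout lives in one array, so it does not depend
 * on allocator details, and successive addresses are unpredictable.
 * Assumption: slots are sizeof(size_t), so several share a cache line; a more
 * careful version would pad each slot to a full line. */
static void build_random_cycle(size_t *next, size_t n, uint64_t seed)
{
    uint64_t s = seed ? seed : 0x9E3779B97F4A7C15ull; /* xorshift needs nonzero state */
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    if (n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&s) % i); /* j in [0, i-1], never i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Hypothetical helper: the naive sequential chain 0 -> 1 -> 2 -> ... -> 0,
 * which hardware prefetchers can typically predict. */
static void build_linear_cycle(size_t *next, size_t n)
{
    for (size_t i = 0; i < n; i++)
        next[i] = (i + 1) % n;
}

/* Follow the chain from slot 0; each load's address depends on the previous load. */
static size_t chase(const size_t *next, size_t steps)
{
    size_t p = 0;
    while (steps--)
        p = next[p];
    return p;
}

/* Average nanoseconds per dependent load; *sink keeps the result live. */
static double ns_per_load(const size_t *next, size_t steps, size_t *sink)
{
    struct timespec t0, t1;
    if (steps == 0)
        return 0.0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sink = chase(next, steps);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

On a real machine you would size the array far larger than the last-level cache, walk each chain once untimed so the pages are faulted in, and then compare ns_per_load for the two layouts. The linear chain will typically report a much lower figure, and that gap is the prefetcher effect described above, not real memory latency.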

Linus