Timing sensitive performance

Article: Performance Analysis for Core 2 and K8: Part 1
By: David Kanter (dkanter.delete@this.realworldtech.com), November 3, 2008 9:52 am
Room: Moderated Discussions
Linus Torvalds (torvalds@linux-foundation.org) on 10/31/08 wrote:
---------------------------
>Howard Chu (hyc@symas.com) on 10/31/08 wrote:
>>
>>Given the flakiness of the tools, it would have been
>>worthwhile to hack the codepath selection in each of the
>>programs-under-test, to force identical codepaths on both
>>platforms.
>
>That really isn't very easy at all.
>
>In fact, even if you make your CPU lie about cpuid (by
>using virtualization, for example), or force the software
>to ignore cpuid and always use the same code path, there
>is really a bigger problem: timing-based path selection.

I would tend to agree. There was a great paper on TPC-C that showed that small changes in the timing of cache misses could result in pretty big OS scheduling differences. IIRC, in one run they injected a cache miss every 100 cycles (at cycles 0, 100, 200, ...), while in a second run they injected a miss every 100 cycles offset by 50 (at cycles 50, 150, 250, ...).

The overall difference in performance turned out to be around 5-10%.

Unfortunately I don't know how to avoid this problem, although as you later pointed out, detection is feasible.

>For example, David used "misses per kilo-instruction" as
>a way to 'normalize' the numbers, but that's often not a
>good normalization at all.
>
>Why? Because rather than 'normalize' things for path
>differences, it can cause seriously misleading values in
>the face of anything that is timing-sensitive.
>
>For example, let's assume that some of the benchmarks are
>almost entirely limited by the graphics card (which is not
>at all unlikely for the high-quality cases for some of the
>games). What does that lead to?
>
>It leads to the CPU being throttled, and while throttling,
>you're going to get a very special code-path selection,
>and not one that is at all dependent on the type of CPU.
>
>Now, if the throttling ends up doing something that
>isn't counted at all (for example, it might halt the CPU
>waiting for an interrupt from the graphics card), you are
>going to get numbers that are still largely "relevant". The
>"misses-per-instruction" is still a valid number.

That's why I counted non-halted clock cycles. How the CPU behaves when the software goes idle is unclear; I'd hope that it halts, but it may not.
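To make the distinction concrete, here is a minimal Python sketch using made-up counter values (none of these numbers come from the article's measurements; `mpki` and the variable names are hypothetical). It assumes the core executes HLT while waiting, so halted time retires no instructions and does not advance a non-halted cycle counter, while a wall-clock cycle count would still include that idle time.

```python
def mpki(misses: int, instructions: int) -> float:
    """Cache misses per 1,000 retired instructions."""
    return misses / (instructions / 1000)

# Hypothetical counter readings for one sampling interval (illustrative only).
instr = 2_000_000            # instructions retired while the core was running
misses = 10_000              # cache misses caused by that work
unhalted_cycles = 2_500_000  # non-halted core cycles
halted_cycles = 1_500_000    # cycles spent in HLT waiting for an interrupt

# Halted time adds no instructions and no misses, so MPKI is unaffected.
print(mpki(misses, instr))                        # 5.0
# IPC over non-halted cycles reflects only the work actually executed.
print(instr / unhalted_cycles)                    # 0.8
# IPC over wall-clock cycles is diluted by the idle time.
print(instr / (unhalted_cycles + halted_cycles))  # 0.5
```

If the CPU does not actually halt but spins instead, the idle period shows up as extra instructions and non-halted cycles, and neither ratio is protected any more.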

>But quite often, throttling ends up being a busy loop. Yeah,
>the game may end up doing AI while waiting for graphics,
>and just generally doing something relevant. But it's also
>quite possible that the throttling ends up being some kind
>of busy loop.
>
>Now, the "busy loop" may be a really big one, like the
>Windows idle loop, but it can be a fairly tight one as
>well. Especially for a game that is single-threaded, and
>doesn't care about multi-tasking (and many games do not),
>I can well see the case of "graphics card is busy" being
>a very tight busy loop that just reads a status register.
>
>And if so, your "per instruction" values may be very
>misleading indeed. Depending on just how much you wait for
>the graphics card, your statistics may be swamped not by
>the actual work you do, but by all the dead time.
>
>That's true regardless of whether the loop is large or
>small, but with a small loop the results can be even more
>misleading, especially if looking at things like cache
>misses per instruction - your numbers may be more indicative
>of the loop than of the load you actually want to measure,
>and a tight loop is likely to be more wildly different from
>the real load than a large one.
>
>Things like that can really make your numbers be
>meaningless, and hard to compare across CPUs (not just
>different architectures, but even with the same
>microarchitecture, just running at different speeds).
>
>It's quite possible that the games David tested had no
>such issues, but in general, I would suggest that if there
>is a possibility of timing-related measurement affecting the
>end result, you should try to test otherwise identical
>machines with different CPU speeds to at least verify that
>timing does not make a huge difference.
>
>So it would be interesting to hear, for example, whether
>the Intel Core 2 numbers (that seemed to be much more
>reliable) were similar when running at 2.93GHz and when
>running at (say) 1.86GHz.
>
>If they are similar, you have a much better confidence in
>the numbers being meaningful. And if they are not, then
>you know that what you're looking at isn't even tied to
>microarchitecture, so comparing two different uarchs using
>the numbers is now much less likely to be interesting.
>
>Think of it as an inherent "error bar". How big is it?
>
>Linus
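Linus's suggested check (run the same load at two clock speeds and compare) can be illustrated with a toy model, sketched in Python below. Everything in it is an assumption for illustration, not data from the article: a frame does a fixed amount of real CPU work, the GPU takes a fixed 16 ms per frame, and when the CPU finishes early it spins in a tight status-polling loop that retires instructions at a higher IPC and essentially never misses in cache. `Frame` and `measured_mpki` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    work_instr: float                   # instructions of real per-frame work (assumed)
    work_ipc: float                     # IPC of the real work (assumed)
    work_misses: float                  # cache misses caused by the real work (assumed)
    spin_ipc: float                     # IPC of the tight polling loop (assumed)
    spin_misses_per_instr: float = 0.0  # a tight loop stays in cache (assumed)

def measured_mpki(frame: Frame, clock_hz: float, gpu_frame_s: float) -> float:
    """MPKI a counter-based measurement would report for one GPU-bound frame."""
    work_cycles = frame.work_instr / frame.work_ipc
    frame_cycles = clock_hz * gpu_frame_s
    # If the CPU finishes before the GPU, it busy-waits for the remaining cycles;
    # if it is CPU-bound, there is no spinning at all.
    spin_cycles = max(0.0, frame_cycles - work_cycles)
    spin_instr = spin_cycles * frame.spin_ipc
    misses = frame.work_misses + spin_instr * frame.spin_misses_per_instr
    return misses / ((frame.work_instr + spin_instr) / 1000)

# Hypothetical workload: identical real work at both clock speeds.
frame = Frame(work_instr=20e6, work_ipc=1.0, work_misses=100_000, spin_ipc=2.0)

for ghz in (2.93, 1.86):
    print(f"{ghz} GHz: {measured_mpki(frame, ghz * 1e9, 0.016):.2f} MPKI")
print(f"work only: {frame.work_misses / (frame.work_instr / 1000):.2f} MPKI")
# 2.93 GHz: 1.36 MPKI
# 1.86 GHz: 2.53 MPKI
# work only: 5.00 MPKI
```

With these made-up numbers, the faster clock spends more of each frame spinning, so its reported MPKI is lower even though the real work is identical, and neither figure matches the 5.00 MPKI of the work itself. The disagreement between the two clock speeds is the "error bar" Linus describes; in a CPU-bound case, where nothing spins, both speeds would report the same value.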