CMPXCHG latency

By: nick (, March 31, 2008 7:20 pm
Room: Moderated Discussions
Michael S ( on 3/31/08 wrote:
>nick ( on 3/31/08 wrote:
>>Michael S ( on 3/31/08 wrote:
>>>Gabriele Svelto ( on 3/31/08 wrote:
>>>>Linus Torvalds ( on 3/29/08 wrote:
>>>>>If Intel really will be able to pipeline the thing, I'll
>>>>>be very happy. Spinlocks will still end up being
>>>>>very expensive for the cache-bouncing case, but they'd
>>>>>be basically free for the non-contended case.
>>>>That would be sooo nice. If I end up using a spinlock it is obviously because I've
>>>>gone to great lengths to make it as little contended as possible and it sucks *big*
>>>>time to have to pay a tens-of-cycles-long price for what is basically a no-op in 99.99% of the cases.
>>>Non-contended spinlock doesn't mean the absence of cache-bouncing. I'd guess that
>>>in majority of practical cases you will see a combination of very low contention
>>>rate with high (or, when the number of threads exceeds two, very high) cache-bouncing rates.
>>Not at all. With fine-grained locking and spinlocks-in-objects, cache hot spinlocks
>>are a _very_ common case.
>With fine-grained locking - I agree. But too fine-grained locking is probably a
>bad idea for other reason. Besides, how would you apply a fine-grain locking to
>something like [modifiable] balanced binary tree?

It is too complex to just fit a solution to such an underspecified problem. Often though, it is true that you have to simply use different (and even less optimal in a single-threaded environment) data structure in order to scale.

In many situations you can do a "dumb" fine-graining of more complex data structures by hashing. So you could have a hash of N binary trees. (although you lose some properties like traversals).

Other times, you need to implement genuinely lock free storage algorithms, and then once you have found the item you want, you can lock it and perform more complex operations on it. It is possible to perform a simple lock free algorithm on most dynamic data structures, but readers can get starved by writers and also you need to provide existence guarantee somehow (which can be hard if you don't have access to RCU).

>>Global cachelines get unscalable very easily, even if there is no contention.
>Exactly. Although "very" sounds like exaggeration.

Depends on what you consider a big system, and how frequent is the operation you want to perform. The cacheline transfer latency, by virtue of Amdahl's law, puts a strict upper limit on the throughput.

Say that average lock/unlock time is 1us including cacheline transfer (pretty generous if we're talking about more than a handful of sockets), and the then the upper limit on the number of ops per second is 1,000,000. If you can do 50,000 of these ops on a single core per second, then absolute upper limit of scalability is 20 cores worth of throughput, and actually the scalability will be poor much before 20 cores.

Luckisly for desktop / workstation, and especially for single socket with shared last level cache, yes the cacheline transfer is not such a problem (but still can be if you're stupid).
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Nehalem Architecture: Improvements Detailed Blut Aus Nord2008/03/17 02:52 PM
  Nehalem Architecture: Improvements Detailed bah2008/03/17 04:45 PM
    Nehalem Architecture: Improvements Detailed Linus Torvalds2008/03/17 06:14 PM
      Nehalem Architecture: Improvements Detailed Gabriele Svelto2008/03/18 01:11 AM
        Nehalem Architecture: Improvements Detailed Henrik S2008/03/18 04:23 AM
        Nehalem Architecture: Improvements Detailed Doug Siebert2008/03/18 09:48 PM
          Nehalem Architecture: Improvements Detailed anon2008/03/18 10:37 PM
            Nehalem Architecture: Improvements Detailed Doug Siebert2008/03/19 05:23 PM
          Nehalem Architecture: Improvements Detailed Ian Ollmann2008/03/19 08:15 AM
            SSE 4.2 Michael S2008/03/19 04:13 PM
              SSE 4.2 Ian Ollmann2008/03/20 09:56 AM
              SSE 4.2 anonymous2008/03/20 12:29 PM
                SSE 4.2 David W. Hess2008/03/21 07:24 AM
                  SSE 4.2 anonymous2008/03/22 07:27 AM
      CMPXCHG latencyDavid Kanter2008/03/28 05:59 PM
        CMPXCHG latencyanonymous coward2008/03/28 10:24 PM
          CMPXCHG latencyDavid Kanter2008/03/28 10:26 PM
            CMPXCHG latencyLinus Torvalds2008/03/29 11:43 AM
              CMPXCHG latencyDavid W. Hess2008/03/29 11:56 AM
              CMPXCHG latencyLinus Torvalds2008/03/29 02:17 PM
                CMPXCHG latencyGabriele Svelto2008/03/31 12:25 AM
                  CMPXCHG latencyMichael S2008/03/31 12:38 AM
                    CMPXCHG latencynick2008/03/31 12:52 AM
                      CMPXCHG latencyMichael S2008/03/31 01:51 AM
                        CMPXCHG latencyGabriele Svelto2008/03/31 02:08 AM
                        CMPXCHG latencynick2008/03/31 07:20 PM
                          CMPXCHG latencyMichael S2008/04/01 01:14 AM
                            CMPXCHG latencynick2008/04/01 02:34 AM
                    CMPXCHG latencyLinus Torvalds2008/03/31 10:16 AM
                      CMPXCHG latencyAaron Spink2008/03/31 07:15 PM
                        CMPXCHG latencynick2008/03/31 07:34 PM
                        CMPXCHG latencyLinus Torvalds2008/04/01 08:25 AM
                          CMPXCHG latencyZan2008/04/01 09:54 PM
                          CMPXCHG latencyZan2008/04/02 12:11 AM
                            CMPXCHG latencyLinus Torvalds2008/04/02 08:04 AM
                              CMPXCHG latencyZan2008/04/02 11:02 AM
                                CMPXCHG latencyLinus Torvalds2008/04/02 12:02 PM
                                  CMPXCHG latencyZan2008/04/02 04:15 PM
                      CMPXCHG latencyMichael S2008/04/01 01:26 AM
                        CMPXCHG latencyLinus Torvalds2008/04/01 07:08 AM
                CMPXCHG latency - Intel sourceWouter Tinus2008/04/02 12:36 PM
                  CMPXCHG latency - Intel sourceLinus Torvalds2008/04/02 02:21 PM
                    CMPXCHG latency - Intel sourceDavid Kanter2008/04/02 02:39 PM
    Nehalem Architecture: Improvements Detailed Philip Honermann2008/03/19 01:11 PM
      Nehalem Architecture: Improvements Detailed Linus Torvalds2008/03/19 01:43 PM
        CMPXCHG - all or nothingMichael S2008/03/19 03:49 PM
          multithreading - all or nothingno@thanks.com2008/03/19 05:17 PM
          CMPXCHG - all or nothingLinus Torvalds2008/03/19 05:21 PM
            CMPXCHG - all or nothingMichael S2008/03/20 06:38 AM
              CMPXCHG - all or nothingLinus Torvalds2008/03/20 08:45 AM
                CMPXCHG - all or nothingMichael S2008/03/21 07:08 AM
                  CMPXCHG - all or nothingLinus Torvalds2008/03/21 08:47 AM
            CMPXCHG - all or nothingHenrik S2008/03/20 10:09 AM
              CMPXCHG - all or nothingLinus Torvalds2008/03/20 10:53 AM
                CMPXCHG - all or nothingHenrik S2008/03/20 12:03 PM
                  CMPXCHG - all or nothingLinus Torvalds2008/03/20 01:12 PM
                    CMPXCHG - all or nothingHenrik S2008/03/21 12:13 AM
                      CMPXCHG - all or nothingGabriele Svelto2008/03/21 01:22 AM
        Nehalem Architecture: Improvements Detailed Philip Honermann2008/03/19 06:28 PM
          Nehalem Architecture: Improvements Detailed Linus Torvalds2008/03/19 07:42 PM
            Nehalem Architecture: Improvements Detailed Philip Honermann2008/03/20 06:03 PM
              Nehalem Architecture: Improvements Detailed Linus Torvalds2008/03/20 06:33 PM
                Nehalem Architecture: Improvements Detailed Philip Honermann2008/03/25 06:37 AM
                  Nehalem Architecture: Improvements Detailed Linus Torvalds2008/03/25 08:52 AM
                    What is DCAS? (NT)David Kanter2008/03/25 10:13 AM
                      Double compare-and-exchangeHenrik S2008/03/25 10:57 AM
                        Double compare-and-exchangeLinus Torvalds2008/03/25 11:38 AM
                          Double compare-and-exchangesavantu2008/03/25 01:54 PM
                            Double compare-and-exchangeLinus Torvalds2008/03/25 04:09 PM
                              Double compare-and-exchangeJamie Lucier2008/03/25 08:55 PM
                                Double compare-and-exchangesavantu2008/03/25 09:15 PM
                                  Double compare-and-exchangeHenrik S2008/03/26 08:40 AM
                                    Double compare-and-exchangeArun Ramakrishnan2008/03/27 02:07 AM
                                      Double compare-and-exchangeHenrik S2008/03/27 04:45 AM
                                  Surely GPL applies ?Richard Cownie2008/03/26 10:05 AM
                                    Surely GPL applies ?anon2008/03/26 02:58 PM
                                    Surely GPL applies ?Paul2008/03/26 05:01 PM
                                Double compare-and-exchangesomeone2008/03/25 09:18 PM
                                  Double compare-and-exchangeArun Ramakrishnan2008/03/27 02:03 AM
                                    Double compare-and-exchangesavantu2008/03/27 03:01 AM
                                      Double compare-and-exchangeArun Ramakrishnan2008/03/30 09:09 AM
                                        Double compare-and-exchangesavantu2008/03/30 09:59 AM
                                Double compare-and-exchangeLinus Torvalds2008/03/26 10:50 AM
                                  Double compare-and-exchangeanon2008/03/26 04:47 PM
                                  Double compare-and-exchangePaul2008/03/26 05:07 PM
                          Double compare-and-exchangeHoward Chu2008/03/25 05:18 PM
  Nehalem Architecture: Improvements Detailed Mr. Camel2008/03/17 08:50 PM
    Nehalem Architecture: Improvements Detailed anonymous2008/03/17 09:20 PM
  TFP will finally come :-)Paul A. Clayton2008/03/18 12:56 PM
  Nehalem Architecture: Improvements Detailed IntelUser20002008/03/27 07:46 PM
    Nehalem Architecture: Improvements Detailed David Kanter2008/03/27 10:21 PM
      Nehalem Architecture: Improvements Detailed nick2008/03/27 11:06 PM
        Nehalem Architecture: Improvements Detailed David Kanter2008/03/28 02:45 PM
          Nehalem Architecture: Improvements Detailed nick2008/03/28 07:52 PM
  L1 I-cachepuzzled2008/04/01 07:53 AM
    L1 I-cacheS. Rao2008/04/01 09:47 AM
    L1 I-cacherwessel2008/04/01 12:23 PM
    L1 I-cacheGabriele Svelto2008/04/03 12:30 AM
Reply to this Topic
Body: No Text
How do you spell tangerine? 🍊