By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), August 21, 2018 5:10 pm
Room: Moderated Discussions
Travis (travis.downs.delete@this.gmail.com) on August 21, 2018 4:27 pm wrote:
>
> The unfairness that was problematic was a constant stream of readers
> starving out writers, or between various writers or what?
We've had that, yes (and in fact, we've had that even for the regular rwlocks).
It's really nice when it works, but it can get really outrageously bad when there is contention. And the unfairness can become visible in the cache coherency protocol itself, where a CPU that just owned a lock has a much easier time grabbing it again immediately, because it might still be exclusive in the caches.
There's no guarantee that the cache coherency is fair, after all.
So we've almost invariably had to build in fairness in the queuing itself. Our spinlocks, for example, are not just an "owner" lock. No, they are ticket locks, so that people that get blocked on a spinlock get a particular ordering, and you don't get in the situation that some CPUs have an easier time re-taking the lock than others.
And yes, that ended up slowing the spinlocks down and making them more complex, but not really noticeably so - and the unfairness case was really noticeable on some big machines. To the point where people had watchdogs fire because some CPU wouldn't make progress for tens of seconds at a time, just because other CPUs could re-take the lock so quickly.
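To illustrate the ticket idea mentioned above, here is a minimal userspace sketch using C11 atomics. It is not the kernel's spinlock code; the names `ticket_lock`, `next`, and `owner` are hypothetical and chosen only for this example. Each waiter takes a ticket and spins until its number is served, so the lock is handed out in arrival order no matter which CPU still has the cache line.

```c
#include <stdatomic.h>

/* Hypothetical minimal ticket lock, for illustration only. */
struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently allowed to hold the lock */
};

static void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
    /* Taking a ticket fixes this caller's place in the queue. */
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);

    /* Spin until our ticket is the one being served. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;
}

static void ticket_lock_release(struct ticket_lock *l)
{
    /* Only the holder writes owner, so a plain load plus store is enough. */
    unsigned int cur = atomic_load_explicit(&l->owner, memory_order_relaxed);

    atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}
```

A CPU that releases and immediately re-acquires gets a new, later ticket, so it cannot jump ahead of waiters that already hold earlier tickets.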
So I used to detest fairness. It makes locking harder and slower. But almost every time, we've found that if something can get contended, fairness isn't just a good idea, it's pretty much required.
> A reasonable strategy would seem to be to make such locks favor writers as far as fairness goes: once
> a writer expresses interest, no new readers enter the critical section.
That ends up being potentially even worse, because quite often the reader is the critical case, and the writer has to wait for other readers to finish anyway, so a writer-favoring lock can be really bad and then cause hiccups for the readers, in case there is some way an untrusted user can schedule writers.
So what we do (and I might mis-remember) is
- if a reader sees the "no writer" flag, it just increments the percpu reader count, and is done
which basically makes the "default" reader case optimal. This is all done non-preemptibly because of that percpu sequence, of course.
But whenever a writer shows up, the percpu fast case simply goes away, and we fall back to a fair rwlock. Obviously with some logic on the part of the writer to wait for the readers that came in before it marked itself (by adding up those percpu counts).
So the percpu rwlock basically ends up just being an almost perfectly normal rwlock when writers are around, but has a special fast-case for when there are no writers and it can use pure percpu accounting of readers. And there's some percpu and RCU logic for some of the serialization issues between these two states.
I may have oversimplified and misstated it a bit, but it's close to something like that.
Linus
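The reader fast path and the writer's "add up the percpu counts" step described above can be modeled roughly as follows. This is a userspace sketch under loud assumptions, not the kernel's percpu rwsem: `NSLOTS` stands in for the number of CPUs, all names are hypothetical, the fair slow-path rwlock is omitted, and writers are assumed to be serialized by some outer lock. It also swaps in a different technique for the serialization between the two states: instead of RCU, the reader increments its counter first and then re-checks the writer flag, relying on sequentially consistent atomics.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NSLOTS 4  /* assumption: stand-in for the number of CPUs */

struct pcpu_rw {
    atomic_bool writer_pending;   /* inverse of the "no writer" flag */
    atomic_int  readers[NSLOTS];  /* one reader count per slot */
};

static void pcpu_rw_init(struct pcpu_rw *l)
{
    atomic_init(&l->writer_pending, false);
    for (int i = 0; i < NSLOTS; i++)
        atomic_init(&l->readers[i], 0);
}

/* Fast path. Returns false if a writer is around; the caller would then
 * fall back to a fair rwlock, which this sketch does not show. */
static bool pcpu_read_trylock(struct pcpu_rw *l, int slot)
{
    atomic_fetch_add(&l->readers[slot], 1);
    if (!atomic_load(&l->writer_pending))
        return true;
    atomic_fetch_sub(&l->readers[slot], 1);  /* back out */
    return false;
}

static void pcpu_read_unlock(struct pcpu_rw *l, int slot)
{
    atomic_fetch_sub(&l->readers[slot], 1);
}

/* Sum of all per-slot reader counts. */
static int pcpu_active_readers(struct pcpu_rw *l)
{
    int sum = 0;

    for (int i = 0; i < NSLOTS; i++)
        sum += atomic_load(&l->readers[i]);
    return sum;
}

/* Writer: turn off the fast path, then wait for earlier readers to drain. */
static void pcpu_write_lock(struct pcpu_rw *l)
{
    atomic_store(&l->writer_pending, true);
    while (pcpu_active_readers(l) != 0)
        ;
}

static void pcpu_write_unlock(struct pcpu_rw *l)
{
    atomic_store(&l->writer_pending, false);
}
```

With no writer present, a reader only touches its own slot's counter, which is what makes the common case cheap. Once `writer_pending` is set, every new reader backs out, and the writer proceeds as soon as the counts from earlier readers reach zero.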