By: nick (anon.delete@this.anon.com), May 14, 2006 10:46 pm
Room: Moderated Discussions
Brendan (btrotter@gmail.com) on 5/14/06 wrote:
---------------------------
>Hi,
>
>nick (anon@anon.com) on 5/14/06 wrote:
>>Doesn't matter, if they have to both take the lock ->
>cacheline bouncing; if one CPU writes a message which is
>>read by another -> cacheline bouncing.
>
>That would involve a cache line fill. It's not even close to the performance effects
>of lock contention (which either involves repeated cache
No. I told you, you're 10 years out of date. Lock
contention is largely not an issue any more; the cost is
the cache misses (the "cache line fill" you mention). Read
some of the literature for the reasons why there is an
increasing focus on lock-free algorithms and data structures.
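To put it concretely, here's a toy user-space sketch (C11 atomics; an invented illustration, not code from any real kernel). The shared counter sees no lock contention at all, yet its cacheline ping-pongs on every increment; the per-CPU layout is the standard fix:

    #include <stdatomic.h>

    /* One shared counter: every CPU's increment must pull the line
     * exclusive, so it bounces constantly with zero lock contention. */
    static atomic_long shared_count;

    void hit_shared(void)
    {
        atomic_fetch_add(&shared_count, 1);
    }

    /* Per-CPU counters, one cacheline each: the hot path only ever
     * touches a CPU-local line, so nothing bounces. */
    #define NCPU 64
    struct cpu_ctr { _Alignas(64) atomic_long v; };
    static struct cpu_ctr percpu_count[NCPU];

    void hit_percpu(int cpu)
    {
        atomic_fetch_add_explicit(&percpu_count[cpu].v, 1,
                                  memory_order_relaxed);
    }

    long read_total(void)   /* slow path: sum over all CPUs */
    {
        long sum = 0;
        for (int i = 0; i < NCPU; i++)
            sum += atomic_load(&percpu_count[i].v);
        return sum;
    }

That per-CPU shape is the same trick scalable kernels lean on all over the place.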
Maybe in your big-lock-based kernel, contention is an
issue, but not for those (e.g. Linux) which care about
scalability.
>line fills or context switches and TLB flushing).
TLB flushing? You mean due to sleeping locks causing
context switches? Again, that's obviously bad if it gets
out of control, but that is not the case on real OSes.
You're first assuming that there is a large amount of
lock contention in scalable OSes, then concluding that,
therefore, the relatively minor cost of cacheline
contention is unimportant.
Your assumption is wrong.
>
>>>Given that (IIRC) Linux itself used a "big kernel lock" and nothing else for version
>>>2.0, and that this lock is still used (although for a lot less than it originally
>>>was), and that the slowest kernel function within Linux 2.0 would've been much longer,
>>>I don't think you're in a position to argue too much.
>>
>>Err, I don't follow your logic at all. Linux 2.0 had a
>>global lock, and scalability sucked. What is my position, and
>>why isn't it one to argue too much? How much am I allowed
>>to argue?
>
>Linux 2.0 used a big kernel lock that would've been acceptable for that release
>(given developer time constraints and a probable desire to support 2-way SMP before
>worrying much about larger systems). Sometimes scalability isn't the most important
>factor - if it was they probably wouldn't have used a big kernel lock to start with.
>
Of course, my position was never contrary. Again, why am I
not in a position to argue about this too much?
>Still, I'm getting "side-tracked". My original comment was about the time taken
>by the longest kernel function. Consider a system where a CPU spends 50% of its
>time in the kernel on average, compared to a system where a CPU spends 1% of its
>time in the kernel on average. For the first system "big kernel lock contention"
>is extremely likely (even for a 2-way system), while for the second system "big
>kernel lock contention" is unlikely (even for 8-way SMP).
I'm not talking about lock contention, I'm talking about
cacheline contention. But if all else were equal, of course
you're right.
However, your 1% system may need to enter the kernel much
more *frequently*, in which case the cacheline contention
might be as bad, or worse.
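Back-of-envelope, with every number invented purely for illustration:

    #include <stdio.h>

    /* Toy model: traffic from shared cachelines scales with how often
     * the kernel is entered, not with how long each entry lasts.
     * All figures below are assumptions, not measurements. */
    int main(void)
    {
        double line_xfer_ns    = 200.0; /* assumed cross-CPU line transfer */
        double lines_per_entry = 4.0;   /* assumed shared lines touched    */

        double entries_a = 10e3;        /* few, long kernel entries        */
        double entries_b = 500e3;       /* many short entries (e.g. IPC)   */

        printf("A: %.0f ms/s moving cachelines\n",
               entries_a * lines_per_entry * line_xfer_ns / 1e6);
        printf("B: %.0f ms/s moving cachelines\n",
               entries_b * lines_per_entry * line_xfer_ns / 1e6);
        return 0;
    }

Double the entry rate and you double the traffic, no matter how short each visit is.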
>
>For monolithic systems the time spent in the kernel includes the time spent within
>all device drivers, so the time spent inside the kernel is going to be greater than
>an equivalent micro-kernel (where time spent in the kernel doesn't include the time
>spent within device drivers). Therefore using a big kernel lock is going to affect
>performance/scalability in a monolithic system much more than it would affect performance/scalability
>in an equivalent micro-kernel.
>
Yes. But nobody would call a monolithic big-lock kernel
scalable, so I don't think you can transitively conclude
anything about the microkernel.
>>In Linux, the control data does not need to go across
>>node because it is basically kept in the stack of the
>>running process. Instruction text is a different matter,
>>but AFAIK, that can be replicated on the big NUMA machines
>>in Linux.
>
>My last prototype used multiple copies of the kernel (one for each NUMA domain)
>and my current prototype uses multiple copies of the kernel and multiple copies
>of "static kernel data" (anything that doesn't change after boot). My kernels are
>normally less than 64 KB though so the memory costs aren't too bad and it's still
>sensible when the "across node penalty" isn't too high (e.g. 2 or 4 way Opterons).
>Linux is typically around 3 MB so you might get better results using this memory for disk cache instead. :-)
Well, are you replicating the text of your servers?
Considering that nobody in Linux even cares that much
about it except the guys with 1024-CPU systems, I'm
guessing it is completely unmeasurable on your kernel
(outside microbenchmarks, maybe). :-)
I'd wager that passing messages across interconnect would
be more interesting...
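(For the curious: the replication trick itself is easy to sketch in user space with libnuma. numa_max_node() and numa_alloc_onnode() are real libnuma calls; the helper itself is hypothetical and skips error handling:)

    #include <numa.h>      /* libnuma */
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical helper: give every NUMA node its own copy of a
     * read-only blob (text or static data), so reads stay node-local. */
    void **replicate_ro(const void *src, size_t len)
    {
        int nodes = numa_max_node() + 1;
        void **copies = malloc(nodes * sizeof *copies);

        for (int n = 0; n < nodes; n++) {
            copies[n] = numa_alloc_onnode(len, n); /* node-local pages */
            memcpy(copies[n], src, len);
        }
        return copies;   /* a reader indexes with its own node id, e.g.
                            copies[numa_node_of_cpu(sched_getcpu())] */
    }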
---------------------------