Hybrid kernel, not NT

By: Brendan (btrotter.delete@this.gmail.com), May 16, 2006 12:39 am
Room: Moderated Discussions
Hi,

nick (anon@anon.com) on 5/15/06 wrote:
>Brendan (btrotter@gmail.com) on 5/15/06 wrote:
>---------------------------

>>>Well are you replicating the text of your servers?

>>No - they are independent processes that use CPU affinity to ensure that they
>>are always run on the same NUMA domain...

>So you don't use shared libraries, or share program text?
>OK, now you've lost the same amount of memory as Linux with
>text replication.

I don't use shared libraries and have no intention of supporting them. A large shared library would be implemented as a separate service and used asynchronously, while a small shared library would be statically linked.

I don't support shared program text yet, but probably will eventually. To be honest I need to figure out the best way to do it and will probably wait until I can benchmark different approaches before deciding (e.g. figure out when it's worth having separate copies of the shared program text for different NUMA domains). Another option is to spawn a new thread in an existing process instead of creating a new process, but this would only be supported by some processes, and would need to depend on CPU load balancing.

>Do you use threads of a single memory space running on
>different nodes?

Yes, but I split user space into "process space" and "thread space", such that thread space can't be accessed from other threads. It's a little like "thread local data" in POSIX, only implemented so that separation is enforced. The disadvantage is that switching between threads that belong to the same process involves changing address spaces and is as expensive as switching between processes. The advantages are that security of the thread's local data is enforced, a thread's data doesn't suffer from cacheline bouncing (or "across NUMA node" access penalties if the process itself isn't tied to a specific NUMA node), the linear memory manager never needs to lock thread space before changing it, and it can help to avoid linear address space size restrictions on some machines (e.g. on 32 bit 80x86 you get 2 GB per thread plus 1 GB of process space, rather than 3 GB for everything).
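To put rough numbers on the 32 bit example, here is a small Python sketch of the arithmetic only. The sizes come from the figures above; the function and constant names are made up for illustration and say nothing about how the kernel actually lays out or manages address spaces.

```python
GB = 1 << 30

# Figures quoted for 32 bit 80x86; the names are hypothetical.
PROCESS_SPACE = 1 * GB      # shared by every thread in the process
THREAD_SPACE = 2 * GB       # private to each thread
SHARED_USER_SPACE = 3 * GB  # conventional layout: one space for everything


def reachable_user_space(threads, split):
    """Total user-space bytes a process can address across all its threads."""
    if split:
        # Each thread adds its own private region on top of process space.
        return PROCESS_SPACE + threads * THREAD_SPACE
    # Without the split, every thread sees the same 3 GB.
    return SHARED_USER_SPACE


with_split = reachable_user_space(4, split=True)
without_split = reachable_user_space(4, split=False)
print(with_split // GB, "GB vs", without_split // GB, "GB")
```

Any single thread still only sees 3 GB at once (1 GB of process space plus its own 2 GB); the gain is in what the process as a whole can address.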

>>>Considering that nobody in Linux even cares that much
>>>about it except the guys with 1024 CPU systems, I'm
>>>guessing it is completely unmeasurable on your kernel
>>>(outside microbenchmarks, maybe). :-)
>>
>>Given that both AMD and Intel are increasing the number of cores rather than increasing
>>core frequency (and that I predict Intel will be shifting to something like hypertransport/NUMA
>>in the near future), the number of people who care about it is probably going to
>>increase a lot by the time it matters to me.

>Wrong. Number of cores has nothing to do with it, and
>desktops/workstations/small servers will never care much
>about NUMA issues because there just aren't enough sockets
>to make a difference. Improvement on even an 8 socket
>Opteron is probably unmeasurable on Linux, for example.

For Opteron one hop is about 25% slower and 2 hops is about 50% slower. I couldn't find figures for 3 hops (which is necessary for 8 sockets when there's only 3 hypertransport links and something needs to connect to an I/O hub), and the figures I did find vary a fair bit between different sources.

>The systems I'm talking about have local/remote latency
>ratios of 10:1, and going from one end of the interconnect
>to the other takes ~8 router hops over probably 20 or more
>meters.
>
>*Those* guys are just starting to care about it a little
>bit. And not so much because the slowdown is noticeable for
>the nodes taking icache faults from remote memory, but
>because the combined effect of all of them saturates node0's
>interconnect.

For Linux (IIRC) the kernel is loaded into the first 16 MB of physical memory, which usually corresponds to node0. In this case node0 would have to cope with all accesses to the kernel's code and data (including device drivers, locks, etc) plus the traffic going to/from the first I/O controller. It would be a complete disaster - simply loading the kernel into the highest physical pages would alleviate the pressure on node0 (and possibly shift the pressure to a different node, although traffic to/from the first I/O controller hub wouldn't add to the pressure in this case). Unfortunately, I don't know enough about Linux's NUMA support and might be completely wrong.

In general, fixing the worst performance problem tends to give you a new worst performance problem (that isn't quite as bad as the old worst performance problem). If you fix the pressure on node0 somehow, what would be the new worst performance problem for large NUMA systems?

>>My work consists of a series of prototypes, where each prototype builds on the
>>last. The newest prototype uses a "modular micro-kernel", is 32 bit and 64 bit and
>>is designed to scale to large NUMA systems. I've basically reached the end of the
>>series of prototypes (there's nothing left to add and the worst of the bottlenecks
>>are gone). With some luck, my current prototype will become the basis for an OS.
>>I'm expecting it to take another 3 years before I've got a bare working system running
>>on legacy hardware, but it's too different to port applications (or drivers) to
>>it and it'll probably take 10 years or more before it's actually usable. I knew
>>this before I started, which is why I've spent so much time making sure the kernel design is "right".
>>
>>Anyway, real world benchmarks (like comparing web server and database performance) are a long way off...
>
>So how do you know you've done it right? Are you designing
>based on assumptions, or real testing? Is this work public?

Assumption to begin with, but for each prototype I examine/test it and find "problem areas", and then try to avoid the problems in later prototypes. Of course at this stage I'm only really trying to avoid design problems - implementation problems can be fixed later. The kernel is closed source freeware, while the rest will be a mixture (open source where possible). There is a web-site for it but the project isn't "interesting" yet.

>Kudos for trying, but it still doesn't sound convincing. K42
>claims to be a microkernel, and occasionally they get really
>excited about finding somewhere that Linux doesn't scale too
>well at, and beat it. Which obviously turns out to be a
>place that nobody ever cares about anyway.

For me, the goal isn't to perform better. One of the goals is to do everything asynchronously - for N operations you send N requests and continue working until you receive N replies (rather than doing the operations one at a time, or spawning N threads). For example, for 4 operations on 4 CPUs you might end up using a total of 40000 cycles and complete everything after 10000 cycles (instead of doing the operations one at a time and using a total of 20000 cycles and completing everything after 20000 cycles). In this case efficiency is halved (twice as many cycles are used in total) but the operations complete in half the time. Of course this is a complete over-simplification, but hopefully you see what I mean.
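The 4 operations on 4 CPUs example works out as follows in a small Python sketch. The per-operation costs are assumptions derived from the totals quoted above (20000 / 4 and 40000 / 4), and the function names are hypothetical.

```python
def sequential(ops, cycles_per_op):
    """One CPU does the operations one after another."""
    total = ops * cycles_per_op
    return total, total  # (total cycles used, cycles until everything is done)


def asynchronous(ops, cpus, cycles_per_op):
    """Requests are sent up front and the operations run in parallel."""
    waves = -(-ops // cpus)           # ceiling division: batches of work
    elapsed = waves * cycles_per_op   # cycles until the last reply arrives
    total = ops * cycles_per_op       # cycles burned across all CPUs
    return total, elapsed


SEQ_COST = 5000     # assumed: 20000 cycles / 4 operations
ASYNC_COST = 10000  # assumed: 40000 cycles / 4 operations, overhead included

seq_total, seq_elapsed = sequential(4, SEQ_COST)
async_total, async_elapsed = asynchronous(4, 4, ASYNC_COST)

print("sequential:  ", seq_total, "cycles used, done after", seq_elapsed)
print("asynchronous:", async_total, "cycles used, done after", async_elapsed)
```

The asynchronous case burns twice the cycles in total, but the last reply arrives in half the time.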

In general this is a pain in the neck - something that seems simple (like the standard "fopen()" function) becomes a request where the status is a reply that is received some time later, and you need to use different programming practices (e.g. a central message handling loop and something like a state machine instead of linear programming). It sounds stupid (especially for single-CPU systems) - it's only when you get a pool of computers that it starts to make some sense (and yes, I'll admit that how much sense it makes remains to be seen)...
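A minimal Python sketch of that style, where "open a file" becomes a request plus a later reply, driven by a central message loop and a per-request state. Everything here (the message tuples, the Client class, the fake file service) is hypothetical and only meant to show the shape of the code, not any real messaging API.

```python
from collections import deque


class Client:
    """Hypothetical client that opens files by message, not by blocking call."""

    def __init__(self, filenames):
        self.outbox = deque()
        self.names = {}
        self.state = {}  # request id -> "OPENING", "OPEN" or "FAILED"
        for request_id, name in enumerate(filenames):
            self.names[request_id] = name
            self.state[request_id] = "OPENING"
            self.outbox.append(("open_request", request_id, name))

    def handle(self, message):
        """State machine step: advance one request when its reply arrives."""
        kind, request_id, ok = message
        if kind == "open_reply" and self.state.get(request_id) == "OPENING":
            self.state[request_id] = "OPEN" if ok else "FAILED"

    def pending(self):
        return any(s == "OPENING" for s in self.state.values())


def fake_file_service(request, existing):
    """Stand-in for a separate file system service answering requests."""
    _, request_id, name = request
    return ("open_reply", request_id, name in existing)


client = Client(["a.txt", "b.txt", "missing.txt"])
inbox = deque()

# Send all N requests first; nothing blocks waiting for an answer.
while client.outbox:
    inbox.append(fake_file_service(client.outbox.popleft(), {"a.txt", "b.txt"}))

# Central message handling loop: process replies as they arrive.
while client.pending():
    client.handle(inbox.popleft())

results = {client.names[i]: s for i, s in client.state.items()}
print(results)
```

In a real system the replies could arrive in any order and interleaved with unrelated messages, which is why the state is tracked per request id rather than in the flow of the code.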


Cheers,

Brendan