Operating system and driver overhead

By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 29, 2022 10:45 am
Room: Moderated Discussions
Eric L (eric.delete@this.noemail.com) on July 29, 2022 3:44 am wrote:
> XPoint SSDs have 5us read latency while the lowest latency NAND SSDs (Samsung Z-SSD) have 20us read latency.
> After the operating system and driver overhead is added to that, the read latency advantage of XPoint
> SSDs is surely less significant in percentage terms. Does anyone here know how much the operating system
> and driver add to the read latency of an SSD on Linux? I suspect XPoint SSDs are another example of Intel
> providing amazing hardware but not bothering with the software needed to make it shine.

So I think there are several issues here.

One issue is that in 99% of all cases, the overhead of the read/write system calls has nothing to do with hardware, because the data is cached.

People don't think all that much about it, because the "it's already cached" situation is largely invisible. You only really notice read/write when it is slow because of IO, so then the reaction is "oh, IO overhead is a big deal, so let's make it very low".
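One rough way to see the cached-case cost on its own is to time repeated reads of data that is guaranteed to be in the page cache, so no device IO is involved at all. The sketch below assumes Linux/POSIX and a writable `/tmp`; the helper name `cached_pread_ns` is hypothetical, and the result is wall-clock time, so it is noisy and machine-dependent.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average nanoseconds per pread() of `len` bytes from
 * a scratch file whose contents are already in the page cache.
 * Returns -1.0 on any error. */
static double cached_pread_ns(size_t len, int iters)
{
    if (len == 0 || iters <= 0)
        return -1.0;

    char path[] = "/tmp/cached_read_XXXXXX";
    int fd = mkstemp(path);              /* create and open a scratch file */
    if (fd < 0)
        return -1.0;
    unlink(path);                        /* file stays alive until close() */

    char *buf = calloc(1, len);
    double result = -1.0;
    if (buf && write(fd, buf, len) == (ssize_t)len) {
        /* The data was just written, so every read below is served from
         * the page cache: what gets timed is the system call plus the copy. */
        struct timespec t0, t1;
        int ok = 1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters && ok; i++)
            ok = pread(fd, buf, len, 0) == (ssize_t)len;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (ok)
            result = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                      (t1.tv_nsec - t0.tv_nsec)) / iters;
    }
    free(buf);
    close(fd);
    return result;
}
```

Calling `cached_pread_ns(4096, 100000)` and comparing the number with the 5us/20us device latencies quoted above gives a feel for how close the two costs are on a given machine.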

But once you have good enough solid-state storage, and the IO latency starts to approach even the same order of magnitude as the cached overhead, the whole performance equation changes.

Sure, at that point you still get small improvements. The IO itself gets faster, and what helps even more is "you don't need the double buffering of the cache", but the wins are getting smaller, and it's really really easy to ignore the other costs.

For example, one of the big arguments for Xpoint was "treat the filesystem as just memory", but that was always a fever-dream.

Sure, one thing filesystems and read/write system calls do is abstract out the IO path, and once the "IO path" is just to "access memory", a less-than-gifted woodchuck that has been dropped on its head a few too many times would go "you can just access it directly, no need for system calls".

But that's just stupid. Yes, the system calls do abstract out the IO path, but they do so much more. In particular, open/read/write/close does all the resource allocation and the access control, and no amount of "you can access it as memory" will ever take that away.
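A small illustration of the point that "access it as memory" does not remove the access control done at open() time: on Linux, a mapping can only get the rights the file descriptor was opened with. The sketch below (hypothetical helper name, assumes Linux and a writable `/tmp`) opens a file read-only and then asks for a writable shared mapping; per the mmap(2) man page this fails with `EACCES`, even for root, because the check is against the descriptor's access mode rather than the file's permission bits.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: open a scratch file read-only, then ask for a writable
 * shared mapping of it. Returns the errno from mmap(), 0 if the mapping was
 * (unexpectedly) granted, or -1 if the setup itself failed. */
static int shared_write_map_errno(void)
{
    char path[] = "/tmp/map_acl_XXXXXX";
    int wfd = mkstemp(path);
    if (wfd < 0)
        return -1;
    int sized = ftruncate(wfd, 4096) == 0;   /* give the file one page */
    close(wfd);
    if (!sized) {
        unlink(path);
        return -1;
    }

    int fd = open(path, O_RDONLY);    /* the access mode is decided here */
    unlink(path);
    if (fd < 0)
        return -1;

    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int err = (p == MAP_FAILED) ? errno : 0;
    if (p != MAP_FAILED)
        munmap(p, 4096);
    close(fd);
    return err;                       /* Linux reports EACCES */
}
```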

And that resource management and access control is a big deal, and a big part of the overhead.

In fact, it's very close to 100% of the overhead when the file contents are cached: and as mentioned, the cached case is actually the common case. It's just that people don't think about it very much, because they take it for granted.

So if you ignore the cached case, you are ignoring a big portion of the picture, and Xpoint didn't really help that at all. Even the "no need for double buffering" wasn't really a big argument, since all the machines that had Xpoint also had a lot of memory, so the double buffering wasn't that big of a deal.

And in fact, while the double buffering of having things in both RAM caches and in something like Xpoint is a real memory cost, there are actually advantages to double buffering too. Sometimes you want buffering, because it gives you a level of indirection, and you can control things like write-back ordering etc.
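As an example of the kind of write-back ordering that buffering lets you control, here is a journaling-style sketch: the payload is forced to stable storage before a commit marker is even written. It assumes POSIX `fdatasync()` and a filesystem and device that honor it; the helper names `write_all` and `write_then_commit` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes. Returns 0 on success. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: the payload must be durable before the commit marker
 * is written, so a crash never leaves the marker on disk without the
 * payload in front of it. Returns 0 on success. */
static int write_then_commit(const char *path, const char *payload,
                             const char *marker)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int rc = -1;
    if (write_all(fd, payload, strlen(payload)) == 0 &&
        fdatasync(fd) == 0 &&                 /* barrier: payload first */
        write_all(fd, marker, strlen(marker)) == 0 &&
        fdatasync(fd) == 0)                   /* then the commit record */
        rc = 0;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```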

So I think people oversold the advantages.

The main advantages of Xpoint were for use-cases that really could take advantage of the direct access. Probably mainly database vendors, who really don't want a filesystem in the first place (or, more commonly, treat it as a very occasional resource allocation layer but then want to do all the IO directly).

But database vendors have literally spent decades building up all their infrastructure to deal with the IO costs, and a lot of them rely on double buffering anyway, where one "buffer" is their own memory, and then they do direct-IO calls to start and order the access to long-term storage.

So a database would want to double-buffer anyway, and would just change its IO path to use a special "memory copy with cache flushes". But since they've spent all those decades on trying to avoid the IO path, and all their benchmarks are run with lots of memory and high-performance SSDs anyway, it probably ends up not being the huge win people expect it to be.
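A minimal sketch of what a "memory copy with cache flushes" looks like, assuming x86-64 and 64-byte cachelines: an ordinary copy followed by an explicit flush of every destination line. The name `memcpy_flush` is hypothetical; libraries such as PMDK's libpmem package this pattern for real persistent memory, and on non-x86 targets this sketch only copies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <emmintrin.h>   /* _mm_clflush (SSE2) */
#include <xmmintrin.h>   /* _mm_sfence */
#endif

#define CACHELINE 64     /* assumption: 64-byte cachelines */

/* Hypothetical "memory copy with cache flushes": an ordinary memcpy()
 * followed by an explicit flush of every destination cacheline. */
static void *memcpy_flush(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    if (n == 0)
        return dst;
#if defined(__x86_64__)
    uintptr_t line = (uintptr_t)dst & ~(uintptr_t)(CACHELINE - 1);
    uintptr_t end = (uintptr_t)dst + n;
    for (; line < end; line += CACHELINE)
        _mm_clflush((const void *)line);   /* write back and evict the line */
    _mm_sfence();   /* required with CLFLUSHOPT/CLWB; harmless with CLFLUSH */
#endif
    /* On other architectures this sketch only copies; no flush is done. */
    return dst;
}
```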

The other possible big improvement would be the byte addressability, which would free filesystems from one of their big design constraints. But very few filesystems want to leave traditional disks behind, and from an access standpoint you still couldn't treat Xpoint as "just memory", because from a writeback standpoint you still ended up with a block size, just one measured in cachelines.
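The cacheline granularity is easy to quantify: a store dirties every cacheline it overlaps, and each dirty line is written back whole. A small sketch, assuming 64-byte lines (the helper name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64   /* assumption: 64-byte cachelines */

/* How many cachelines a store of `len` bytes at byte offset `off` dirties.
 * Each dirtied line is written back whole, so the writeback cost is
 * lines_touched(off, len) * CACHELINE bytes, not `len` bytes. */
static size_t lines_touched(uint64_t off, uint64_t len)
{
    if (len == 0)
        return 0;
    uint64_t first = off / CACHELINE;
    uint64_t last = (off + len - 1) / CACHELINE;
    return (size_t)(last - first + 1);
}
```

So a 1-byte update still writes back 64 bytes, and a 100-byte write at offset 60 straddles three lines and writes back 192 bytes.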

End result: regular flash just works really well, and was much cheaper and more readily available.

And I literally saw people say "you don't need a filesystem at all, you can just use nonvolatile RAM as your filesystem", and that just shows how clueless people are, and how these people were completely ignoring why filesystems exist.

Linus