By: -.- (blarg.delete@this.mailinator.com), June 4, 2022 5:56 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on June 4, 2022 10:17 am wrote:
> Text sequences are usually quite short. The whole "I have
> gigabytes of JSON" seems a very artificial example.
Except having "gigabytes of JSON" was never necessary, and I'm willing to bet their benchmarks don't use gigabyte-sized JSON files.
An AVX-512 vector is only 64 bytes, so all you need is enough JSON to cover that and amortize any startup costs. I don't know exactly where that point lies, but I'd imagine a few KB of JSON is enough to see benefits. Whilst there are plenty of smaller payloads, JSON payloads of a few KB are certainly quite common.
(this also misses the point that 512-bit vectors weren't the only part: AVX-512 adds instructions like VPCOMPRESSB, which are quite useful for string processing)
> In most text parsers I've seen, the technical act of parsing the stream itself is the least of the
> problems - building up the resulting data tree (or whatever) with allocations etc tends to be the
> biggest issue. Lots of small allocations, often lots of small data copies for said allocations.
I'd imagine that SIMD helps with data copies as well - even lots of small ones. But in-situ parsing is also a thing, though building some sort of 'data tree' is still needed.
> The simdjson JSON parsing code literally has a special mode for "don't allocate memory for the result" (look
> it up: '-H'), and their performance notes page literally tells people to use that flag (along with largepage
> allocations, which is at least a bit more reasonable) when reporting parsing benchmark numbers.
>
> Read that paragraph above one more time: it's not that the parsing benchmark doesn't
> actually do anything else, and doesn't do any useful work with the result - it's that
> it doesn't even save the results of said "parsing" in the first place, because it
> turns out that that is the more expensive operation than the "scan the text" part.
'Not saving the results' is not how I read it; I understood it as 'you have to supply the allocated memory'.
Reading their section, they literally state that you "can use the -H flag to omit the memory allocation cost from the benchmark results", not that it's recommended.
The recommendation I see is: "we recommend that you report performance numbers with and without huge pages if possible", which I suppose you could take as a suggestion to use huge pages.
> In other words: don't use the simdjson performance numbers as any kind
> of argument for SIMD processing. They are entirely meaningless.
I see it more like people touting FLOPS figures - there's a certain appeal to it, even if it's not representative of a lot of problems. That's not to say the functionality is useless though.
Fortunately the code is open, so anyone can go and make a benchmark that you'd find to be more realistic. I doubt you'll get the headline figures touted by simdjson, but I wouldn't be surprised if it still beats competing solutions.