By: Wilco (wilco.dijkstra.delete@this.ntlworld.com), June 5, 2022 5:40 am
Room: Moderated Discussions
Eric Fink (eric.delete@this.anon.com) on June 5, 2022 4:31 am wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on June 4, 2022 10:17 am wrote:
> > I have the exact reverse reaction.
> >
> > Text sequences are usually quite short. The whole "I have
> > gigabytes of JSON" seems a very artificial example.
> >
>
> Oh, I fully agree with everything you said, I should have been more precise about what I meant
> by "longish". When I was expressing my concerns about the viability of vector-based ISAs for low-latency
> computations, I was thinking about really small data structures, like geometric primitives
> that often fit into a couple of floats. Bulk-processed strings are often orders
> of magnitude longer, which is what I had in mind when I said "longish".
>
> But then again that might not even be the case. JSON requests can be quite short. Same for UTF-8 validation.
> It's just that if you have to validate a 12-byte string, the performance impact might not be large enough
> to justify crazy optimisations. But if you are loading a larger (several KBs+) text file, a vector
> approach will probably help out a lot, even if there is a non-trivial setup cost.
The size of the input does not matter much - the average token size does. Typically you need a big switch statement that performs an action for each token, and that is where most of the parsing time goes.
I looked at an earlier version of simdjson and was able to get good speedups by removing unnecessary use of SIMD. It used very complex SIMD processing to determine the start of every token and then wrote those positions out into a big array (if the average token size is small, this results in a huge expansion of the input, creates lots of cache misses and consumes a lot of memory bandwidth). Then in the second pass a big switch statement read those offsets, worked out what each token might be for a second time, and finally performed the action for each token. Removing the unnecessary SIMD tokenization step and using traditional parsing gave a 10-15% speedup.
There are certainly cases where SIMD can speed up parsing, for example skipping comments or #ifdef'd-out text, or UTF-8 processing, but you've got to use SIMD smartly and understand where it helps.
Wilco