By: -.- (blarg.delete@this.mailinator.com), June 8, 2022 5:28 am
Room: Moderated Discussions
Wilco (wilco.dijkstra.delete@this.ntlworld.com) on June 5, 2022 5:40 am wrote:
> I looked at an earlier version of simdjson and was able to get good speedups by removing unnecessary use of
> SIMD. It uses very complex SIMD processing to determine the start of every token and then write this out into
> a big array (if the average token size is small, this results in a huge expansion of the input, creates lots
> of cache misses and consumes a lot of memory bandwidth). Then in the 2nd pass a big switch statement reads those
> offsets, figures out what token it might be for the 2nd time, and finally performs the action for each token.
> Removing the unnecessary SIMD tokenization step and using traditional parsing gave a 10-15% speedup.
Is your code publicly available?
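For anyone following along, here is a rough sketch of the two structures being compared, as I read the description above. It is not simdjson's code and not Wilco's: pass 1 is a plain scalar loop standing in for the SIMD stage, the 32-bit offset width is an assumption, and every name (TokenCounts, index_tokens, classify_from_index, classify_single_pass) is made up for illustration. It only counts token kinds and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for "the action for each token": just tally kinds.
struct TokenCounts {
    std::size_t structural = 0;  // { } [ ] : ,
    std::size_t strings = 0;
    std::size_t scalars = 0;     // numbers, true, false, null
};

static bool is_structural(char c) {
    return c == '{' || c == '}' || c == '[' || c == ']' || c == ':' || c == ',';
}

static bool is_whitespace(char c) {
    return c == ' ' || c == '\n' || c == '\t' || c == '\r';
}

// Returns the offset one past the token that starts at json[i].
static std::size_t token_end(const std::string& json, std::size_t i) {
    char c = json[i];
    if (is_structural(c)) return i + 1;
    if (c == '"') {
        ++i;
        while (i < json.size() && json[i] != '"') {
            if (json[i] == '\\') ++i;  // skip the escaped character
            ++i;
        }
        return i + 1;  // step over the closing quote
    }
    while (i < json.size() && !is_structural(json[i]) && !is_whitespace(json[i])) ++i;
    return i;
}

// Looks at the first byte of a token to decide what it is.
static void classify(char first, TokenCounts& counts) {
    switch (first) {
        case '{': case '}': case '[': case ']': case ':': case ',':
            ++counts.structural; break;
        case '"': ++counts.strings; break;
        default:  ++counts.scalars; break;
    }
}

// Two-pass variant, pass 1: write the start offset of every token to an array.
// With 4-byte offsets (assumed), short tokens make this array larger than the input.
std::vector<std::uint32_t> index_tokens(const std::string& json) {
    std::vector<std::uint32_t> offsets;
    std::size_t i = 0;
    while (i < json.size()) {
        if (is_whitespace(json[i])) { ++i; continue; }
        offsets.push_back(static_cast<std::uint32_t>(i));
        i = token_end(json, i);
    }
    return offsets;
}

// Two-pass variant, pass 2: re-read the input at each offset and switch on it.
TokenCounts classify_from_index(const std::string& json,
                                const std::vector<std::uint32_t>& offsets) {
    TokenCounts counts;
    for (std::uint32_t off : offsets) classify(json[off], counts);
    return counts;
}

// Single-pass variant: act on each token as it is found, with no offset array.
TokenCounts classify_single_pass(const std::string& json) {
    TokenCounts counts;
    std::size_t i = 0;
    while (i < json.size()) {
        if (is_whitespace(json[i])) { ++i; continue; }
        classify(json[i], counts);
        i = token_end(json, i);
    }
    return counts;
}
```

Both variants give the same counts. The difference is that the two-pass version stores one offset per token and touches each token's first byte twice, which is where the extra memory traffic described above would come from when tokens are short. Whether that outweighs what the SIMD stage saves is exactly what I'd like to measure against your code.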