By: Jan Wassenberg (jan.wassenberg.delete@this.gmail.com), May 30, 2022 7:16 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on May 30, 2022 1:02 am wrote:
> Intuitively, I'd think that most keys that do not fit in 32-bit are variable-length by nature, so
> 1) they don't fit in 128-bit without some none-trivial pre and post processing
> 2) are likely to be almost entirely distinguishable by 32 or 40 MS bits
How about hashing variable-length inputs to 96 or 128 bits? Also, distinguishing by MSB is difficult in the case of a library that cannot specialize on the domain.
> if N is long enough then intuitively I'd expect that radix sort beats quicksort even for >40 bits.
> Also, I would think that all above radix vs quick discussion
> is relevant only for N that is long, but not very long.
I haven't tried it recently, but IIRC the highly-tuned 2010 radix sort reported 624 int32/s on 8 cores (so 312 MB/s), whereas today's vqsort does 612-1100 MB/s per core for N from 1M to 125M.
> When N is such that array does not fit in L2 cache, my intuition suggests that mergesort is unbeatable as a top level algorithm.
hm, I'm curious why? Merge cannot take advantage of already (nearly) sorted data or skewed distributions. Also, very recent work (if I am reading their Table 12 correctly) reports around 512 MB/s for 125M int64.
> Intuitively, I'd think that most keys that do not fit in 32-bit are variable-length by nature, so
> 1) they don't fit in 128-bit without some none-trivial pre and post processing
> 2) are likely to be almost entirely distinguishable by 32 or 40 MS bits
How about hashing variable-length inputs to 96 or 128 bits? Also, distinguishing by MSB is difficult in the case of a library that cannot specialize on the domain.
> if N is long enough then intuitively I'd expect that radix sort beats quicksort even for >40 bits.
> Also, I would think that all above radix vs quick discussion
> is relevant only for N that is long, but not very long.
I haven't tried it recently, but IIRC the highly-tuned 2010 radix sort reported 624 int32/s on 8 cores (so 312 MB/s), whereas today's vqsort does 612-1100 MB/s per core for N from 1M to 125M.
> When N is such that array does not fit in L2 cache, my intuition suggests that mergesort is unbeatable as a top level algorithm.
hm, I'm curious why? Merge cannot take advantage of already (nearly) sorted data or skewed distributions. Also, very recent work (if I am reading their Table 12 correctly) reports around 512 MB/s for 125M int64.