By: Michael S (already5chosen.delete@this.yahoo.com), May 30, 2022 8:36 am
Room: Moderated Discussions
Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 30, 2022 7:16 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on May 30, 2022 1:02 am wrote:
> > Intuitively, I'd think that most keys that do not fit in 32-bit are variable-length by nature, so
> > 1) they don't fit in 128-bit without some non-trivial pre- and post-processing
> > 2) are likely to be almost entirely distinguishable by 32 or 40 MS bits
> How about hashing variable-length inputs to 96 or 128 bits?
Why would I possibly want to sort hashes?
> Also, distinguishing by
> MSB is difficult in the case of a library that cannot specialize on the domain.
I am not too interested in heavily optimized non-specialized libraries.
Somehow I tend to think that the majority of real cases fall near one of two extremes:
1. Non-optimized std::sort() is good enough.
2. I want it as fast as possible.
A heavily optimized non-specialized library is either overkill or not enough.
>
> > if N is long enough then intuitively I'd expect that radix sort beats quicksort even for >40 bits.
> > Also, I would think that all above radix vs quick discussion
> > is relevant only for N that is long, but not very long.
> I haven't tried it recently, but IIRC the highly-tuned 2010 radix sort reported 624 int32/s on 8
> cores (so 312 MB/s), whereas today's vqsort does 612-1100 MB/s per core for N from 1M to 125M.
>
As I said in a previous post, I don't think that either of the two is good when N is that large.
But radix sort has much worse cache behaviour than quicksort, so I fully believe it takes a bigger hit once the dataset no longer fits in the L2 cache, especially in the parallel multithreaded case.
> > When N is such that array does not fit in L2 cache, my intuition
> > suggests that mergesort is unbeatable as a top level algorithm.
> hm, I'm curious why? Merge cannot take advantage of already (nearly) sorted data or skewed distributions.
Only mergesort has predictably good cache access patterns.
But yes, it is the worst algorithm in terms of taking advantage of lucky opportunities. Such is the nature of the beast.
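The "predictably good" claim comes down to the merge step itself: two sequential read streams and one sequential write stream, all linear scans that hardware prefetchers handle well regardless of N. A minimal sketch (my illustration, assuming plain int keys):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-way merge of sorted runs: exactly three linear streams
// (read a, read b, write out), so the access pattern is the same
// whether the runs fit in L2 or live in DRAM. This is the
// predictability argued for above; it is also why merge cannot
// exploit presorted or skewed inputs -- it scans everything anyway.
std::vector<int> merge_sorted(const std::vector<int>& a,
                              const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
        out.push_back(a[i] <= b[j] ? a[i++] : b[j++]);   // stable: ties take a first
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}
```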
> Also, very recent work (if I am reading their Table 12 correctly) reports around 512 MB/s for 125M int64.
Table 12? Where?