By: Michael S (already5chosen.delete@this.yahoo.com), May 30, 2022 1:02 am
Room: Moderated Discussions
Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 29, 2022 10:48 pm wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on May 29, 2022 1:38 pm wrote:
> > A bit too much of templatism to figure out [in this late hour] what exactly
> > is sorted, what is the key and what is the payload. Esp. the latter.
> Indeed lots of templates :) All key types between 16-128 bit are supported. Payload
> is whatever fits in the unused lower bits of the key. I'm told that 64+0 and 64+64 are
> very interesting use cases (for example, a columnar DB). If you're interested, there
> is some discussion of use cases in our paper: https://arxiv.org/abs/2205.05982.
>
> > More often than not it is soundly beaten by variations of radix sort.
> I wrote such a thing in 2010 :) Time flies. https://arxiv.org/abs/1008.2849
> Turns out 32-bit keys are not enough for many use cases, and virtual
> memory tricks to reduce/avoid branching are not very practical.
>
Intuitively, I'd think that most keys that do not fit in 32 bits are variable-length by nature, so
1) they don't fit in 128 bits without some non-trivial pre- and post-processing
2) they are likely to be almost entirely distinguishable by their 32 or 40 most significant bits
So, for such cases radix sort still looks like a good plan for doing the majority of the work, roughly along the lines of the sketch below.
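To make the "majority of the work" point concrete, here is a rough C++ sketch of the shape I have in mind. It is not anything from the vqsort paper: one MSD counting pass over the top bits of 64-bit keys, with a comparison sort cleaning up whatever ties remain inside each bucket. The 8-bit digit width and the std::sort fallback are placeholder choices.

// Hypothetical sketch: one MSD radix pass on the top 8 bits of 64-bit keys,
// then std::sort inside each bucket. The assumption is that the high bits
// already separate most keys, so the buckets stay small.
#include <algorithm>
#include <cstdint>
#include <vector>

void msd_radix_then_compare(std::vector<uint64_t>& keys) {
  constexpr int kTopBits = 8;               // digit taken from the MS bits
  constexpr size_t kBuckets = 1u << kTopBits;
  constexpr int kShift = 64 - kTopBits;

  // Counting pass: histogram of the top digit.
  std::vector<size_t> count(kBuckets + 1, 0);
  for (uint64_t k : keys) ++count[(k >> kShift) + 1];
  for (size_t i = 1; i <= kBuckets; ++i) count[i] += count[i - 1];

  // Scatter pass into a temporary buffer (stable within buckets).
  std::vector<uint64_t> tmp(keys.size());
  std::vector<size_t> next(count.begin(), count.end() - 1);
  for (uint64_t k : keys) tmp[next[k >> kShift]++] = k;

  // Finish each (hopefully small) bucket with a comparison sort.
  for (size_t b = 0; b < kBuckets; ++b)
    std::sort(tmp.begin() + count[b], tmp.begin() + count[b + 1]);
  keys.swap(tmp);
}

If the top bits really distinguish nearly all keys, the buckets handed to std::sort stay tiny and the counting/scatter passes dominate the cost.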
Also, if N is large enough then intuitively I'd expect radix sort to beat quicksort even for keys wider than 40 bits.
What exactly is faster for, say, 80-bit keys likely depends both on N and on the distribution.
Also, I would think that the whole radix-vs-quicksort discussion above is relevant only for N that is large, but not very large.
When N is such that the array does not fit in the L2 cache, my intuition suggests that mergesort is unbeatable as a top-level algorithm, along the lines of the second sketch below. In that scenario the choice between quicksort and radix sort as the "inner-level" algorithm for sorting the shorter sub-arrays ahead of merging is also of some importance, but not of huge importance.
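For the top-level structure I mean something like the following sketch. Again purely illustrative: the 128 Ki-key run size is a guess at what fits in L2, and std::sort stands in for whichever inner algorithm (quick or radix) one prefers.

// Rough sketch of the "mergesort on top" idea: cut the array into runs that
// fit in L2, sort each run with an inner algorithm, then merge runs pairwise.
// Not a tuned implementation, just the structure.
#include <algorithm>
#include <cstdint>
#include <vector>

void merge_of_cache_sized_runs(std::vector<uint64_t>& a) {
  // Assume ~1 MiB of L2 usable for a run: 1 MiB / 8 bytes = 128 Ki keys.
  constexpr size_t kRun = size_t{1} << 17;

  // Inner level: sort each L2-sized run independently.
  for (size_t i = 0; i < a.size(); i += kRun)
    std::sort(a.begin() + i, a.begin() + std::min(i + kRun, a.size()));

  // Top level: merge runs pairwise, doubling the run length each round.
  for (size_t width = kRun; width < a.size(); width *= 2) {
    for (size_t i = 0; i + width < a.size(); i += 2 * width) {
      std::inplace_merge(a.begin() + i, a.begin() + i + width,
                         a.begin() + std::min(i + 2 * width, a.size()));
    }
  }
}

std::inplace_merge is just a convenient stand-in here; a tuned implementation would merge into a second buffer and ping-pong between the two rather than merge in place.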