By: rwessel (rwessel.delete@this.yahoo.com), May 30, 2022 9:36 am
Room: Moderated Discussions
Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 30, 2022 7:16 am wrote:
> Merge cannot take advantage of already (nearly) sorted data or skewed distributions.
It depends on how you generate the initial runs. Its common, especially with external sorts, to accept already sorted data as a single run. It's true that the textbook internal mergesort starts with chopping up the input into runs of one, but that isn't the only possibility.
If you allow variable sized initial runs, the merging process becomes a bit more complex, usually using a priority queue, or some-such, to determine the order of merges, (you generally always want to merge the smallest pairs of runs first). You're really doing the same thing with fixed size runs, but the merge order then becomes a fixed pattern.
> Merge cannot take advantage of already (nearly) sorted data or skewed distributions.
It depends on how you generate the initial runs. Its common, especially with external sorts, to accept already sorted data as a single run. It's true that the textbook internal mergesort starts with chopping up the input into runs of one, but that isn't the only possibility.
If you allow variable sized initial runs, the merging process becomes a bit more complex, usually using a priority queue, or some-such, to determine the order of merges, (you generally always want to merge the smallest pairs of runs first). You're really doing the same thing with fixed size runs, but the merge order then becomes a fixed pattern.