By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), November 4, 2006 12:29 pm
Room: Moderated Discussions
Linus Torvalds (torvalds@osdl.org) on 11/3/06 wrote:
---------------------------
>Just as an example of why the whole "everything is aligned"
>mentality is wrong, look at something like malloc.
>
>You actually pay a rather high cost for that "things are
>aligned" mentality. You're just so used to it that you
>take it for granted. But what you don't think about is that
>it also makes it much more expensive to do things like
>small allocations, because the allocator tends to be
>forced to always return aligned allocations..
>
>So in situations where size matters, I've several times
>ended up having to write my own allocators, just because
>the default system "malloc()" was too wasteful with memory.
I don't think malloc is a good example; there is just so much wrong with it... Even if you relax the alignment of malloc, it will still be very wasteful. Allocated block overhead is often 8 bytes and the minimum block size can be as large as 16 bytes! Apart from that, many systems use very inefficient algorithms and are lazy about reusing memory and minimizing fragmentation. The VC++ libraries are among the worst, so it's no surprise that C++ programs on Windows are often very slow as a result. I've seen STL string manipulations run 250 times slower than the equivalent strcat in C, mostly due to the new/delete overhead.
Any good memory allocator has no per-block overhead for small blocks and constant-time allocation (using a pointer increment). Malloc simply cannot provide this.
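To make the "pointer increment" point concrete, here is a minimal sketch of such an allocator. The names (region_t, region_init, region_alloc, region_reset) are made up for illustration and are not from any particular library. It assumes the caller supplies the backing buffer, that the buffer itself is aligned at least as strictly as any alignment requested, and that blocks are never freed individually.

```c
#include <stddef.h>

/* Hypothetical region type: one buffer and one cursor.
   There is no header in front of each block. */
typedef struct {
    unsigned char *base;  /* start of the caller-supplied buffer */
    size_t size;          /* total bytes in the buffer */
    size_t used;          /* bytes handed out so far */
} region_t;

static void region_init(region_t *r, void *buf, size_t size)
{
    r->base = buf;
    r->size = size;
    r->used = 0;
}

/* Constant-time allocation: round the cursor up to 'align' (which must be
   a power of two; pass 1 for no alignment at all), then bump it by n.
   Alignment is applied to the offset, so it only holds for the returned
   address if the buffer itself is suitably aligned.
   Returns NULL when the region is exhausted. */
static void *region_alloc(region_t *r, size_t n, size_t align)
{
    size_t start = (r->used + (align - 1)) & ~(align - 1);
    if (start > r->size || n > r->size - start)
        return NULL;
    r->used = start + n;
    return r->base + start;
}

/* Releasing everything at once is just resetting the cursor. */
static void region_reset(region_t *r)
{
    r->used = 0;
}
```

With align set to 1, a 3-byte block followed by a 5-byte block occupies exactly 8 bytes of the buffer. The price is that individual blocks cannot be freed, which is why this only suits workloads where a whole region dies together.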
>And btw, "size matters" does not mean "small machines".
>The last time I did this, it was on a machine where I had
>2GB of memory, but doing so shrank the working set size
>of a particular problem from just over half a gigabyte
>to (eventually, thanks to that and other changes)
>to less than half its original size.
I got a 2x speedup by using region-based custom allocators, while memory usage was 30-40% lower. But I think all this proves there is something seriously wrong with malloc.
>Then there are the architectures that are truly
>braindead, and where an unaligned access is silently
>accepted and the low bits are just dropped. Early ARM in
>particular. That's just crap, crap, crap. Now you have
>potential security and debuggability issues on top of all
>the other issues you had.
To be fair, the original ARMs allowed the external memory controller to do an unaligned access by doing 2 word accesses and masking, while the ARM CPU would do the final rotation. The original memory controller (MEMC) just never implemented that. Others have implemented memory controllers that support unaligned access, so it is feasible. However, I agree it's better to either trap or do the right thing fully in hardware.
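The "two word accesses" idea can be modelled in software. The sketch below is only an illustration of the arithmetic, not a description of what MEMC or any ARM core actually does: it uses plain shifts rather than the mask-then-rotate split described above, the function name load_u32_two_words is made up, and it assumes little-endian byte numbering within each 32-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value starting at an arbitrary byte offset, using only
   aligned 32-bit reads from 'words'. Byte 0 of each word is taken to be
   its least significant byte (little-endian numbering, an assumption).
   If the offset is not a multiple of 4, words[byte_off / 4 + 1] must
   exist, because the value straddles two words. */
static uint32_t load_u32_two_words(const uint32_t *words, size_t byte_off)
{
    size_t idx = byte_off / 4;
    unsigned shift = (unsigned)(byte_off % 4) * 8;

    uint32_t lo = words[idx];
    if (shift == 0)
        return lo;              /* aligned: a single access is enough */

    uint32_t hi = words[idx + 1];
    /* Upper bytes of the first word become the low bytes of the result,
       lower bytes of the second word fill in the top. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

For the words 0x44332211 and 0x88776655, reading at byte offset 1 yields 0x55443322, i.e. bytes 1 to 4 of the pair.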
>So instead of arguing against unaligned accesses, just
>face the facts: people use them, and people want them. They
>are often the "natural" way of doing things, and doing
>them in hardware is not only possible, it's a fact of life.
>
>The good news, of course, is that the architecture that
>matters most from a general-purpose sw standpoint handles
>them quite well, and the way things are going, the "no
>unaligned" people will be left behind soon enough.
Many other architectures handle them well too nowadays. However, unaligned accesses have a cost, and that cost is too high if you do too many of them. IMO the ideal situation would be:
1. Programming languages use natural alignment by default, so p - q is well defined, etc.
2. Users can explicitly mark types as unaligned for whatever reason (packing, performance, etc.)
3. Unaligned accesses can use hardware support if available, or expand into a sequence of instructions
4. Other unaligned accesses are trapped and emulated if required (hopefully these are rare)
This is what happens on ARM and it works very well, including on CPUs that don't have unaligned hardware support. Emulation traps are disliked in the embedded world, so most systems just abort programs that do spurious unaligned accesses that weren't declared to the compiler. Note that declaring unaligned types allows for better compiler optimization and the use of special instructions. For example, you could have an unaligned hint in load/store instructions.
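As a rough illustration of points 2 and 3 in C, here are two common ways to tell the compiler that an access may be unaligned. The helper names are made up. The first is portable standard C; the second relies on the GCC/Clang packed and may_alias attributes, which are compiler extensions and are not the ARM compiler syntax. In both cases the compiler is free to emit a single load where the target supports unaligned access, or a sequence of narrower loads where it does not.

```c
#include <stdint.h>
#include <string.h>

/* Portable: memcpy places no alignment requirement on 'p', so the
   compiler picks whatever instruction sequence is safe on the target. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

#if defined(__GNUC__) || defined(__clang__)
/* GCC/Clang extension: a packed struct has alignment 1, so the type
   itself carries the "may be unaligned" declaration. may_alias keeps
   the cast below from breaking the strict-aliasing rules. */
typedef struct __attribute__((packed, may_alias)) {
    uint32_t v;
} unaligned_u32;

static uint32_t load_u32_packed(const void *p)
{
    return ((const unaligned_u32 *)p)->v;
}
#endif
```

Both helpers return the same value for the same address; the difference is only in how the unaligned intent is expressed to the compiler.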
Wilco