By: dmcq (dmcq.delete@this.fano.co.uk), July 18, 2015 4:07 am
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 17, 2015 4:29 pm wrote:
> Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on July 17, 2015 1:57 pm wrote:
> >
> > Has the ancient trick of prepadding with 2 bytes been forgotten? That allows for 8 byte alignment of
> > both header and payload. Prepadding with 10 bytes would allow 16-byte alignment of the payload.
>
> It doesn't necessarily work well in the general case. It breaks in the face of various encapsulation
> things. It can also be absolutely horrible when you have other (bigger) alignment concerns,
> like making sure PCI DMA writes from the network card are aligned to 16-byte boundaries or
> whatever. Some network cards can't even do unaligned packet writes etc etc.
>
> That is, btw, one of the best examples of why special instructions (or worse, instruction
> sequences) for unaligned handling is completely broken. Because it is indeed possible
> to often set things up so that in practice, 100% of all accesses are aligned.
>
> But that "100%" is still not a guarantee. It's just a "under normal circumstances, we have laid out the data
> structures so that all the important accesses are perfectly aligned, and you always hit the good case".
>
> .. but then you have the odd cases when that doesn't work out. Either because of some
> encapsulation issue or because some particular hardware had other alignment concerns,
> or whatever. It may never ever happen for some particular common setup (like an important
> benchmark), but the unaligned case still needs to be handled correctly.
>
> And trapping doesn't work either. Well, it "works". But the problem is that it's so
> expensive, that if there are situations where the unaligned case goes from "never happens"
> to "when you encapsulate the ethernet packets using xyz, it happens for every packet",
> you went from good performance to absolutely unacceptable performance.
>
> The whole "aligned data is the usual case by far, but we cannot
> guarantee it absolutely" is not that unusual in the end.
>
> Not that dissimilar from things like denormals in FP. They also "never" happen in practice. But it's
> usually something you can't absolutely guarantee, and when they do, they end up often happening a lot
> (ie once you see one, you often see thousands), and you can't afford to suck too much at it.
>
> Linus
I'd consider somebody who wrote code like that without telling people a menace; such code would need a very strong justification and a marker put on it. It is a pity C doesn't have a standards-compliant way of describing packed external data structures and their alignment properly; in fact that whole area is a bit of a mess in the language. That sort of code being generated by the compiler because the processor supports it is fine. Not declaring it is the nasty part.
I've no problem with even having a couple of instructions instead of one for accessing unaligned data; it'll hardly make any difference to the access time if there are going to be a lot of such accesses. The big problem in that area is getting good code without a branch, as the next byte after the data might not be accessible. That's why instructions to load a short string into a register might be good, but unfortunately the difficulty of specifying 3-byte fields in C stops that. It is like wanting to use staples when our main tool, C, is a hammer.
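To make the 3-byte case concrete, here is a minimal sketch in portable C of what one ends up writing by hand. The function names `load_le24` and `load_le32` are made up for illustration, and little-endian wire order is an assumption. The point is that the 3-byte load touches exactly three bytes, so it never reads a fourth byte that might sit on an inaccessible page, and it needs no branch.

```c
#include <stdint.h>

/* Hypothetical helper: read a 3-byte little-endian field from any address.
   Only p[0], p[1] and p[2] are read, so nothing past the field is touched. */
static uint32_t load_le24(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16;
}

/* Hypothetical helper: the same idea for a 4-byte field. It works at any
   alignment; many compilers turn this pattern into a single load on targets
   that allow unaligned access, but that is an optimisation, not a guarantee. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

With a buffer holding the bytes 11 22 33 44 55 (hex), `load_le24(buf + 1)` gives 0x443322 and `load_le32(buf + 1)` gives 0x55443322, whatever the alignment of `buf`.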
As to denormals, they're fine and they allow some reasoning about the code. I can't think why Intel took such a long time to support them at reasonable speed rather than having an interrupt.
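One property commonly cited for gradual underflow, and possibly the sort of reasoning meant here, is that `x != y` implies `x - y != 0`. A small sketch in C, assuming IEEE 754 doubles and no flush-to-zero mode (so no `-ffast-math` or similar); the helper name `difference_is_nonzero` is invented for the example:

```c
#include <float.h>

/* Hypothetical helper: returns 1 when x - y is nonzero.
   volatile keeps the compiler from folding the subtraction away. */
static int difference_is_nonzero(double x, double y)
{
    volatile double d = x - y;
    return d != 0.0;
}

/* Two distinct normal numbers whose difference is smaller than DBL_MIN.
   1.5 * DBL_MIN is exactly representable, and the difference is exactly
   DBL_MIN / 2, which exists only as a denormal. */
static int denormal_demo(void)
{
    double x = DBL_MIN * 1.5;
    double y = DBL_MIN;
    return x != y && difference_is_nonzero(x, y);
}
```

With denormals flushed to zero the same subtraction would give 0 even though `x != y`, which is what breaks guards such as `if (x != y) z = 1.0 / (x - y);`.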