By: Maynard Handley (name99.delete@this.name99.org), August 17, 2014 10:31 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on August 17, 2014 9:52 am wrote:
> Aaron Spink (aaronspink.delete@this.notearthlink.net) on August 16, 2014 10:25 pm wrote:
> > Maynard Handley (name99.delete@this.name99.org) on August 16, 2014 2:24 pm wrote:
> > > Likewise for synchronization primitives. The consensus as I read the literature is
> > > that load locked/store conditional is substantially easier to implement and get right
> > > than LOCK prefixes and the random mix of other things that Intel has. I'm guessing
> > > it's then also easier to build HW TM on top of load locked/store conditional.
> > > Beyond this, I'm guessing it's substantially harder to design for and verify
> > > the Intel memory model than the looser POWER and ARM memory models.
> > >
> > Actually, many software people have come to mostly loathe LL/SC, and it's been
> > involved in lots of bugs, in either software or hardware, over the years. The
> > truth is that LL/SC is pretty much only used to implement, effectively, CMPXCHG.
> >
> > And it's generally easier to design and verify for the x86 MOM than for any of the more relaxed
> > memory models, especially from a software perspective. And from a performance perspective, because of
> > the software issues, the stricter MOMs tend to have better performance: with a weaker MOM,
> > software developers tend to be much more cautious, which leads to lower performance. So far, the
> > performance/strictness relationship with MOMs has been a case of Theory != Practice.
> >
> > Maynard, you've been around long enough that you've almost certainly seen Linus
> > rant #X on this topic. This is definitely one of those areas where I agree with
> > Linus. Make it easier for the programmers, hardware designers be damned.
>
> Does it matter much that the only thing LL/SC is used for is CMPXCHG? I always thought that WAS pretty much
> the entire point --- it gives the HW an easier way to implement CMPXCHG, and if you want to go off and do anything
> extra with it, go right ahead, but the HW was not designed with those extra capabilities in mind.
>
> Linus is welcome to his rants; I've ranted plenty in my time. But I think we have to distinguish between
> - this is optimal for ME AND MY TEAM because we've created a particular
> OS model which is centered around x86 capabilities and
> - this is optimal for ANYONE working at the OS level and
> - this is optimal for the HW (and let's face it, OS developers and compiler writers are a very small
> fraction of developers, let alone users, so if doing things this way helps everyone else it's a win)
>
> Certainly I was not convinced, after we went through the last round of "inverted page tables suck",
> that the CONCEPT was intrinsically bad. I read a lot of material about it from various sources, and
> the pattern I saw was that the Linux guys had (for defensible reasons) a very specific idea in mind
> of how they wanted to implement page tables, and their complaints seemed more to come down to "we can't
> efficiently implement the x86 design pattern on this HW" than anything deeper than that.
>
> So getting back to LL/SC: was it considered a terrible idea in, say, the context
> of Alpha and Ultrix? Would the alternatives have been better?
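To make the "LL/SC is effectively CMPXCHG" point concrete, here's a sketch of my own in C11 (not code from any of the systems discussed): the compare-and-swap compiles to LOCK CMPXCHG on x86, while on ARM or POWER the compiler expands it into an LL/SC retry loop (LDREX/STREX, LWARX/STWCX.).

    #include <stdatomic.h>

    /* Sketch: an atomic increment built on compare-and-swap. */
    static int atomic_increment(atomic_int *p)
    {
        int old = atomic_load_explicit(p, memory_order_relaxed);
        /* Retry until no other CPU modified *p between our load and the
           CAS; on failure, atomic_compare_exchange_weak refreshes 'old'
           with the value actually observed in memory. */
        while (!atomic_compare_exchange_weak(p, &old, old + 1))
            ;
        return old + 1;
    }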
To add to what I've said, perhaps our differing opinions on this issue reflect our real-world experiences? If you come from a Windows or Linux background, where it's long been the case that, let's say, inexperienced programmers are expected to write device drivers and similarly sophisticated code, then your experience is that such code is problematic and generally sucks, and that anything the HW can do to help is welcome.
If, on the other hand, you come from the Apple tradition (or, equivalently, the IBM POWER tradition), the general history has been that competent developers inside the company handle the drivers (and everything else, from the OS to the dev tools), that 3rd party developers are encouraged/required to use provided APIs/SPIs to express their intent (rather than going to the metal), and so there just isn't this history of problems.
Which suggests that Apple is probably just fine sticking with an implementation of the ARM memory model that meets the spec and is no stronger, whereas the other ARM vendors may encounter difficulties, unless they can persuade Linaro to provide APIs/SPIs that do everything required, and can persuade 3rd party developers to move to a more modern way of coding.
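And to be concrete about what coding to the weaker model demands, here's the classic flag-and-payload handoff, again a C11 sketch of my own rather than anyone's shipping code. With plain stores it happens to work on x86, whose TSO model keeps stores in order, but on ARM or POWER the payload store may become visible after the flag, so the release/acquire pairing has to be spelled out:

    #include <stdatomic.h>
    #include <stdbool.h>

    int payload;                 /* ordinary, non-atomic data */
    atomic_bool ready;           /* publication flag, initially false */

    void producer(void)
    {
        payload = 42;
        /* Release: earlier stores become visible before the flag does. */
        atomic_store_explicit(&ready, true, memory_order_release);
    }

    int consumer(void)
    {
        /* Acquire: once we see the flag, we also see the payload store. */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                    /* spin until the producer publishes */
        return payload;          /* guaranteed to read 42 */
    }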
How does this play out in the Android world? On the one hand, you have as big a collection of inexperienced cowboys as you might imagine in the various phone vendors; on the other hand, you have Google at the top, providing a set of APIs and best practices and basically acting like MS, only the MS of today (which has learned through bitter experience the consequences of leaving low-level driver development to others), not the MS of 1993.
I admit to not following Android obsessively, but I love and look for a good schadenfreude experience as much as anyone, and I'm unaware of any major (or even minor) disaster in the Android phone space that can be traced back to ARM's weaker memory model and developers not handling it properly.
Which suggests that the problem was perhaps more a consequence of extremely constrained resources in the 90s (which pushed Windows toward metal-level driver development), plus an ongoing, unhealthy obsession with performance in mainstream Linux (as opposed to Android) today, fostering a culture that disdains APIs/SPIs and the sort of abstraction in driver development that we see in Apple's IOKit and in Windows' and Android's equivalents.