Article: AMD's Mobile Strategy
By: Exophase (exophase.delete@this.gmail.com), December 21, 2011 10:10 am
Room: Moderated Discussions
Michael S (already5chosen@yahoo.com) on 12/21/11 wrote:
---------------------------
>Asm coding (at least in my experience) tends to be relatively small performance-critical
>modules, typically coded in somewhat-generic manner with input/output delivered through pointers.
>Such modules tend to not use globals at all and to use static far rarer than the rest of application.
>
>Did you ever code the whole big application in ASM? ;-)
I coded a kernel in x86 ASM once; it wasn't pretty >_> (this was a long time ago, for undergrad). I also coded a recompiling emulator for a simple toy arch. This was for an actual class in x86 assembly; I'm not that masochistic ;)
It's not that I didn't use global variables; I just don't think they come up much in the more critical parts of the code. But it was so long ago (and I don't think I have the source anymore?) that I don't remember exactly. I've coded a decent amount of x86 more recently, though not really performance-centric code; it's mostly stub functions that get called from dynamically generated code.
The ASM I've done for other things often does fall into the category you're describing, or it exists because it needs to be low level for various reasons. Some of it still ends up being pretty sprawling, though, like CPU interpreters, where things don't fall into nice hot spots.
I've done nothing close to writing a whole program in ARM assembly, but I've written enough of it that the -O0 binary isn't much slower than the -O3 binary. The salient point is really what's being executed, not what's present in the source code. But I do understand that not everything follows the 80/20 rule.
And it's not as if I avoid global variables here either; they're especially hard to avoid in ISRs. But I'm not constantly generating an address for each individual one. I'm much more likely to generate one base address and then access a bunch of variables relative to it. Maybe C/C++ programs are often littered with stray globals instead of larger data structures, global or not, but there should still be a lot of spatial locality of reference in their accesses, and the compiler should be able to group them. If global variables are being accessed constantly with poor spatial locality, then that's a hit against cache performance too.
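To make the "one base address" point concrete, here's a minimal C sketch contrasting the two styles. The UART handler, the `uart_state` struct, the counter names, and the status bits are all hypothetical, invented just for illustration. Whether the stray-globals version actually costs extra address-generation instructions depends on the compiler and flags; GCC, for instance, can share one anchor address among nearby globals with `-fsection-anchors`, which is the kind of grouping mentioned above.

```c
#include <stdint.h>

/* Style 1: stray globals (hypothetical names). On a load/store ISA like
   ARM, each distinct global may need its own address materialization
   (e.g. a literal-pool load or a movw/movt pair) unless the compiler
   groups them. */
static volatile uint32_t rx_count;
static volatile uint32_t tx_count;
static volatile uint32_t error_count;

/* Style 2: the same state grouped into one struct. The handler forms a
   single base address and reaches each field with a base+offset access. */
struct uart_state {
    volatile uint32_t rx_count;
    volatile uint32_t tx_count;
    volatile uint32_t error_count;
};
static struct uart_state uart;

/* Hypothetical ISR body using stray globals; status bits are made up. */
void uart_isr_stray(uint32_t status)
{
    if (status & 0x1u) rx_count++;
    if (status & 0x2u) tx_count++;
    if (status & 0x4u) error_count++;
}

/* Same logic, but every access is relative to one base pointer. */
void uart_isr_grouped(uint32_t status)
{
    struct uart_state *s = &uart;   /* one address, generated once */
    if (status & 0x1u) s->rx_count++;
    if (status & 0x2u) s->tx_count++;
    if (status & 0x4u) s->error_count++;
}
```

Both handlers do the same work; comparing their -O2 disassembly for an ARM target is an easy way to see whether a given compiler already groups the stray globals on its own.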
Using x86 more today would probably make me think less about organizing my data so it can be accessed efficiently. It might make compilers think less about it too, and compilers that have historically optimized for x86 have needed (or still need) time to adjust to being friendlier to other archs. But that doesn't mean the ISA is necessarily at a huge disadvantage that can't be overcome.