Article: ARM Goes 64-bit
By: VMguy (vmguy.not.here.delete@this.unknown.net), November 22, 2012 3:21 am
Room: Moderated Discussions
Hi,
Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on November 21, 2012 7:44 am wrote:
> EduardoS (no.delete@this.spam.com) on November 19, 2012 11:41 am wrote:
> > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on November 19, 2012 9:31 am wrote:
> > > Cache locality is certainly important, but GC doesn't solve that - in reality it actually makes it worse.
> >
> > Well... Cache locality improvements with GC is measurable... No need to discuss.
>
> Rubbish. GC is memory inefficient by definition, so claiming it is better for locality is just wishful
> thinking. Compacting GC's typically need 2-3 times more memory than a non-compacting GC, so are worse on
> average. Also the much higher memory allocation rate and resulting collections are bad for locality.
As for locality advantages, there is a lot of research in that area refuting your claims:
Creating and preserving locality of Java applications at allocation and garbage collection times
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.7067
Profile-guided proactive garbage collection for locality optimization
http://research.microsoft.com/en-us/um/people/trishulc/papers/halo.pdf
The garbage collection advantage: improving program locality
http://users.cecs.anu.edu.au/~steveb/downloads/pdf/oor-oopsla-2004.pdf
and many more, which you can find either through a dedicated portal (acm.org) or simply through a search engine of your preference. Possible locality advantages from being able to move objects around at will are also straightforward to imagine.
"GC is by definition inefficient": you are right that GC by definition imposes some overhead, but you always have to weigh it against other factors like ease of programming and that it enables other optimizations like the above which may outweigh the disadvantages.
Compacting GCs are always more memory efficient than non-compacting GCs, because in the worst case they can compact the heap perfectly at any time, while non-compacting collectors obviously cannot and suffer (sometimes a lot) from fragmentation - just like allocation in a non-GC'ed language. In the worst case a compacting GC needs to do that at a ridiculous performance cost, of course.
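To make the compaction argument concrete, here is a toy sketch (not any real collector) of a sliding compaction over a heap modeled as a list of cells. The names `compact`, `forwarding`, and the heap layout are all invented for illustration: live objects slide to the front, leaving one contiguous free region, whereas a non-moving allocator would be left with unusable holes.

```python
# Toy illustration: the heap is a list of cells; "addresses" are indices.
# A sliding compactor moves live objects to the front and records a
# forwarding table (old address -> new address) for fixing up references.

def compact(heap, live):
    """Slide live objects to the front; return (new_heap, forwarding, free)."""
    forwarding = {}
    new_heap = []
    for addr, obj in enumerate(heap):
        if addr in live:
            forwarding[addr] = len(new_heap)
            new_heap.append(obj)
    free = len(new_heap)  # bump pointer: everything past here is free space
    return new_heap, forwarding, free

heap = ["A", "dead", "B", "dead", "C"]
new_heap, fwd, free = compact(heap, live={0, 2, 4})
# new_heap == ["A", "B", "C"], free == 3: one contiguous free region,
# where the original heap had two one-cell holes a non-moving allocator
# could only reuse for equally small objects.
```

The forwarding table is the part that a non-moving collector cannot have: being allowed to rewrite references is exactly what buys back the fragmented space.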
The 2-3x figure you mention I only know as a general guideline for sizing the available heap in a GC'ed VM, i.e. the overhead of GC starts to become negligible when you size the heap to 2-3 times the maximum amount of used memory.
This also depends a lot on your application, the language, the VM and the configuration used.
I do not think this figure is too far off from what a non-trivial non-GC'ed program needs.
Allocation rates do not depend so much on the JIT or VM; they are mostly determined by the programming language (and the program itself), since e.g. Java only has heap-allocated objects (which can be elided through program analysis to a certain degree). If it were possible to allocate on the stack, as it is in C++, a well-written program would likely have a similar allocation rate.
I'm not talking about custom memory managers tailored to your application here - they are always better than any generic allocator if written reasonably - but really, allocation itself is not the problem for GC'ed languages that you make it out to be.
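Part of the reason allocation itself is cheap in a compacted heap can be shown with a minimal sketch (hypothetical, heavily simplified): allocation is a pointer bump plus a limit check, which is how GC'ed runtimes sustain high allocation rates.

```python
# Simplified bump-pointer allocator sketch: in a compacted heap the free
# space is one contiguous region, so allocation is just advancing "top".

class BumpAllocator:
    def __init__(self, size):
        self.heap = bytearray(size)
        self.top = 0

    def alloc(self, nbytes):
        if self.top + nbytes > len(self.heap):
            raise MemoryError("a real VM would trigger a collection here")
        addr = self.top
        self.top += nbytes  # the entire fast-path allocation work
        return addr

a = BumpAllocator(64)
assert a.alloc(16) == 0
assert a.alloc(16) == 16  # consecutively allocated objects are adjacent
```

Note the locality side effect: objects allocated together end up adjacent in memory, which is one source of the locality advantages cited above.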
> > > And that is before we consider the actual overhead of the
> > > collection itself, often having to stop all threads
> > > for long periods.
> >
> > No,
> >
> > 1) A background GC can run on another thread, especially useful for non-threaded software;
>
> Concurrent GC has even larger overheads. It stops threads for shorter periods but stops them more often, so
> takes far longer overall.
That I agree with; however, as the original text said, in the case of single-threaded programs the GC can be done using extra resources. Of course you then use more resources, but the net effect is that the program runs faster.
> And then we haven't considered the far higher overheads on the generated code.
>
You mean e.g. write-barrier code? Their overhead in the best case is negligible (depending on the barrier type; e.g. "Barriers reconsidered, friendlier still", users.cecs.anu.edu.au/~steveb/downloads/pdf/barrier-ismm-2012.pdf, presents lots of numbers), and a lot of work has gone into avoiding their generation as much as possible.
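For a sense of how small such a barrier can be, here is a hypothetical card-marking write barrier in sketch form (card size, table size and addresses are made-up illustration values): on every reference store the card covering the written address is marked dirty, so a generational collector later scans only dirty cards for old-to-young pointers.

```python
# Hypothetical card-marking write barrier: one shift, one store per
# reference write. 512-byte cards over a 512 KiB toy heap.

CARD_SHIFT = 9                    # log2(512-byte card)
card_table = bytearray(1 << 10)   # one byte per card

def write_barrier(obj_addr, field_offset):
    # Executed after every reference store into obj_addr + field_offset.
    card_table[(obj_addr + field_offset) >> CARD_SHIFT] = 1  # mark dirty

write_barrier(0x1200, 8)
assert card_table[(0x1200 + 8) >> CARD_SHIFT] == 1
```

In compiled code this is a couple of instructions; the papers above measure when even that can be elided or batched.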
> > 2) A full stop-the-world GC is still the least resource-hungry GC, but don't look at it without
> > considering that, thanks to this GC, allocations and deallocations are much faster
> > and the heap is compacted periodically; in the end, it is often a win.
> >
> > > Then there is the optimization overhead and extra tables causing code bloat.
I'm not sure what you're referring to here: GC implementers are quite aware of any extra overhead from the GC helper data structures, both in extra memory and in cacheability. This memory overhead is typically in the low single-digit percent of total memory, and given enough memory the total performance overhead is not much more either.
Depending on the GC, they may have different (perhaps unacceptable to you) timing characteristics though.
> >
> > And C++ allocators waste space to avoid memory fragmentation... But doing so also hits locality.
>
> No, no space is wasted, unlike GC which requires descriptors for every object.
I do not understand what you mean by "descriptors": the per-object overhead typically consists of a single word used for internal purposes, plus another that contains a reference to the class metadata (i.e. the vtable).
The latter is similar to what any other OO language has, and the former is comparable to the per-block information of your allocator. Typical allocators actually use a few words per block to store internal data, e.g. the block length (at minimum), other data and various markers to detect invalid memory accesses. And they certainly do suffer from fragmentation.
Feel free to direct me to some information that shows the contrary.
I do not think a GC'ed system is at a disadvantage here; in the worst case it could use the same mechanism.
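A back-of-the-envelope comparison makes the point: with illustrative numbers (not those of any specific VM or allocator), a GC'ed object paying a header word plus a class pointer ends up in the same ballpark as a malloc'd C++ object with virtual methods paying an allocator block header plus a vtable pointer.

```python
# Illustrative per-object overhead comparison on a 64-bit machine.
# All sizes are assumptions for the sketch, not measurements.

WORD = 8  # bytes

def gc_object_size(payload):
    # mark/hash word + class-metadata reference
    return payload + 2 * WORD

def malloc_oo_object_size(payload):
    # allocator block header (size word, at minimum) + vtable pointer
    return payload + WORD + WORD

assert gc_object_size(32) == malloc_oo_object_size(32) == 48
```

Under these assumptions the fixed per-object cost is identical; what differs is fragmentation behavior, which the compaction discussion above covers.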
> > > array bounds checks,
> >
> > I was thinking about this one when you mentioned "other features", sometimes the
> > compiler is able to optimize it away and when not, last part of the post.
> >
> > > null pointer checks, assuming
> > > any pointer access may cause an exception,
> >
> > In x86 .Net this check is "cmp eax, [eax]" with the pointer in eax, on field access there
> > is no check at all since it will raise an exception anyway in the case of a null pointer.
> >
> > Since null pointer checks are so cheap it is not clear which optimizations are disabled
> > by them, just put the check where it is needed to keep the correct order.
>
> Since when is a memory access cheap? Every unnecessary instruction has a cost.
>
> > > multithreading support etc etc.
> >
> > How exactly this lowers performance?
>
> The barriers and other checks for concurrent GC or multithreaded access
> to fields are not exactly zero-cost and block many optimizations.
>
> > > It is significantly harder to write a good compiler
> > > for them, and even then you can never get close to C++ performance. Many optimizations have to be
> > > disabled or turned extremely conservative as an exception or GC may occur at any time.
I am not sure it is significantly harder to write a good compiler for them: I agree that you need to spend more time to reach the same performance, but that is also because there are more components involved (e.g. the VM/memory management).
However, there is already a body of previous work/research in that area that should give good results, and you can also apply much of the existing research on static compilation optimizations.
The compiler in particular can be made a reasonably well-encapsulated component of the whole system; that is harder for other parts.
> managed languages are usually
> > more strict about ordering, and it is not obvious weak ordering improves performance by that much.
>
> The problem is not just the ordering, but the fact that more operations can cause exceptions. That alone
> creates a lot of overhead as you need to model flows from every possible exception to all possible exception
> handlers. Local variable values need to be preserved for example, severely limiting optimizations.
The exceptions can only occur at well-defined points, and while this may require scaling back optimizations, typically it does not. To a very large degree, after an exception many values are of no interest to the program at all (i.e. unused in the remainder of the program), so you need not keep them around or generate code that preserves them.
(When really needed, e.g. when debugging, you typically fall back to an interpreter.)
On the contrary, optimizations are typically extremely aggressive, to the point of removing seemingly unused code using many different types of information, keeping only a check and a call to recompile if necessary. E.g., for logging/diagnostic functionality, you can conclude from the set of currently loaded classes that there cannot be multiple different subclasses.
A VM may even revise these compilation decisions at runtime, e.g. on noticing that a branch that was heavily used earlier is not used anymore.
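The check-plus-recompile pattern can be sketched as guarded devirtualization (a hypothetical toy model, not any real VM's mechanism): while only one receiver class has been observed, the "compiled" fast path inlines its method behind a cheap class check, and seeing a new class invalidates the speculation, standing in for the VM's deoptimize-and-recompile step.

```python
# Toy model of guarded devirtualization / speculative inlining.

class Logger:
    def log(self, msg):
        return "log:" + msg

speculated_class = Logger   # only one subclass loaded so far
deoptimized = False         # stands in for "recompile needed"

def call_log(receiver, msg):
    global deoptimized
    if type(receiver) is speculated_class:  # cheap guard
        return "log:" + msg                 # inlined body of Logger.log
    deoptimized = True                      # speculation failed
    return receiver.log(msg)                # fall back to dynamic dispatch

assert call_log(Logger(), "x") == "log:x" and not deoptimized

class DebugLogger(Logger):                  # new subclass appears at runtime
    def log(self, msg):
        return "debug:" + msg

assert call_log(DebugLogger(), "x") == "debug:x" and deoptimized
```

The point is that the optimistic fast path costs only the guard, and correctness is preserved because the guard detects exactly the event (a new subclass) that falsifies the assumption.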
> > And finally, back to array checks, yes, they reduce performance, and yes, it is a different discussion,
> > but frankly, the performance reduction is pretty small and a lot of security bugs would be avoided by
> > array bound checks, it is not something I would leave behind even if performance was a big concern.
>
> The performance cost is high if you happen to use arrays a lot. Even if you think it is worth it,
> and ignore the overhead as small enough, many of such costs add up to something quite large.
That may be true, although a lot of effort has been put into hoisting these checks out of critical loops, making them negligible. They still add up, but in total I believe a JIT'ed program can (in general) reach performance at least not too far off from a statically compiled one.
Such JITs/VMs often also provide more features than a statically compiled language (can), e.g. dynamic class loading and more advanced introspection/reflection, so a comparison may not be completely valid.
What is true is that performance tuning in such a system is much harder, as it now depends on many more factors than in a statically compiled program.
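The hoisting can be sketched by hand (a JIT does this automatically; the function names here are invented for illustration): the naive loop pays a bounds check per access, while the hoisted version proves the whole iteration range in-bounds with one test.

```python
# Bounds-check hoisting, done manually for illustration.

def sum_checked(a, n):
    total = 0
    for i in range(n):
        if not (0 <= i < len(a)):   # per-iteration bounds check
            raise IndexError(i)
        total += a[i]
    return total

def sum_hoisted(a, n):
    if not (0 <= n <= len(a)):      # single hoisted range check
        raise IndexError(n)
    total = 0
    for i in range(n):
        total += a[i]               # per-access checks now provably redundant
    return total

data = [1, 2, 3, 4]
assert sum_checked(data, 4) == sum_hoisted(data, 4) == 10
```

Both versions raise on an out-of-range request, so the safety guarantee is unchanged; only the per-iteration cost disappears.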
V