Article: ARM Goes 64-bit
By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), November 19, 2012 9:31 am
Room: Moderated Discussions
EduardoS (no.delete@this.spam.com) on November 19, 2012 8:23 am wrote:
> Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on November 19, 2012 7:59 am wrote:
> > As a compiler expert I'm sceptical that virtual machines can ever deliver close to native performance. The
> > key issue with JIT compilation is that you just do not have the resources of a native compiler. It doesn't
> > help that many intermediate bytecodes are completely unoptimized, making the problem far worse (what is the
> > point of Java compilers emitting x * 0 in byte codes?!?).
> > What you actually want to do is to perform as much
> > work as possible in the static compiler and end up with
> > a highly optimized intermediate code that just needs
> > to be converted into the final target assembly with a quick
> > peephole pass. For example you want to do register
> > allocation to a virtual 16-register target (so it maps trivially to Thumb-2, ARM64 and x64).
>
> Yep, JITs will hardly outperform time consuming compilers, but what prevents a virtual machine
> from using a background compiling service to store native images like .Net ngen does?
That certainly helps, but you're still doing it multiple times on targets which may not have a lot of resources. Which is more efficient: every phone compiling every bit of software you download, or just running a highly optimized binary that was compiled once with optimal settings on fast hardware?
So with JIT compilation you get the double whammy of wasting energy on the compilation itself and then wasting more energy because the resulting code is not optimal.
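As a rough illustration of doing the cleanup in the static compiler rather than on the device, here is a minimal sketch of a pass that removes an integer `x * 0` from intermediate code before it ships. The `Insn` layout, the `Op` names and `foldMulByZero` are all hypothetical and not taken from any real bytecode format; the sketch also assumes each virtual register is assigned exactly once.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical three-address intermediate code; all names are illustrative only.
enum class Op { LoadConst, Mul, Add };

struct Insn {
    Op op;
    int dst;           // destination virtual register (non-negative)
    int a;             // first source register (unused for LoadConst)
    int b;             // second source register (unused for LoadConst)
    std::int32_t imm;  // constant value, used only by LoadConst
};

// Rewrites integer "x * 0" into "load 0".
// Valid for integers only: floating-point x * 0.0 is not always 0 (NaN, -0.0).
// Assumes each virtual register is assigned exactly once (SSA-like form).
std::vector<Insn> foldMulByZero(const std::vector<Insn>& code) {
    std::vector<bool> isZero;  // isZero[r] is true if register r is known to hold 0
    std::vector<Insn> out;
    out.reserve(code.size());
    for (Insn in : code) {
        if (in.dst >= static_cast<int>(isZero.size())) {
            isZero.resize(in.dst + 1, false);
        }
        auto zero = [&](int r) {
            return r >= 0 && r < static_cast<int>(isZero.size()) && isZero[r];
        };
        if (in.op == Op::Mul && (zero(in.a) || zero(in.b))) {
            in = Insn{Op::LoadConst, in.dst, -1, -1, 0};
        }
        isZero[in.dst] = (in.op == Op::LoadConst && in.imm == 0);
        out.push_back(in);
    }
    return out;
}
```

If this runs once on the build machine, the device only ever sees the already-folded code and does not have to rediscover the same fact on every install.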
> > The other issue is C++ vs managed languages. GC, exceptions and other features have a lot of overhead
> > (even when unused), so you're never going to be anywhere near as fast as a well written C++ program.
>
> First, C++ also has exceptions; they do not reduce performance, and they
> also remove the exception handling code from the critical path.
Exceptions most definitely reduce performance, even for code that doesn't use them. And that is true for C++ too, but to a lesser extent. At ARM we put a lot of effort into compiling C++ with exceptions almost as efficiently as without, so it is possible to get close. But in some languages it gets really difficult as just about any operation can cause an exception, blocking most optimizations.
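A small sketch of why a potentially throwing operation constrains the optimizer: when every iteration may throw, the stores made before the throw must be visible to the caller and none of the later ones may be. The function names below (`checkedHalf`, `halveAll`, `writtenBeforeThrow`) are made up for illustration, and how much a given compiler is actually held back by this will vary.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical checked operation: throws on a negative input.
int checkedHalf(int x) {
    if (x < 0) throw std::invalid_argument("negative input");
    return x / 2;
}

// Every iteration may throw, so the stores to out[] that happen before the
// throw must be visible to the caller, and none after it. That ordering
// requirement limits how freely the loop can be reordered or vectorized.
void halveAll(const std::vector<int>& in, std::vector<int>& out) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = checkedHalf(in[i]);
    }
}

// Returns how many leading elements were written before an exception
// (or in.size() if nothing was thrown). The output starts filled with -1,
// which checkedHalf can never produce.
std::size_t writtenBeforeThrow(const std::vector<int>& in) {
    std::vector<int> out(in.size(), -1);
    try {
        halveAll(in, out);
    } catch (const std::invalid_argument&) {
        // Fall through and inspect the partially written output.
    }
    std::size_t n = 0;
    while (n < out.size() && out[n] != -1) ++n;
    return n;
}
```

With an input such as `{2, 4, -1, 8}` exactly the first two results must be observable after the throw, so the compiler cannot simply compute all four lanes and store them in one go without extra care.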
> Second, today, in the real world, where cache locality becomes more important than micro-optimizations
> GC actually improves performance in some cases over simple memory management, sure, one can
> implement a more complex memory management in C++, one can even implement GC in C++, it can
> be done in assembly as well, but it is so complex that nobody does.
Cache locality is certainly important, but GC doesn't solve that - in reality it actually makes it worse. And that is before we consider the actual overhead of the collection itself, often having to stop all threads for long periods. Then there is the optimization overhead and extra tables causing code bloat.
> So all that is left is the overhead of other features, which you didn't even list.
For example: arithmetic with overflow checks, array bounds checks, null pointer checks, having to assume any pointer access may cause an exception, multithreading support, etc.
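To show what those checks amount to, here is a sketch in C++ that writes out by hand the kind of tests some managed languages insert implicitly, next to the plain loop. `sumUnchecked` and `sumChecked` are hypothetical names, and the exact set of checks differs between languages and runtimes.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Plain C++: no null check, no bounds check, no overflow check.
// The caller is trusted to pass a valid pointer, a valid length, and
// values whose sum fits in 32 bits (signed overflow is undefined here).
std::int32_t sumUnchecked(const std::int32_t* data, std::size_t n) {
    std::int32_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// The same loop with the checks spelled out explicitly.
std::int32_t sumChecked(const std::vector<std::int32_t>* data, std::size_t n) {
    if (data == nullptr) throw std::invalid_argument("null reference");  // null check
    const std::int32_t maxV = std::numeric_limits<std::int32_t>::max();
    const std::int32_t minV = std::numeric_limits<std::int32_t>::min();
    std::int32_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::int32_t v = data->at(i);  // bounds check: throws std::out_of_range
        // Overflow check, done before the add so the add itself never overflows.
        if ((v > 0 && total > maxV - v) || (v < 0 && total < minV - v)) {
            throw std::overflow_error("integer overflow");
        }
        total += v;
    }
    return total;
}
```

Each check is a compare and branch inside the loop, and each one is also a potential exception point that the optimizer has to respect unless it can prove the check can never fail.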
> And you completely forgot to mention the incomparable effort already spent on making C++ compilers.
It's not like languages like Java or C# are new. It is significantly harder to write a good compiler for them, and even then you can never get close to C++ performance. Many optimizations have to be disabled or made extremely conservative as an exception or GC may occur at any time.
Obviously you could argue all the overhead is worth it as some of the features allow programmers to write code faster. Whether that is a good or a bad thing is a different discussion altogether...
Wilco