JIT vs. static compilation (Was: New Article: ARM Goes 64-bit)

Article: ARM Goes 64-bit
By: VMguy (vmguy.not.here.delete@this.unknown.net), November 22, 2012 3:21 am
Room: Moderated Discussions
Hi,

Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on November 21, 2012 7:44 am wrote:
> EduardoS (no.delete@this.spam.com) on November 19, 2012 11:41 am wrote:
> > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on November 19, 2012 9:31 am wrote:
> > > Cache locality is certainly important, but GC doesn't solve that - in reality it actually makes it worse.
> >
> > Well... Cache locality improvements with GC is measurable... No need to discuss.
>
> Rubbish. GC is memory inefficient by definition, so claiming it is better for locality is just wishful
> thinking. Compacting GC's typically need 2-3 times more memory than a non-compacting GC, so are worse on
> average. Also the much higher memory allocation rate and resulting collections are bad for locality.

As for locality advantages, there is a lot of research in that area refuting your claims:

Creating and preserving locality of java applications at allocation and garbage collection times
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.7067

Profile-guided proactive garbage collection for locality optimization
http://research.microsoft.com/en-us/um/people/trishulc/papers/halo.pdf

The garbage collection advantage: improving program locality
http://users.cecs.anu.edu.au/~steveb/downloads/pdf/oor-oopsla-2004.pdf

and many more, which you can find either through a dedicated portal (acm.org) or simply a search engine of your preference. The possible locality advantages of being able to move objects around at will are straightforward to imagine, too.

"GC is by definition inefficient": you are right that GC imposes some overhead by definition, but you always have to weigh that against other factors, such as ease of programming, and against the optimizations it enables (like the ones above), which may outweigh the disadvantages.

Compacting GCs are always more memory efficient than non-compacting GCs in the worst case, because they can compact the heap perfectly at any time, while non-compacting collectors obviously cannot and suffer (sometimes a lot) from fragmentation - just like allocation in a non-GC'ed language. Of course, in the worst case a compacting GC may have to do that at a ridiculous cost in performance.
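To make the fragmentation point concrete, here is a toy model (all names and sizes are illustrative, not how any real collector works): after freeing every other block, a non-compacting heap has only scattered holes, while a compactor can merge them into one contiguous region.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of external fragmentation vs. compaction in a small heap.
public class FragmentationDemo {
    // Free "holes" left after freeing every other 10-unit block in a 100-unit heap.
    static List<Integer> holes() {
        List<Integer> free = new ArrayList<>();
        for (int i = 0; i < 5; i++) free.add(10); // five scattered 10-unit holes
        return free;
    }

    // A non-compacting allocator needs a single contiguous hole big enough.
    static boolean allocNonCompacting(List<Integer> free, int size) {
        for (int hole : free) if (hole >= size) return true;
        return false;
    }

    // A compacting GC can slide live objects together, merging all holes.
    static boolean allocAfterCompaction(List<Integer> free, int size) {
        int total = 0;
        for (int hole : free) total += hole;
        return total >= size;
    }

    public static void main(String[] args) {
        System.out.println(allocNonCompacting(holes(), 30));   // false: no single 30-unit hole
        System.out.println(allocAfterCompaction(holes(), 30)); // true: 50 free units in total
    }
}
```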

The 2-3 times number you mention I only know as a general guideline for sizing the available heap of a GC'ed VM, i.e. the overhead of GC starts to become negligible when you size the heap to 2-3 times the maximum amount of used memory.
This also depends a lot on the application, the language, the VM and the configuration used.

I do not think this figure is too far off from what you need for a non-trivial non-gc'ed program.

Allocation rates are not so much dependent on the JIT or VM, but are mostly influenced by the programming language (and the program itself): Java, for example, only has heap-allocated objects (although these allocations can be elided to a certain degree using program analysis). If stack allocation were possible, as it is in C++, a well-written program would likely have a similar allocation rate.
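As a sketch of that elision: in the (hypothetical) method below, the temporary object never escapes, so a JIT with escape analysis (e.g. HotSpot's scalar replacement) may turn it into two plain locals and skip the heap allocation entirely; the semantics are unchanged either way.

```java
// Sketch: a short-lived object that never escapes its method.
public class EscapeDemo {
    static final class Point {          // hypothetical value-like class
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // 'p' cannot escape: a JIT may replace it with two locals (no allocation),
    // giving the stack-like allocation behavior a C++ program gets for free.
    static int distSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        System.out.println(distSquared(3, 4)); // 25
    }
}
```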

I'm not talking about custom memory managers tailored to your application here - those are always better than any generic allocator if written reasonably - but really, allocation itself is not the problem for GC'ed languages the way you single them out.

> > > And that is before we consider the actual overhead of the
> > > collection itself, often having to stop all threads
> > > for long periods.
> >
> > No,
> >
> > 1) A background GC can run on another thread, specially usefull on not threaded software;
>
> Concurrent GC has even larger overheads. It stops threads for shorter periods but stops them more often, so
> takes far longer overall.

That I agree with; however, as the original text said, in the case of a single-threaded program the GC can be run using extra resources. Of course you then use more resources overall, but the net effect is that the program runs faster.

> And then we haven't considered the far higher overheads on the generated code.
>

You mean e.g. write barrier code? Their overhead is negligible in the best case, depending on the barrier type (see "Barriers reconsidered, friendlier still", users.cecs.anu.edu.au/~steveb/downloads/pdf/barrier-ismm-2012.pdf, which presents lots of numbers), and a lot of work has gone into avoiding their generation as much as possible.
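For reference, the common card-marking barrier those papers measure boils down to a shift and a byte store after each reference write. A toy version (names and sizes are illustrative):

```java
// Toy card-marking write barrier: mark the "card" covering a written address.
public class CardTableDemo {
    static final int CARD_SHIFT = 9;              // 512-byte cards (a common choice)
    static byte[] cardTable = new byte[1 << 10];  // covers a pretend 512 KiB heap

    // What the JIT conceptually emits after every reference store:
    // one shift and one byte store.
    static void writeBarrier(long fieldAddress) {
        cardTable[(int) (fieldAddress >>> CARD_SHIFT)] = 1; // mark card dirty
    }

    public static void main(String[] args) {
        writeBarrier(0x1234);                                 // pretend reference store
        System.out.println(cardTable[0x1234 >>> CARD_SHIFT]); // 1: card is dirty
    }
}
```

The collector later scans only the dirty cards for old-to-young pointers, which is why the per-store cost can be kept this small.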

> > 2) A full stop still the less resource hungry GC, but don't look at it without
> > considering that, thanks to this GC allocations and deallocations are much faster
> > and heap is compacted periodically, in the end, often it is a win.
> >
> > > Then there is the optimization overhead and extra tables causing code bloat.

Not sure what you're referring to here: GC implementers are quite aware of any extra overhead due to the GC helper data structures, both in extra memory and in cacheability. This memory overhead typically ranges in the low single-digit percent of total memory, and given enough memory the total performance overhead is not much more either.

Depending on the GC they may have different (unacceptable to you) timing characteristics though.

> >
> > And C++ allocators waste space to avoid memory fragmentation... But doing so also hits locality.
>
> No, no space is wasted, unlike GC which requires descriptors for every object.

I do not understand what you mean by "descriptors"; the per-object overhead typically consists of a single word used for internal purposes, plus another one that contains a reference to the class metadata (i.e. the vtable).

The latter is similar to what any other OO language has, and the former is comparable to the extra per-block information of your allocator. Typical allocators actually use a few words per block to store internal data: at minimum the block length, plus other data and various markers to detect invalid memory accesses. And they certainly do suffer from fragmentation.
Feel free to direct me to some information that shows the contrary.
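A back-of-envelope comparison of the two kinds of per-object metadata (the byte counts are assumptions: a 12-byte HotSpot-style header with compressed class pointers vs. a 16-byte malloc-style chunk header on a 64-bit system):

```java
// Rough per-object metadata overhead as a percentage of the total footprint.
public class HeaderOverhead {
    static double overheadPercent(int headerBytes, int payloadBytes) {
        return 100.0 * headerBytes / (headerBytes + payloadBytes);
    }

    public static void main(String[] args) {
        // Both on the same 32-byte payload; header sizes are assumptions.
        System.out.printf("GC'd object header: %.1f%%%n", overheadPercent(12, 32));
        System.out.printf("malloc chunk header: %.1f%%%n", overheadPercent(16, 32));
    }
}
```

With these assumed sizes the GC'ed object is, if anything, slightly leaner, which is the point: per-object metadata is not unique to managed heaps.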

I do not think a GC'ed system is at a disadvantage here; in the worst case you could use the same mechanism.

> > > array bounds checks,
> >
> > I was thinking about this one when you mentioned "other features", sometimes the
> > compiler is able to optimize it away and when not, last part of the post.
> >
> > > null pointer checks, assuming
> > > any pointer access may cause an exception,
> >
> > In x86 .Net this check is "cmp eax, [eax]" with the pointer in eax, on field access there
> > is no check at all since it will raise an exception anyway in the case of a null pointer.
> >
> > Since null pointer checks are so cheap it is not clear wich optimizations are disabled
> > by them, just put the check where it is needed to keep the correct order.
>
> Since when is a memory access cheap? Every unnecesary instruction has a cost.
>
> > > multithreading support etc etc.
> >
> > How exactly this lowers performance?
>
> The barriers and other checks for concurrent GC or multithreaded access
> to fields are not exactly zero-cost and block many optimizations.
>
> > > It is significantly harder to write a good compiler
> > > for them, and even then you can never get close to C++ performance. Many optimizations have to be
> > > disabled or turned extremely conservative as an exception or GC may occur at any time.

I am not sure it is significantly harder to write a good compiler for them: I agree that you need to spend more time to reach the same performance, but partly that is because there are more components involved (e.g. the VM/memory management).

However, there is already a large body of previous work/research in that area that should give good results, and you can also apply much of the existing research on static compilation optimizations.

The compiler, especially, can be made a reasonably well-encapsulated component of the whole system, which is harder for other parts.

> managed languages are usually
> > more strict about ordering, and it is not obvious weak ordering improves performance by that much.
>
> The problem is not just the ordering, but the fact that more operations can cause exceptions. That alone
> creates a lot of overhead as you need to model flows from every possible exception to all possible exception
> handlers. Local variable values need to be preserved for example, severely limiting optimizations.

Exceptions can only occur at well-defined points, and while they may force optimizations to be scaled back, typically they do not: after an exception, most values are of no interest to the program at all (i.e. unused in the remainder of the program), so you need not keep them around or generate code that keeps them around.

(When really needed, e.g. when debugging, you typically fall back to an interpreter.)

On the contrary, optimizations are typically extremely aggressive, to the point of removing seemingly unused code entirely based on lots of different types of information, keeping only a check and a call to recompile if it turns out to be needed after all. E.g., for the logging/diagnostics functionality discussed earlier, the VM can conclude from the set of currently loaded classes that there cannot be multiple different subclasses.

A VM may even revisit these compilation decisions at runtime, e.g. on noticing that a branch which was heavily used earlier is no longer taken.
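A sketch of that speculative pattern (hypothetical names; real JITs do this with inline caches and deoptimization rather than an explicit flag): while only one subclass is loaded, the call can be inlined behind a cheap class check, with a slow path that stands in for recompilation.

```java
// Sketch of speculative devirtualization with a guard and a fallback path.
public class SpeculationDemo {
    interface Logger { int level(); }
    static final class NullLogger implements Logger {
        public int level() { return 0; }
    }

    static boolean deoptimized = false; // stand-in for triggering recompilation

    static int levelOf(Logger l) {
        if (l instanceof NullLogger) {
            return 0;               // inlined body of the only known subclass
        }
        deoptimized = true;         // guess failed: fall back to the generic path
        return l.level();
    }

    public static void main(String[] args) {
        System.out.println(levelOf(new NullLogger())); // 0, via the fast path
        System.out.println(deoptimized);               // false: guard never failed
    }
}
```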

> > And finally, back to array checks, yes, they reduce performance, and yes, it is a different discussion,
> > but frankly, the performance reduction is pretty small and a lot of security bugs would be avoided by
> > array bound checks, it is not something I would left behind even if performance was a big concern.
>
> The performance cost is high if you happen to use arrays a lot. Even if you think it is worth it,
> and ignore the overhead as small enough, many of such costs add up to something quite large.

That may be true, although a lot of effort has been put into hoisting these checks out of critical loops, making them negligible. They still add up, but in total I believe a JIT'ed program can (in general) reach performance that is at least not too far off a statically compiled one.
Such JITs/VMs often also provide more features than a statically compiled language (can), e.g. dynamic class loading and more advanced introspection/reflection, so a direct comparison may not be completely fair.
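The hoisting idea, written out by hand (a sketch of what a JIT does automatically for counted loops, not actual compiler output): one range check before the loop makes every per-element check provably redundant.

```java
// Manual illustration of bounds-check hoisting for a counted loop.
public class BoundsHoistDemo {
    static long sum(int[] a, int from, int to) {
        // One hoisted check replaces 'to - from' implicit per-element checks.
        if (from < 0 || to > a.length || from > to)
            throw new IndexOutOfBoundsException();
        long s = 0;
        for (int i = from; i < to; i++)
            s += a[i];  // i is provably in [from, to), so no check is needed here
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[]{1, 2, 3, 4}, 1, 4)); // 9
    }
}
```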

What is true is that performance tuning in such a system is much harder, as it now depends on many more factors than in a statically compiled program.

V